policy-design-lab / data-import

Programs used to convert raw CSV files into JSON files for various PDL programs
0 stars 0 forks source link

Create json for EQIP IRA #99

Closed ywkim312 closed 3 months ago

ywkim312 commented 3 months ago

There two outputs one for summary and the other for state distribution. The result json "eqip_ira_state_distribution.json" and "eqip_ira_summary.json" are the ones that you want to check. For state distribution, the parser combines three tables. The main one will be the one that contains the data for 2023 and total. Then there are two additional table for future min and max values. I added these additional tables' future years' values to corresponding practice name of the 2023/total of each corresponding state in the output json. So some of the practices doesn't have any future data when they don't have any matching future min/max values. For summary, it only uses the first main table not the future min/max table

I am not very sure if the format is correct, so any suggestion will be welcomed.

pengyin-shan commented 3 months ago

Hi @ywkim312 , the changes are great 👍, thank you very much for your work!


Just a few very minor naming convention changes, if possible:

  1. For non-prediction name, can we change totalPayment to totalPaymentInDollars on the top level? For the sub-payment in practices, I noticed that we have 2023PaymentInDollars, which is equal to the totalPaymentInDollars. I suggest we remove the 2023 PaymentInDollars, as illustrated here: Screenshot 2024-05-22 at 12 00 46 PM

By doing this, we will achieve a data structure very similar to the existing EQIP/Title II data structure. This will also lead to unified naming for attributes with the same meaning, allowing us to reuse some front-end components.


  1. For the predictions, can we change MinimumDollars to minimumTotalPaymentInDollars, as well as MaximumDollars to maximumTotalPaymentInDollars? Again, this will make our naming convention unified: Screenshot 2024-05-22 at 12 08 05 PM

Lastly, just to confirm for predictions, we don't have a national level total minimum and total maximum number right? (as illustrated above)

ywkim312 commented 3 months ago

Hi @ywkim312 , the changes are great 👍, thank you very much for your work!

Just a few very minor naming convention changes, if possible:

  1. For non-prediction name, can we change totalPayment to totalPaymentInDollars on the top level? For the sub-payment in practices, I noticed that we have 2023PaymentInDollars, which is equal to the totalPaymentInDollars. I suggest we remove the 2023 PaymentInDollars, as illustrated here:
Screenshot 2024-05-22 at 12 00 46 PM

By doing this, we will achieve a data structure very similar to the existing EQIP/Title II data structure. This will also lead to unified naming for attributes with the same meaning, allowing us to reuse some front-end components.

  1. For the predictions, can we change MinimumDollars to minimumTotalPaymentInDollars, as well as MaximumDollars to maximumTotalPaymentInDollars? Again, this will make our naming convention unified:
Screenshot 2024-05-22 at 12 08 05 PM

Lastly, just to confirm for predictions, we don't have a national level total minimum and total maximum number right? (as illustrated above)

Formats has been changed.

No, we don't have a national level total, but I added the each states total in the output json

pengyin-shan commented 3 months ago

Hi @ywkim312 , the changes are great 👍, thank you very much for your work! Just a few very minor naming convention changes, if possible:

  1. For non-prediction name, can we change totalPayment to totalPaymentInDollars on the top level? For the sub-payment in practices, I noticed that we have 2023PaymentInDollars, which is equal to the totalPaymentInDollars. I suggest we remove the 2023 PaymentInDollars, as illustrated here:
Screenshot 2024-05-22 at 12 00 46 PM

By doing this, we will achieve a data structure very similar to the existing EQIP/Title II data structure. This will also lead to unified naming for attributes with the same meaning, allowing us to reuse some front-end components.

  1. For the predictions, can we change MinimumDollars to minimumTotalPaymentInDollars, as well as MaximumDollars to maximumTotalPaymentInDollars? Again, this will make our naming convention unified:
Screenshot 2024-05-22 at 12 08 05 PM

Lastly, just to confirm for predictions, we don't have a national level total minimum and total maximum number right? (as illustrated above)

Formats has been changed.

No, we don't have a national level total, but I added the each states total in the output json

Hi @ywkim312 ,thanks so much for these changes! It helps a lot to keep our code in a unified style and reusable in the future! 👍 just a extremely small thing: I think there's an extra 'Year' at the end of the "maximumTotalPaymentInDollarsYear" attribute:

Screenshot 2024-05-23 at 4 40 10 PM
ywkim312 commented 3 months ago

Hi @ywkim312 , the changes are great 👍, thank you very much for your work! Just a few very minor naming convention changes, if possible:

  1. For non-prediction name, can we change totalPayment to totalPaymentInDollars on the top level? For the sub-payment in practices, I noticed that we have 2023PaymentInDollars, which is equal to the totalPaymentInDollars. I suggest we remove the 2023 PaymentInDollars, as illustrated here:
Screenshot 2024-05-22 at 12 00 46 PM

By doing this, we will achieve a data structure very similar to the existing EQIP/Title II data structure. This will also lead to unified naming for attributes with the same meaning, allowing us to reuse some front-end components.

  1. For the predictions, can we change MinimumDollars to minimumTotalPaymentInDollars, as well as MaximumDollars to maximumTotalPaymentInDollars? Again, this will make our naming convention unified:
Screenshot 2024-05-22 at 12 08 05 PM

Lastly, just to confirm for predictions, we don't have a national level total minimum and total maximum number right? (as illustrated above)

Formats has been changed. No, we don't have a national level total, but I added the each states total in the output json

Hi @ywkim312 ,thanks so much for these changes! It helps a lot to keep our code in a unified style and reusable in the future! 👍 just a extremely small thing: I think there's an extra 'Year' at the end of the "maximumTotalPaymentInDollarsYear" attribute:

Screenshot 2024-05-23 at 4 40 10 PM

Oops, I am sorry. Fixed it

sandeep-ps commented 3 months ago

Hi, @ywkim312, I reviewed this PR by comparing the JSON data against Pivot tables generated from the raw CSV files.

I'm getting the year-wise total as different. For example, for CA, for 2024, minimumTotalPaymentInDollars comes to $18,226,729.12 in my calculation, but in the JSON file, it's $17,413,510.35. Could you please check this again? I'll share the pivot table via Slack.

A few other thoughts, but they may not be of high priority:

  1. It may be better to use predicted... prefix to differentiate between predicted vs actual data (current year)
  2. This is not a high priority either, but sorting the list of practices based on the code may be good too.
sandeep-ps commented 3 months ago

20231106-2024_2031-EQIPextrafund-project-by-practice-MIN-clean.xlsx

Or maybe it's better to add it here.

ywkim312 commented 3 months ago

Hi, @ywkim312, I reviewed this PR by comparing the JSON data against Pivot tables generated from the raw CSV files.

I'm getting the year-wise total as different. For example, for CA, for 2024, minimumTotalPaymentInDollars comes to $18,226,729.12 in my calculation, but in the JSON file, it's $17,413,510.35. Could you please check this again? I'll share the pivot table via Slack.

A few other thoughts, but they may not be of high priority:

  1. It may be better to use predicted... prefix to differentiate between predicted vs actual data (current year)
  2. This is not a high priority either, but sorting the list of practices based on the code may be good too

For 2024 total, not all the columns in the xls files are included. It only has the matching values with the practice number. Therefore, some columns will be missed because it can't matched to the ones in the 2023 table. So It kind of make sense that my number is lower than yours.

Also, can you make more detailed explain your suggestion number 1?

ywkim312 commented 3 months ago

20231106-2024_2031-EQIPextrafund-project-by-practice-MIN-clean.xlsx

Or maybe it's better to add it here.

Still not very clear what you mean. I am sorry to ask but can you make little more in detial?

sandeep-ps commented 3 months ago

Let me look at the total values again, and I will get back to you soon.

Regarding

It may be better to use predicted... prefix to differentiate between predicted vs actual data (current year)

I meant since minimumTotalPaymentInDollars and maximumTotalPaymentInDollars for years 2024-2031 are "predicted" payments, it may be better to say predictedMinimumTotalPaymentInDollars and predictedMaximumTotalPaymentInDollars.

ywkim312 commented 3 months ago

Let me look at the total values again, and I will get back to you soon.

Regarding

It may be better to use predicted... prefix to differentiate between predicted vs actual data (current year)

I meant since minimumTotalPaymentInDollars and maximumTotalPaymentInDollars for years 2024-2031 are "predicted" payments, it may be better to say predictedMinimumTotalPaymentInDollars and predictedMaximumTotalPaymentInDollars.

Oh I see. Thank you for the answer @pengyin-shan How do you think?

pengyin-shan commented 3 months ago

Let me look at the total values again, and I will get back to you soon. Regarding

It may be better to use predicted... prefix to differentiate between predicted vs actual data (current year)

I meant since minimumTotalPaymentInDollars and maximumTotalPaymentInDollars for years 2024-2031 are "predicted" payments, it may be better to say predictedMinimumTotalPaymentInDollars and predictedMaximumTotalPaymentInDollars.

Oh I see. Thank you for the answer @pengyin-shan How do you think?

Yes, this will work for me. Please feel free to add 'predicted' prefixes for all predicted values, and I'll refer to this prefix for all future 'predicted' data.

ywkim312 commented 3 months ago

Let me look at the total values again, and I will get back to you soon. Regarding

It may be better to use predicted... prefix to differentiate between predicted vs actual data (current year)

I meant since minimumTotalPaymentInDollars and maximumTotalPaymentInDollars for years 2024-2031 are "predicted" payments, it may be better to say predictedMinimumTotalPaymentInDollars and predictedMaximumTotalPaymentInDollars.

Oh I see. Thank you for the answer @pengyin-shan How do you think?

Yes, this will work for me. Please feel free to add 'predicted' prefixes for all predicted values, and I'll refer to this prefix for all future 'predicted' data.

@sandeep-ps @pengyin-shan I changed to predictedMinimum and predictedMaximum

sandeep-ps commented 3 months ago

Thanks, Yongwook! The changes look good to me. I will approve this PR.