heli-xu / findSVI

Calculate CDC/ATSDR Social Vulnerability Index
https://heli-xu.github.io/findSVI/

2017-2021 SVI for ZCTAs #12

Closed usamabilal closed 1 year ago

usamabilal commented 1 year ago

Thanks!

heli-xu commented 1 year ago

Hi Usama, thanks for your patience. I'm attaching a zip file with data, a report and some documentation. Please refer to Readme in the zip file for more detailed information. I'd appreciate any suggestions and feedback from you and your team. Thanks!

2017to2021_PA_zcta_SVI.zip

usamabilal commented 1 year ago

Thanks so much, Heli!! I'll review and let you know how things go.

usamabilal commented 1 year ago

After reviewing, this looks great. I really like the validation. I understand that part of the differences in the validation stem from potential differences in aggregation from CT to ZCTA. Let me know if my understanding below is correct:

A few notes (regardless of the 1 vs 2 thing above):

Thanks again!!

heli-xu commented 1 year ago

Hi Usama, thanks for the feedback!

In the section "Aggregating ct data to ZCTA level" I understood you are doing 1, but in the code for "Percentile ranking (“RPL_xx”) by theme" I see option 2. Which one is happening?

You're right about how hSVI works, and about my using two aggregation methods. I summed the E_variables and averaged the EP_variables without going on to compute percentiles or SVI from them, and separately I took the mean of the percentiles. The purpose was to look at not only the aggregated cSVI, but also how the individual variables correlate with our calculation results. So in your terms, I was using option 2 for cSVI aggregation from CT to ZCTA, and additionally (part of) option 1 for variable aggregation from CT to ZCTA. I'd be happy to do option 1 for cSVI aggregation too if you'd like.
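To make the variable aggregation described above concrete, here is a minimal Python sketch. It is not the findSVI code: the tract IDs, ZCTA codes, values, and the column names `E_POV` and `EP_POV` are hypothetical, and it assumes each tract maps to exactly one ZCTA with no crosswalk weights.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tract rows: (tract id, ZCTA, E_ count, EP_ percentage).
tracts = [
    ("ct1", "19104", 120, 30.0),
    ("ct2", "19104", 80, 10.0),
    ("ct3", "19103", 50, 5.0),
]

def aggregate_to_zcta(rows):
    """Sum E_ counts and take the unweighted mean of EP_ percentages per ZCTA."""
    groups = defaultdict(list)
    for _tract, zcta, e_val, ep_val in rows:
        groups[zcta].append((e_val, ep_val))
    return {
        zcta: {
            "E_POV": sum(e for e, _ in vals),
            "EP_POV": mean([ep for _, ep in vals]),
        }
        for zcta, vals in groups.items()
    }

zcta_table = aggregate_to_zcta(tracts)
```

An unweighted mean of percentages treats a small tract the same as a large one, which is one reason a tract-to-ZCTA comparison can drift from a direct ZCTA-level calculation.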

The ZCTA vs CT validation, while nice, may be complicated to actually conduct properly.

I completely agree with you about how tricky ZCTA vs CT validation can be, and the point about the ZCTA-specific weights makes a lot of sense. I got quite frustrated while trying to do the aggregation, but wanted to include them and hear your thoughts.

It'd be good to replicate this at the county level and compare hSVI with cSVI at the county level

Here is a new report where I added the comparison between hSVI and cSVI at the county level (2018, 2020) and census tract level (2020).

Thanks again for your time and advice, and please let me know if you have other questions/suggestions.

usamabilal commented 1 year ago

Thank you! I now get it. So "method" 1 is for comparing variables and "method" 2 is for comparing the SVI itself. Part of the issue may be that an aggregation of percentiles is not comparable with aggregating the variables first and then creating percentiles. This is known as the STA vs ATS dilemma: summarize (aggregate) then analyze (percentile calculation) = STA, vs analyze (percentile calculation) then summarize (aggregate) = ATS. Your approach for validation of the SVI is ATS (you first calculate percentiles and then aggregate by taking the mean of percentiles).
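A small Python sketch of why STA and ATS can disagree. The area labels and values are made up, and the percentile rank is assumed to be of the form (rank - 1) / (N - 1) with ties sharing the lowest rank, which may not match every detail of the CDC or findSVI calculation.

```python
from collections import defaultdict
from statistics import mean

def percent_rank(values):
    """Assumed percentile rank: (rank - 1) / (N - 1), ties share the lowest rank."""
    n = len(values)
    return [sum(v < x for v in values) / (n - 1) for x in values]

def group_mean(keys, values):
    """Mean of values within each key, returned in sorted key order."""
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    return {k: mean(vs) for k, vs in sorted(groups.items())}

# Hypothetical tract-level percentages in three larger areas.
area = ["A", "A", "B", "B", "C", "C"]
ep = [1.0, 100.0, 40.0, 50.0, 45.0, 46.0]

# ATS: rank the tracts first, then average the ranks within each area.
ats = group_mean(area, percent_rank(ep))

# STA: average the variable within each area first, then rank the areas.
area_means = group_mean(area, ep)
sta = dict(zip(area_means, percent_rank(list(area_means.values()))))
```

With these made-up numbers, ATS gives all three areas roughly the same score, while STA spreads them across the full 0 to 1 range, so the two orderings are not interchangeable.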

County-level validation looks great. I think CT (usual acronym for tracts) and CTY (usual acronym for counties) validation is all you need to ensure you are doing the right things.

Now one last thing: I do observe a few very minor differences in both CT and CTY. What do you attribute them to?

heli-xu commented 1 year ago

Good to know. Thank you very much! Indeed a dilemma...

For the minor differences, I think they may be due to the number of decimal places in EP_variables (percentage). The CDC version keeps one decimal place, whereas ours has more because I didn't specify it in the function (at the time I preferred to preserve as much information as possible). Here is a report with more details and some examples. I'd appreciate your insight, and we could adjust the function to make it more consistent with CDC's data if needed.
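To illustrate how decimal places alone can shift percentile ranks, here is a minimal Python sketch with made-up percentages. The rank formula, (rank - 1) / (N - 1) with ties sharing the lowest rank, is an assumption for illustration and not taken from the findSVI source.

```python
def percent_rank(values):
    """Assumed percentile rank: (rank - 1) / (N - 1), ties share the lowest rank."""
    n = len(values)
    return [sum(v < x for v in values) / (n - 1) for x in values]

# Hypothetical EP_ percentages kept at full precision.
ep_full = [12.34, 12.31, 12.26, 15.0]

# The same values rounded to one decimal place.
ep_rounded = [round(v, 1) for v in ep_full]

ranks_full = percent_rank(ep_full)        # three distinct low values, three ranks
ranks_rounded = percent_rank(ep_rounded)  # the low values tie after rounding
```

Rounding turns three distinct values into a tie, so they collapse onto one shared rank. Small differences of this kind would be expected wherever unrounded and rounded percentages are ranked side by side.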

Thanks again for your help!

usamabilal commented 1 year ago

Great! It'd be great to try to "fully replicate" their approach by matching their number of decimals. Interesting that they don't include the caveat in the 2020 documentation...

heli-xu commented 1 year ago

Sounds good! This is a report where I used the updated function (with matching decimal places) to reproduce CDC SVI. Thanks again for your input!

heli-xu commented 1 year ago

If this looks good to you, I'll redo the zcta SVI (2017-2021, PA) using the new function and send them again.

usamabilal commented 1 year ago

Perfect!! Validation is 100% on point, so let's redo them. Thanks!

heli-xu commented 1 year ago

Sounds great! I'm attaching a zip folder with 5 updated tables of zcta-level SVI and a folder of CDC SVI tables and documentation for your reference (same as previously uploaded). I'd appreciate any further questions/suggestions. If they look good to you, please feel free to close the issue. Thanks again for your help with improving the result!

pa_zcta_svi_2017to2021_updated.zip

usamabilal commented 1 year ago

Thanks! All looks good, closing.