usamabilal closed this issue 1 year ago
Hi Usama, thanks for your patience. I'm attaching a zip file with data, a report and some documentation. Please refer to Readme in the zip file for more detailed information. I'd appreciate any suggestions and feedback from you and your team. Thanks!
Thanks so much, Heli!! I'll review and let you know how things go.
After reviewing, this looks great. I really like the validation. I understand that part of the differences in the validation stem from potential differences in aggregation from CT to ZCTA. Let me know if my understanding below is correct:
A few notes (regardless of the 1 vs 2 thing above):
Thanks again!!
Hi Usama, thanks for the feedback!
In the section "Aggregating ct data to ZCTA level" I understood you are doing 1, but in the code for "Percentile ranking (“RPL_xx”) by theme" I see option 2. Which one is happening?
You're right about how hSVI works, including that I used two aggregation methods. For the individual variables, I summed the E_variables and averaged the EP_variables without computing percentiles or SVI from them; separately, I took the mean of the percentiles. The purpose was to look at not only the aggregated cSVI but also the individual variables in terms of their correlation with our calculation results. So in your terms, I used option 2 for cSVI aggregation from CT to ZCTA, and additionally (part of) option 1 for variable aggregation from CT to ZCTA. I'd be happy to do option 1 for cSVI aggregation too if you'd like.
The ZCTA vs CT validation, while nice, may be complicated to actually conduct properly.
I completely agree with you about how tricky ZCTA vs CT validation can be, and the point about the ZCTA-specific weights makes a lot of sense. I got quite frustrated while trying to do the aggregation, but wanted to include them and hear your thoughts.
It'd be good to replicate this at the county level and compare hSVI with cSVI there.
Here is a new report where I added the comparison between hSVI and cSVI at the county level (2018, 2020) and census tract level (2020).
Thanks again for your time and advice, and please let me know if you have other questions/suggestions.
Thank you! I now get it: "method" 1 for comparing variables and "method" 2 for comparing the SVI itself. Part of the issue may be that an aggregation of percentiles may not be comparable with aggregating the variables and then computing percentiles. This is known as the STA vs ATS dilemma: summarize (aggregate) then analyze (calculate percentiles) = STA, versus analyze (calculate percentiles) then summarize (aggregate) = ATS. Your approach for validating the SVI is ATS: you first calculate percentiles and then aggregate by taking the mean of the percentiles.
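To make the STA vs ATS difference concrete, here is a minimal Python sketch on made-up numbers. Everything in it is an assumption for illustration: the tract values, the ZCTA labels, the unweighted mean used as the aggregation step, and the `(rank - 1) / (N - 1)` percentile rank with ties sharing the lowest rank. It is not the function used in the reports.

```python
from collections import defaultdict
from statistics import mean


def percentile_rank(values):
    """Return (rank - 1) / (n - 1) per value; tied values share the lowest rank."""
    ordered = sorted(values)
    n = len(values)
    # ordered.index(v) counts the values strictly below v, i.e. rank - 1.
    return [ordered.index(v) / (n - 1) for v in values]


# Hypothetical tract-level values of one EP_ variable, tagged with a made-up ZCTA.
tracts = [("A", 1.0), ("A", 100.0), ("B", 45.0), ("B", 50.0), ("C", 30.0), ("C", 40.0)]
zctas = sorted({z for z, _ in tracts})

# ATS: rank every tract first, then average the ranks within each ZCTA.
tract_ranks = percentile_rank([v for _, v in tracts])
ranks_by_zcta = defaultdict(list)
for (z, _), r in zip(tracts, tract_ranks):
    ranks_by_zcta[z].append(r)
ats = {z: mean(ranks_by_zcta[z]) for z in zctas}

# STA: average the raw values within each ZCTA first, then rank the ZCTAs.
values_by_zcta = defaultdict(list)
for z, v in tracts:
    values_by_zcta[z].append(v)
sta = dict(zip(zctas, percentile_rank([mean(values_by_zcta[z]) for z in zctas])))

# In this toy data ATS places B above A, while STA places A above B.
print("ATS:", ats)
print("STA:", sta)
```

The two orderings disagree because ZCTA "A" contains one extreme tract: it dominates the raw mean (STA) but contributes only a single top rank to the mean of percentiles (ATS).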
County-level validation looks great. I think CT (usual acronym for tracts) and CTY (usual acronym for counties) validation is all you need to ensure you are doing the right things.
Now one last thing: I do observe a few very minor differences in both CT and CTY. What do you attribute them to?
Good to know. Thank you very much! Indeed a dilemma...
For the minor differences, I think they may be due to the number of decimal places in EP_variables (percentage). The CDC version keeps one decimal place, whereas ours has more because I didn't specify it in the function (at the time I preferred to preserve as much information as possible). Here is a report with more details and some examples. I'd appreciate your insight, and we could adjust the function to make it more consistent with CDC's data if needed.
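As a rough illustration of why the number of decimals could matter, the Python sketch below ranks a handful of invented EP_ values before and after rounding to one decimal place. The values and the `(rank - 1) / (N - 1)` percentile rank (ties sharing the lowest rank) are assumptions for the example, and Python's built-in `round` may not match CDC's rounding rule in every edge case.

```python
def percentile_rank(values):
    """Return (rank - 1) / (n - 1) per value; tied values share the lowest rank."""
    ordered = sorted(values)
    n = len(values)
    return [ordered.index(v) / (n - 1) for v in values]


# Hypothetical EP_ percentages for four tracts, kept at full precision.
ep_raw = [12.26, 12.34, 7.81, 20.12]

# The same values rounded to one decimal place before ranking.
ep_rounded = [round(v, 1) for v in ep_raw]

rank_raw = percentile_rank(ep_raw)
rank_rounded = percentile_rank(ep_rounded)

# The first two tracts are distinct at full precision but tie after rounding,
# so the second tract drops to the same percentile rank as the first.
for raw, r_raw, r_rnd in zip(ep_raw, rank_raw, rank_rounded):
    print(f"EP={raw:6.2f}  rank(raw)={r_raw:.3f}  rank(rounded)={r_rnd:.3f}")
```

Small shifts like this in a single variable's rank carry through to the theme sums and the overall ranking, which would be consistent with the minor differences seen at the CT and CTY levels.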
Thanks again for your help!
Great! It'd be great to try to "fully replicate" their approach by matching their number of decimals. Interesting that they don't include the caveat in the 2020 documentation...
Sounds good! This is a report where I used the updated function (with matching decimal places) to reproduce CDC SVI. Thanks again for your input!
If this looks good to you, I'll redo the zcta SVI (2017-2021, PA) using the new function and send them again.
Perfect!! Validation is 100% on point, so let's redo them. Thanks!
Sounds great! I'm attaching a zip folder with 5 updated tables of zcta-level SVI and a folder of CDC SVI tables and documentation for your reference (same as previously uploaded). I'd appreciate any further questions/suggestions. If they look good to you, please feel free to close the issue. Thanks again for your help with improving the result!
Thanks! All looks good, closing.
Thanks!