usds / justice40-tool

A tool to identify disadvantaged communities due to environmental, socioeconomic and health burdens
https://screeningtool.geoplatform.gov/
Creative Commons Zero v1.0 Universal
132 stars 42 forks

Bug in Puerto Rico data #2175

Open KameronKerger opened 1 year ago

KameronKerger commented 1 year ago

Describe the bug There is a bug in the data for Puerto Rico (possibly only in the spreadsheet). While analyzing the data, I noticed tracts that have 0 thresholds exceeded but 1 category exceeded. This should not be possible, since a tract must exceed at least 1 threshold to exceed any category. On further investigation, this appears to happen only for tracts in Puerto Rico: 35 tracts, to be exact. All of these tracts are identified as disadvantaged, and all are over the threshold for low income.

When I count the TRUEs in the burden columns, I get 30, meaning 30 tracts should have at least 1 threshold exceeded. Why is this not 35? Are 5 tracts being identified as disadvantaged that shouldn't be? Is it just a bug in the spreadsheet? Is this also happening on the front end? Why is this happening only in PR? Could it have something to do with the removal of linguistic isolation, meaning those values are somehow still being pushed through? But then why is it 0 thresholds? This case is just like the extra 13 tracts!

Note: for historical context, we used to show the thresholds exceeded on the UI but took it off because it was too confusing.

To Reproduce Steps to reproduce the behavior:

  1. Go to 'attached spreadsheet'
  2. See error
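The check described in the bug report can be sketched roughly as follows. This is only an illustration in plain Python: the column names (`tract_id`, `thresholds_exceeded`, `categories_exceeded`, `burden_x`, `burden_y`) and the sample rows are hypothetical stand-ins, not the real headers or data from the attached spreadsheet.

```python
# Hypothetical column names; the real spreadsheet headers may differ.
THRESHOLDS = "thresholds_exceeded"
CATEGORIES = "categories_exceeded"


def find_inconsistent_tracts(rows, burden_columns):
    """Return tracts reporting 0 thresholds exceeded but at least 1 category exceeded.

    For each flagged tract, also recount the thresholds directly from the
    per-burden TRUE/FALSE columns so the two numbers can be compared.
    """
    flagged = []
    for row in rows:
        if int(row[THRESHOLDS]) == 0 and int(row[CATEGORIES]) >= 1:
            recount = sum(
                str(row[col]).strip().upper() == "TRUE" for col in burden_columns
            )
            flagged.append(
                {"tract_id": row["tract_id"], "recounted_thresholds": recount}
            )
    return flagged


# Made-up rows for illustration only (not real tract data).
rows = [
    {"tract_id": "A", THRESHOLDS: "0", CATEGORIES: "1", "burden_x": "TRUE", "burden_y": "FALSE"},
    {"tract_id": "B", THRESHOLDS: "0", CATEGORIES: "1", "burden_x": "FALSE", "burden_y": "FALSE"},
    {"tract_id": "C", THRESHOLDS: "2", CATEGORIES: "1", "burden_x": "TRUE", "burden_y": "TRUE"},
]

flagged = find_inconsistent_tracts(rows, ["burden_x", "burden_y"])
for tract in flagged:
    print(tract)
```

In this sketch, a flagged tract whose recount is still 0 (like the made-up tract "B") would correspond to the 5 unexplained tracts in the report, while a recount of 1 or more would suggest that only the summary count column is wrong.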

Screenshots Attached sheet with the 35 tracts from 1.0: PR_Bug_data.xlsx

sampowers-usds commented 1 year ago

@KameronKerger, I dug a bit deeper into this with data from the backend. On that side, I have intermediate signals for each of the categories (e.g. precalculated columns for "ClimateChange" and "Workforce"). Those signals don't appear to make it into the communities list that you were working with. Using that data, I wasn't able to replicate the 5 tracts you found that were deemed disadvantaged but didn't trigger any of the category thresholds. I think it's possible that there are PR/Island-specific threshold columns that get generated and used in the back end but don't make it into the front end. I can look further into that. But I don't think we are currently flagging anything as disadvantaged that shouldn't be designated that way.

On the question of "why are there tracts that pass 0 thresholds but pass a category?" I've discovered that it's actually a bit deeper than that. There are ~398 tracts (all in PR) that pass at least one more category than they pass thresholds for. It seems to be mostly in the workforce and housing categories. I checked in the backend and I think I know why it's happening (it seems like the adding mechanism isn't really working for island areas). But luckily that field is calculated at the end and doesn't really feed anything. I think we can assume that it's an issue in the spreadsheet and not in the actual calculation of disadvantage.
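For anyone who wants to repeat the broader comparison, a rough sketch is below. It counts, per state or territory, the tracts where the number of categories passed is greater than the number of thresholds passed. The column names (`state`, `thresholds_exceeded`, `categories_exceeded`) and the sample rows are assumptions made for illustration; they are not the actual backend field names or data.

```python
from collections import Counter


def count_category_excess(rows):
    """Count tracts, per state/territory, that pass more categories than thresholds."""
    counts = Counter()
    for row in rows:
        # Hypothetical column names; adjust to the real headers.
        categories = int(row["categories_exceeded"])
        thresholds = int(row["thresholds_exceeded"])
        if categories > thresholds:
            counts[row["state"]] += 1
    return counts


# Made-up rows for illustration only (not real tract data).
rows = [
    {"state": "Puerto Rico", "thresholds_exceeded": "0", "categories_exceeded": "1"},
    {"state": "Puerto Rico", "thresholds_exceeded": "1", "categories_exceeded": "2"},
    {"state": "Puerto Rico", "thresholds_exceeded": "3", "categories_exceeded": "2"},
    {"state": "Ohio", "thresholds_exceeded": "2", "categories_exceeded": "1"},
]

excess_by_state = count_category_excess(rows)
print(dict(excess_by_state))
```

If the summing problem really is limited to island areas, a run against the full data should show non-zero counts only for those territories.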

KameronKerger commented 1 year ago

OK, good to know, @sampowers-usds. Let's leave this one for the contractors.

KameronKerger commented 11 months ago

needs discussion