sfbrigade / data-covid19-sfbayarea

Manual and automated processes of sourcing data for the stop-covid19-sfbayarea project
MIT License

Sonoma County Genders Table is Gone #213

Closed: Mr0grog closed this issue 3 years ago

Mr0grog commented 3 years ago

Describe the bug

The Sonoma County data scraper is currently failing with the following error:

Sonoma county failed: The header "Cases by Gender" no longer corresponds to a section
Traceback (most recent call last):
 File "/home/runner/work/stop-covid19-sfbayarea/stop-covid19-sfbayarea/scraper/scraper_data.py", line 30, in main
   out[county] = data_scrapers.scrapers[county].get_county()
 File "/home/runner/work/stop-covid19-sfbayarea/stop-covid19-sfbayarea/scraper/covid19_sfbayarea/data/sonoma.py", line 321, in get_county
   hist_cases, total_tests, cases_by_source, cases_by_age, cases_by_gender, cases_by_race = get_table_tags(sonoma_soup)
 File "/home/runner/work/stop-covid19-sfbayarea/stop-covid19-sfbayarea/scraper/covid19_sfbayarea/data/sonoma.py", line 308, in get_table_tags
   return [get_table(header, soup) for header in headers]
 File "/home/runner/work/stop-covid19-sfbayarea/stop-covid19-sfbayarea/scraper/covid19_sfbayarea/data/sonoma.py", line 308, in <listcomp>
   return [get_table(header, soup) for header in headers]
 File "/home/runner/work/stop-covid19-sfbayarea/stop-covid19-sfbayarea/scraper/covid19_sfbayarea/data/sonoma.py", line 29, in get_table
   tables = get_section_by_title(header, soup).find_all('table')
 File "/home/runner/work/stop-covid19-sfbayarea/stop-covid19-sfbayarea/scraper/covid19_sfbayarea/data/sonoma.py", line 20, in get_section_by_title
   raise FormatError('The header "{0}" no longer corresponds to a section'.format(header))
covid19_sfbayarea.errors.FormatError: The header "Cases by Gender" no longer corresponds to a section
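
The traceback shows why one missing table takes down the whole county. `get_table_tags` looks up every header in a single list comprehension, so the first `FormatError` aborts the scrape and discards the tables that were found. Below is a minimal, self-contained sketch of that pattern. The dict standing in for the parsed page, the simplified function bodies, and every header name except "Cases by Gender" are hypothetical. This is not the actual code in `sonoma.py`.

```python
from typing import Dict, List


class FormatError(Exception):
    """Stand-in for covid19_sfbayarea.errors.FormatError."""


def get_section_by_title(header: str, page: Dict[str, str]) -> str:
    # Simplified: the real scraper searches the dashboard HTML for the
    # section under this header; here the page is just a dict.
    if header not in page:
        raise FormatError('The header "{0}" no longer corresponds to a section'.format(header))
    return page[header]


def get_table_tags(page: Dict[str, str], headers: List[str]) -> List[str]:
    # One list comprehension: the first missing header raises, and the
    # tables already looked up are thrown away with it.
    return [get_section_by_title(header, page) for header in headers]


# Hypothetical page: the gender section has been removed from the site.
page = {
    "Cases by Age": "<age table>",
    "Cases by Race": "<race table>",
}
headers = ["Cases by Age", "Cases by Gender", "Cases by Race"]

try:
    get_table_tags(page, headers)
except FormatError as error:
    print(error)  # The header "Cases by Gender" no longer corresponds to a section
```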

It appears they’ve removed gender information entirely, both from the main dashboard (https://socoemergency.org/emergency/novel-coronavirus/coronavirus-cases/) and from all of the ArcGIS dashboards (https://experience.arcgis.com/experience/1edbb41952a8417385652279305e878d/page/page_11/).

Digging into the ArcGIS server under the hood, I found a dataset that still lists gender, BUT its totals don’t match up with everything else, and some of its other measures are clearly way off. None of the dashboards appear to use this dataset, so it’s probably just old and deprecated: https://services1.arcgis.com/P5Mv5GY5S66M8Z1Q/ArcGIS/rest/services/NCOV_Cases_Sonoma_County_Statistics/FeatureServer
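
For anyone who wants to re-check that dataset, here is a rough sketch that builds a standard ArcGIS REST `query` URL for the FeatureServer above and tests whether a per-category breakdown adds up to a reported total. The layer index (`0`) is an assumption; the service's layer list is available at the FeatureServer URL with `?f=json`. `build_query_url` and `breakdown_matches_total` are hypothetical helpers, not part of the scraper.

```python
from typing import Mapping
from urllib.parse import urlencode

SERVICE_URL = (
    "https://services1.arcgis.com/P5Mv5GY5S66M8Z1Q/ArcGIS/rest/services/"
    "NCOV_Cases_Sonoma_County_Statistics/FeatureServer"
)


def build_query_url(layer: int = 0) -> str:
    """Return a query URL that fetches every row of one layer as JSON.

    The default layer index is an assumption, not a known-correct value.
    """
    params = {
        "where": "1=1",             # match all features
        "outFields": "*",           # return every attribute
        "returnGeometry": "false",  # attributes only, no shapes
        "f": "json",
    }
    return f"{SERVICE_URL}/{layer}/query?{urlencode(params)}"


def breakdown_matches_total(breakdown: Mapping[str, int], total: int) -> bool:
    """True if the per-category counts sum exactly to the reported total."""
    return sum(breakdown.values()) == total
```

Feeding the gender counts from that layer and the dashboard's case total into `breakdown_matches_total` is one way to confirm the mismatch described above before deciding to drop the field.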

The best solution here is probably to stop tracking gender info for Sonoma, unfortunately. :(
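
A minimal sketch of what "stop tracking gender" could look like: gender is simply no longer requested, so its absence can't raise. As above, the dict-based page, the header names other than "Cases by Gender", and the output keys are hypothetical stand-ins, not the project's real schema.

```python
from typing import Dict


class FormatError(Exception):
    """Stand-in for covid19_sfbayarea.errors.FormatError."""


def get_section_by_title(header: str, page: Dict[str, str]) -> str:
    # Simplified stand-in for the HTML lookup in sonoma.py.
    if header not in page:
        raise FormatError(f'The header "{header}" no longer corresponds to a section')
    return page[header]


# "Cases by Gender" is no longer in the list; the other names are hypothetical.
HEADERS = ["Cases by Age", "Cases by Race"]


def get_demographics(page: Dict[str, str]) -> Dict[str, str]:
    age, race = [get_section_by_title(header, page) for header in HEADERS]
    # Gender is left out entirely rather than filled in from the
    # deprecated ArcGIS dataset.
    return {"age": age, "race": race}
```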