klenwell closed 9 months ago
Next step: analyze source file. Compare current one with last known good version.
Originally, I was thinking of doing something with sed/awk but I think I'll add a class method to the extract class instead. Some details I want to compare:
I discovered that there are now 3 OC zip codes reported where there used to be only one.
zip codes: {'92629', '92677', '92708'}
Most recent row for each:
# ['zipcode', 'wwtp_name', 'facility_name', 'sample_collect_date', 'lab_id', 'sample_id', 'site_id']
Laguna Niguel: ['92677', 'Regional Treatment Plant', 'Regional Treatment Plant', '09/25/2023', 'VLT', '158-230925', '06059-001-02-00-00']
Dana Point: ['92629', 'JB Latham Treatment Plant', 'JB Latham Treatment Plant', '09/25/2023', 'VLT', '157-230925', '06059-002-01-00-00']
Fountain Valley: ['92708', 'OCSD_P1', 'OC San (Orange County Sanitation District) Reclamation Plant No. 1', '12/29/2022', 'CAL3', 'OCSD_P1449240.318', '06059-003-01-00-00']
Maybe Dana Point is the most reliable source?
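For reference, a minimal sketch of how the "most recent row for each zip code" check could be done. The function name `latest_row_by_zipcode` is hypothetical and not from the extract class. It assumes the `zipcode` and `sample_collect_date` columns listed above, with dates in MM/DD/YYYY form:

```python
import csv
from datetime import datetime


def latest_row_by_zipcode(csv_lines):
    """Return {zipcode: row} with the most recently collected sample per zip code.

    csv_lines is any iterable of CSV text lines (e.g. an open file) whose
    header includes 'zipcode' and 'sample_collect_date' (assumed MM/DD/YYYY).
    """
    latest = {}
    for row in csv.DictReader(csv_lines):
        collected_on = datetime.strptime(row['sample_collect_date'], '%m/%d/%Y')
        current = latest.get(row['zipcode'])
        # Keep the row only if it is newer than what we have seen so far.
        if current is None or collected_on > current[0]:
            latest[row['zipcode']] = (collected_on, row)
    return {zipcode: row for zipcode, (_, row) in latest.items()}
```

Comparing the latest dates this way makes it easy to spot a site that has stopped reporting, like the Fountain Valley row above.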
New discovery: the state CSV file includes readings not only for Covid but also for other viruses like Norovirus and RSV. So I've updated my export to keep only the Covid data. I've also reformatted it to normalize the data rows and include more info for easier processing.
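A rough sketch of the kind of filter this involves. The column name `pcr_target` and the label `sars-cov-2` are assumptions for illustration and should be checked against the actual CDPH file header. `covid_rows` is a hypothetical helper, not the method in the extract class:

```python
def covid_rows(rows, target_column='pcr_target', covid_label='sars-cov-2'):
    """Keep only the rows whose target column names the Covid virus.

    rows is an iterable of dicts (e.g. from csv.DictReader). The default
    column name and label are assumptions, not confirmed against the file.
    """
    kept = []
    for row in rows:
        # Treat a missing or empty value as "not Covid".
        target = (row.get(target_column) or '').strip().lower()
        if target == covid_label:
            kept.append(row)
    return kept
```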
To test:
$ python app.py oc wastewater --mock
It appears that around Oct 1, CDPH changed the format of its wastewater data file. I rewrote my extract class to parse it more sanely. Along the way I discovered that the file contains data for different types of viruses, not just Covid. I modified my export to delineate the different OC sample sites and report types, then updated the extracts that depend on it. Altogether, I believe the code is simpler and will be easier to modify, if needed, in the future.
Started around start of the month. This is the error:
This is the trace: