klenwell / covid-19

Python command-line application to collect and analyze COVID-19 data.

JSON issue when trying to extract effective reproduction rate (Rt) data. #32

Closed: klenwell closed this issue 3 years ago

klenwell commented 3 years ago

I suspect something changed in the way the embedded data is structured.

Data for OC comes from here:

The error:

$ python app.py oc daily
Traceback (most recent call last):
  File "app.py", line 14, in <module>
    app.run()
  File "/home/klenwell/pyenv/versions/covid-19/lib/python3.8/site-packages/cement/core/foundation.py", line 916, in run
    return_val = self.controller._dispatch()
  File "/home/klenwell/pyenv/versions/covid-19/lib/python3.8/site-packages/cement/ext/ext_argparse.py", line 808, in _dispatch
    return func()
  File "/home/klenwell/projects/covid-19/covid_app/controllers/oc_controller.py", line 22, in daily
    service = OCHealthService.export_daily_csv()
  File "/home/klenwell/projects/covid-19/covid_app/services/oc_health_service.py", line 36, in export_daily_csv
    service.to_csv()
  File "/home/klenwell/projects/covid-19/covid_app/services/oc_health_service.py", line 129, in to_csv
    writer.writerow(self.data_to_csv_row(dated))
  File "/home/klenwell/projects/covid-19/covid_app/services/oc_health_service.py", line 143, in data_to_csv_row
    self.rt_extract.get(dated),
  File "/home/klenwell/pyenv/versions/covid-19/lib/python3.8/functools.py", line 966, in __get__
    val = self.func(instance)
  File "/home/klenwell/projects/covid-19/covid_app/services/oc_health_service.py", line 83, in rt_extract
    return Covid19ProjectionsExtract.oc_effective_reproduction()
  File "/home/klenwell/projects/covid-19/covid_app/extracts/covid19_projections.py", line 23, in oc_effective_reproduction
    daily_rts_dict = extract.filter_rts(html)
  File "/home/klenwell/projects/covid-19/covid_app/extracts/covid19_projections.py", line 52, in filter_rts
    plot_data = json.loads(data_str)
  File "/home/klenwell/pyenv/versions/3.8.1/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/home/klenwell/pyenv/versions/3.8.1/lib/python3.8/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 96475 (char 96474)
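
For reference, `json.loads` raises "Extra data" when a complete JSON value is followed by more non-whitespace text, so the string handed to the parser most likely ran past the end of the embedded JSON. A minimal reproduction, using a made-up string rather than the real page content:

```python
import json

# Hypothetical stand-in for the extracted string: valid JSON followed by
# leftover script text that should have been cut off before parsing.
data_str = '{"x": ["2020-09-25"], "y": [0.97]}; var other = 1;'

message = None
trailing = None

try:
    json.loads(data_str)
except json.JSONDecodeError as e:
    # e.pos is the offset where the unexpected trailing text begins.
    message = e.msg
    trailing = data_str[e.pos:]

print(message)
print(trailing)
```

Printing `data_str[e.pos:]` for the real page (char 96474 in the traceback above) should show what now follows the JSON.
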

klenwell commented 3 years ago

A quick workaround: catch the exception in both services so the export still runs when the Rt extract fails:

$ git diff master..tmp-20200926-oc
diff --git a/covid_app/services/oc_health_service.py b/covid_app/services/oc_health_service.py
index 172a481..b32dc1a 100644
--- a/covid_app/services/oc_health_service.py
+++ b/covid_app/services/oc_health_service.py
@@ -80,7 +80,11 @@ class OCHealthService:

     @cached_property
     def rt_extract(self):
-        return Covid19ProjectionsExtract.oc_effective_reproduction()
+        try:
+            return Covid19ProjectionsExtract.oc_effective_reproduction()
+        except Exception as e:
+            print("There was an error with Rt extract: {}".format(e))
+            return {}

     @cached_property
     def daily_csv_rows(self):
diff --git a/covid_app/services/us_health_service.py b/covid_app/services/us_health_service.py
index 91e3dfc..1ce4e30 100644
--- a/covid_app/services/us_health_service.py
+++ b/covid_app/services/us_health_service.py
@@ -55,7 +55,11 @@ class USHealthService:

     @cached_property
     def rt_rates(self):
-        return Covid19ProjectionsExtract.us_effective_reproduction()
+        try:
+            return Covid19ProjectionsExtract.us_effective_reproduction()
+        except Exception as e:
+            print("There was an error with Rt extract: {}".format(e))
+            return {}

This will allow you to produce the CSV file with the Rt column blank.
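
The column comes out blank because `{}.get(dated)` returns `None`, and `csv.writer` writes `None` as an empty string. Since the same try/except appears in both services, it could be pulled into one helper. A rough sketch only: `extract_or_empty` and `broken_extract` are hypothetical names, not part of the repo, and it logs a warning instead of printing:

```python
import logging

logger = logging.getLogger(__name__)


def extract_or_empty(extract_fn, label="Rt"):
    """Call extract_fn(); on any failure, log a warning and return {}."""
    try:
        return extract_fn()
    except Exception as e:
        logger.warning("There was an error with %s extract: %s", label, e)
        return {}


def broken_extract():
    # Hypothetical stand-in for an extract that fails on changed page HTML.
    raise ValueError("Extra data: line 1 column 96475 (char 96474)")


rts = extract_or_empty(broken_extract)
rt_value = rts.get("2020-09-26")  # None, so the CSV cell is left empty
```

Each `cached_property` would then reduce to a single call, e.g. `return extract_or_empty(Covid19ProjectionsExtract.oc_effective_reproduction)`.
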

klenwell commented 3 years ago

Resolved

Fixed by PR #33.

Underlying issue: the parser uses a sort of binary-search approach to extract the embedded data. It relied on a line break in the page's HTML to mark where the data ended, and that line break disappeared. I switched to a more reliable marker.
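
A related option that avoids depending on an end marker at all is `json.JSONDecoder.raw_decode`, which parses one JSON value from a given offset and ignores whatever follows it. This is not the fix from PR #33, just a sketch of an alternative; the function name, the `var plot_data =` marker, and the sample HTML are all made up for illustration:

```python
import json


def extract_json_after(html, marker):
    """Return the first JSON value that follows `marker` in `html`."""
    start = html.index(marker) + len(marker)
    # raw_decode does not skip leading whitespace, so advance to the value.
    while start < len(html) and html[start].isspace():
        start += 1
    obj, _end = json.JSONDecoder().raw_decode(html, start)
    return obj


# Hypothetical page fragment: the JSON is followed by more script and markup,
# with no line break separating them.
html = (
    '<script>var plot_data = [{"x": ["2020-09-25"], "y": [0.97]}];'
    "var other = 1;</script><p>more page content</p>"
)

plot_data = extract_json_after(html, "var plot_data =")
print(plot_data[0]["y"])
```

Only the start marker has to stay stable here; a change in what comes after the JSON no longer matters.
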