valorumdata / cmdc-tools

Los Angeles county, CA reporting 0 for positive_tests_total #87

Closed mikelehen closed 4 years ago

mikelehen commented 4 years ago

We would like it to either be the correct value or else be absent so that we don't try to use it.

This may be blocking for us soon as we switch our code to prioritize Valorum over Corona Data Scraper.

sglyon commented 4 years ago

hey @mikelehen thanks for pinging us here.

I couldn't replicate the issue using either the client lib:

In [1]: import cmdc

In [2]: c = cmdc.Client()

In [3]: df = c.covid_us(location=6037, variable=["positive_tests_total"])

In [4]: df = c.covid_us(location=6037, variable=["positive_tests_total"]).fetch()

In [5]: df
Out[5]:
variable  location         dt  positive_tests_total
0             6037 2020-03-10                   176
1             6037 2020-03-11                   512
2             6037 2020-03-12                  1313
3             6037 2020-03-13                  2305
4             6037 2020-03-14                  2921
..             ...        ...                   ...
122           6037 2020-07-10               1807301
123           6037 2020-07-11               1831675
124           6037 2020-07-12               1839431
125           6037 2020-07-13               1854303
126           6037 2020-07-14               1855779

[127 rows x 3 columns]

Or the raw API:

~ via C base on ☁️  us-east-1
❯ http "https://api.covid.valorum.ai/covid_us?location=eq.6037&variable=eq.positive_tests_total&order=dt.desc&limit=10"
HTTP/1.1 200 OK
Cache-Control: private
Content-Encoding: gzip
Content-Length: 180
Content-Location: /covid_us?limit=10&location=eq.6037&order=dt.desc&variable=eq.positive_tests_total
Content-Profile: api
Content-Range: 0-9/*
Content-Type: application/json; charset=utf-8
Date: Fri, 17 Jul 2020 01:38:06 GMT
Server: Caddy
Server: Google Frontend
Vary: Accept-Encoding
Via: kong/2.0.4
Www-Authenticate: Key realm="kong"
X-Kong-Proxy-Latency: 0
X-Kong-Upstream-Latency: 175

[
    {
        "dt": "2020-07-14",
        "location": 6037,
        "value": 1855779,
        "variable": "positive_tests_total"
    },
    {
        "dt": "2020-07-13",
        "location": 6037,
        "value": 1854303,
        "variable": "positive_tests_total"
    },
    {
        "dt": "2020-07-12",
        "location": 6037,
        "value": 1839431,
        "variable": "positive_tests_total"
    },
    {
        "dt": "2020-07-11",
        "location": 6037,
        "value": 1831675,
        "variable": "positive_tests_total"
    },
    {
        "dt": "2020-07-10",
        "location": 6037,
        "value": 1807301,
        "variable": "positive_tests_total"
    },
    {
        "dt": "2020-07-09",
        "location": 6037,
        "value": 1775533,
        "variable": "positive_tests_total"
    },
    {
        "dt": "2020-07-08",
        "location": 6037,
        "value": 1740672,
        "variable": "positive_tests_total"
    },
    {
        "dt": "2020-07-07",
        "location": 6037,
        "value": 1703657,
        "variable": "positive_tests_total"
    },
    {
        "dt": "2020-07-06",
        "location": 6037,
        "value": 1677544,
        "variable": "positive_tests_total"
    },
    {
        "dt": "2020-07-05",
        "location": 6037,
        "value": 1652339,
        "variable": "positive_tests_total"
    }
]

Could it perhaps have resolved itself??

sglyon commented 4 years ago

oops! I just realized that my data was one day behind (the LA county dashboard isn't working well today, even when I visit it in a browser).

I did try another query with the API and I do see that the result has NaN instead of zero for positive_tests_total on July 15:

In [1]: import cmdc

In [2]: c = cmdc.Client()

In [3]: df = c.covid_us(location=6037, variable=["positive_tests_total", "cases_total"]).fetch()

In [4]: df
Out[4]:
variable         dt  location  cases_total  positive_tests_total
0        2020-01-22      6037          0.0                   NaN
1        2020-01-23      6037          0.0                   NaN
2        2020-01-24      6037          0.0                   NaN
3        2020-01-25      6037          0.0                   NaN
4        2020-01-26      6037          1.0                   NaN
..              ...       ...          ...                   ...
171      2020-07-11      6037     133659.0             1831675.0
172      2020-07-12      6037     134391.0             1839431.0
173      2020-07-13      6037     135387.0             1854303.0
174      2020-07-14      6037     135580.0             1855779.0
175      2020-07-15      6037     143343.0                   NaN

[176 rows x 4 columns]

Perhaps there is something in the CAN code that replaces NaN with 0?
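
To illustrate the kind of coercion being asked about, here is a minimal sketch in plain Python. Both helper names are hypothetical and nothing here is taken from the CAN codebase; the two rows mirror the July 14 and July 15 values shown above. The first helper treats a missing value as absent, the second shows how a blanket replacement would turn the same gap into a misleading 0.

```python
import math


def is_missing(value):
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing(rows):
    """Hypothetical consumer-side filter: missing rows are dropped, never coerced."""
    return [r for r in rows if not is_missing(r["value"])]


def coerce_missing_to_zero(rows):
    """Hypothetical failure mode: every missing value becomes 0."""
    return [{**r, "value": 0 if is_missing(r["value"]) else r["value"]} for r in rows]


# Values copied from the DataFrame output above.
rows = [
    {"dt": "2020-07-14", "variable": "positive_tests_total", "value": 1855779.0},
    {"dt": "2020-07-15", "variable": "positive_tests_total", "value": float("nan")},
]

kept = drop_missing(rows)
coerced = coerce_missing_to_zero(rows)
```

With the filter, only the July 14 row survives; with the coercion, July 15 shows up with a value of 0, which is the symptom reported in this issue.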

mikelehen commented 4 years ago

@sglyon SORRY! I complained about the wrong field. It's negative_tests_total where we are seeing 0 show up.

I'll be more careful and try to actually include API-level repro instructions going forward 😬 ...

$ curl -X GET "https://api.covid.valorum.ai/covid_historical?vintage=eq.2020-07-14&fips=eq.06037&variable=eq.negative_tests_total" -H "Accept: application/json, application/vnd.pgrst.object+json, text/csv" -H "Range-Unit: items" 
[
...
 {"vintage":"2020-07-14","dt":"2020-07-08","fips":6037,"variable":"negative_tests_total","value":0},
 {"vintage":"2020-07-14","dt":"2020-07-09","fips":6037,"variable":"negative_tests_total","value":0},
 {"vintage":"2020-07-14","dt":"2020-07-10","fips":6037,"variable":"negative_tests_total","value":0},
 {"vintage":"2020-07-14","dt":"2020-07-11","fips":6037,"variable":"negative_tests_total","value":0}
]

sglyon commented 4 years ago

Thanks @mikelehen for sticking with us.

I just found the issue -- we were setting positive tests equal to total tests, then computing negative = total - positive, which always came out to 0. We've fixed it so we properly set positive = positive, so now the identity negative = total - positive makes sense.

ref: https://github.com/valorumdata/cmdc-tools/commit/d5967f12d0a1aff487b9b07e80ea8078e09fff0e
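
The arithmetic behind the bug can be sketched as follows. This is not the code from the commit above; the function names are hypothetical, and the total is an assumption obtained by adding the positive and negative values the API reports for 2020-07-18.

```python
def split_tests_buggy(total_tests, positive_tests):
    """Reproduces the described bug: positive is overwritten with total."""
    positive = total_tests  # wrong: should be positive_tests
    negative = total_tests - positive  # always 0
    return positive, negative


def split_tests_fixed(total_tests, positive_tests):
    """Keeps positive as reported, so the identity gives a real negative count."""
    positive = positive_tests
    negative = total_tests - positive
    return positive, negative


# Assumed inputs: positive and negative as reported for 2020-07-18,
# with total taken to be their sum.
positive_reported = 149596
negative_reported = 1870140
total = positive_reported + negative_reported

buggy = split_tests_buggy(total, positive_reported)
fixed = split_tests_fixed(total, positive_reported)
```

Under these inputs the buggy version yields a negative count of 0 for every date, matching the rows in the covid_historical output, while the fixed version recovers 1870140.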

repro:

~ via C base on ☁️  us-east-1
❯ curl -X GET  "https://api.covid.valorum.ai/covid_us?order=dt.desc&location=eq.6037&limit=10"
[{"dt":"2020-07-18","location":6037,"variable":"negative_tests_total","value":1870140},
 {"dt":"2020-07-18","location":6037,"variable":"positive_tests_total","value":149596},
 {"dt":"2020-07-18","location":6037,"variable":"hospital_beds_in_use_covid_confirmed","value":2232},
 {"dt":"2020-07-18","location":6037,"variable":"hospital_beds_in_use_covid_suspected","value":608},
 {"dt":"2020-07-18","location":6037,"variable":"icu_beds_in_use_covid_confirmed","value":585},
 {"dt":"2020-07-18","location":6037,"variable":"icu_beds_in_use_covid_total","value":666},
 {"dt":"2020-07-18","location":6037,"variable":"icu_beds_in_use_covid_suspected","value":81},
 {"dt":"2020-07-18","location":6037,"variable":"hospital_beds_capacity_count","value":23972},
 {"dt":"2020-07-18","location":6037,"variable":"hospital_beds_in_use_covid_total","value":2840},
 {"dt":"2020-07-18","location":6037,"variable":"deaths_total","value":3836}]