Closed mikelehen closed 4 years ago
hey @mikelehen thanks for pinging us here.
I couldn't replicate the issue using either the client lib:
In [1]: import cmdc
In [2]: c = cmdc.Client()
In [3]: df = c.covid_us(location=6037, variable=["positive_tests_total"])
In [4]: df = c.covid_us(location=6037, variable=["positive_tests_total"]).fetch()
In [5]: df
Out[5]:
variable  location          dt  positive_tests_total
0             6037  2020-03-10                   176
1             6037  2020-03-11                   512
2             6037  2020-03-12                  1313
3             6037  2020-03-13                  2305
4             6037  2020-03-14                  2921
..             ...         ...                   ...
122           6037  2020-07-10               1807301
123           6037  2020-07-11               1831675
124           6037  2020-07-12               1839431
125           6037  2020-07-13               1854303
126           6037  2020-07-14               1855779
[127 rows x 3 columns]
Or the raw API:
~ via C base on ☁️ us-east-1
❯ http "https://api.covid.valorum.ai/covid_us?location=eq.6037&variable=eq.positive_tests_total&order=dt.desc&limit=10"
HTTP/1.1 200 OK
Cache-Control: private
Content-Encoding: gzip
Content-Length: 180
Content-Location: /covid_us?limit=10&location=eq.6037&order=dt.desc&variable=eq.positive_tests_total
Content-Profile: api
Content-Range: 0-9/*
Content-Type: application/json; charset=utf-8
Date: Fri, 17 Jul 2020 01:38:06 GMT
Server: Caddy
Server: Google Frontend
Vary: Accept-Encoding
Via: kong/2.0.4
Www-Authenticate: Key realm="kong"
X-Kong-Proxy-Latency: 0
X-Kong-Upstream-Latency: 175
[
{
"dt": "2020-07-14",
"location": 6037,
"value": 1855779,
"variable": "positive_tests_total"
},
{
"dt": "2020-07-13",
"location": 6037,
"value": 1854303,
"variable": "positive_tests_total"
},
{
"dt": "2020-07-12",
"location": 6037,
"value": 1839431,
"variable": "positive_tests_total"
},
{
"dt": "2020-07-11",
"location": 6037,
"value": 1831675,
"variable": "positive_tests_total"
},
{
"dt": "2020-07-10",
"location": 6037,
"value": 1807301,
"variable": "positive_tests_total"
},
{
"dt": "2020-07-09",
"location": 6037,
"value": 1775533,
"variable": "positive_tests_total"
},
{
"dt": "2020-07-08",
"location": 6037,
"value": 1740672,
"variable": "positive_tests_total"
},
{
"dt": "2020-07-07",
"location": 6037,
"value": 1703657,
"variable": "positive_tests_total"
},
{
"dt": "2020-07-06",
"location": 6037,
"value": 1677544,
"variable": "positive_tests_total"
},
{
"dt": "2020-07-05",
"location": 6037,
"value": 1652339,
"variable": "positive_tests_total"
}
]
Could it perhaps have resolved itself??
oops! I just realized that my data was one day behind (the LA county dashboard isn't working well today, even when I visit it on a browser).
I did try another query with the API and I do see that the result has NaN instead of zero for positive_tests_total on July 15:
In [1]: import cmdc
In [2]: c = cmdc.Client()
In [3]: df = c.covid_us(location=6037, variable=["positive_tests_total", "cases_total"]).fetch()
In [4]: df
Out[4]:
variable          dt  location  cases_total  positive_tests_total
0         2020-01-22      6037          0.0                   NaN
1         2020-01-23      6037          0.0                   NaN
2         2020-01-24      6037          0.0                   NaN
3         2020-01-25      6037          0.0                   NaN
4         2020-01-26      6037          1.0                   NaN
..               ...       ...          ...                   ...
171       2020-07-11      6037     133659.0             1831675.0
172       2020-07-12      6037     134391.0             1839431.0
173       2020-07-13      6037     135387.0             1854303.0
174       2020-07-14      6037     135580.0             1855779.0
175       2020-07-15      6037     143343.0                   NaN
[176 rows x 4 columns]
Perhaps there is something in the CAN code that replaces NaN with 0?
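To make that hypothesis concrete, here is a minimal pandas sketch of how a missing row could end up reported as 0. It assumes the wide table above comes from pivoting long-format rows like the ones the raw API returns; the `fillna(0)` step is purely hypothetical and is not taken from the CAN code.

```python
import pandas as pd

# Long-format rows shaped like the raw API response; values copied from the
# output above. There is deliberately no positive_tests_total row for 2020-07-15.
rows = [
    {"dt": "2020-07-14", "location": 6037, "variable": "cases_total", "value": 135580},
    {"dt": "2020-07-14", "location": 6037, "variable": "positive_tests_total", "value": 1855779},
    {"dt": "2020-07-15", "location": 6037, "variable": "cases_total", "value": 143343},
]

long_df = pd.DataFrame(rows)
wide = long_df.pivot(index="dt", columns="variable", values="value")

# The absent row shows up as NaN once the data is pivoted to one column per variable.
missing = wide["positive_tests_total"].isna()

# Hypothetical downstream step: a blanket fillna(0) turns "no data" into a reported zero.
filled = wide.fillna(0)

print(wide)
print(filled)
```

Checking `isna()` before any fill step is one way to tell "no data" apart from a genuine zero.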
@sglyon SORRY! I complained about the wrong field. It's negative_tests_total where we are seeing 0 show up.
I'll be more careful and try to actually include API-level repro instructions going forward 😬 ...
$ curl -X GET "https://api.covid.valorum.ai/covid_historical?vintage=eq.2020-07-14&fips=eq.06037&variable=eq.negative_tests_total" -H "Accept: application/json, application/vnd.pgrst.object+json, text/csv" -H "Range-Unit: items"
[
...
{"vintage":"2020-07-14","dt":"2020-07-08","fips":6037,"variable":"negative_tests_total","value":0},
{"vintage":"2020-07-14","dt":"2020-07-09","fips":6037,"variable":"negative_tests_total","value":0},
{"vintage":"2020-07-14","dt":"2020-07-10","fips":6037,"variable":"negative_tests_total","value":0},
{"vintage":"2020-07-14","dt":"2020-07-11","fips":6037,"variable":"negative_tests_total","value":0}
]
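For scripted repros, the same query URL can be assembled in Python. This is only a sketch: `build_query_url` is a hypothetical helper (not part of cmdc), and it assumes every filter is a simple `eq.` match as in the curl command above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.covid.valorum.ai"  # host taken from the curl command above


def build_query_url(table, **filters):
    """Build a query URL where every filter is an equality (eq.) match.

    Hypothetical helper for illustration; not part of the cmdc client.
    """
    params = {column: f"eq.{value}" for column, value in filters.items()}
    return f"{BASE_URL}/{table}?{urlencode(params)}"


url = build_query_url(
    "covid_historical",
    vintage="2020-07-14",
    fips="06037",
    variable="negative_tests_total",
)
print(url)
```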
Thanks @mikelehen for sticking with us.
I just found the issue -- we were setting positive tests equal to total tests, then computing negative = total - positive. We've fixed it so we properly set positive = positive, so now the identity negative = total - positive makes sense.
ref: https://github.com/valorumdata/cmdc-tools/commit/d5967f12d0a1aff487b9b07e80ea8078e09fff0e
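The arithmetic behind the bug can be sketched as follows. This is an illustration only, not the code from the commit: both function names are hypothetical, and the total used here is inferred by summing the positive and negative values reported for 2020-07-18 in this thread.

```python
def split_tests_buggy(total, positive):
    """Reproduce the described bug: positive is overwritten with the total."""
    positive = total
    # With positive == total, the identity below always yields zero.
    negative = total - positive
    return positive, negative


def split_tests_fixed(total, positive):
    """Keep positive as reported, so the identity gives a meaningful result."""
    negative = total - positive
    return positive, negative


# Assumed inputs: positive as reported for 2020-07-18, total inferred as
# positive + negative from the same day.
positive = 149596
total = 149596 + 1870140

print(split_tests_buggy(total, positive))
print(split_tests_fixed(total, positive))
```

The buggy version inflates positive tests to the full test count and pins negative tests at 0, which matches the zeros seen in the `covid_historical` output.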
repro:
~ via C base on ☁️ us-east-1
❯ curl -X GET "https://api.covid.valorum.ai/covid_us?order=dt.desc&location=eq.6037&limit=10"
[{"dt":"2020-07-18","location":6037,"variable":"negative_tests_total","value":1870140},
{"dt":"2020-07-18","location":6037,"variable":"positive_tests_total","value":149596},
{"dt":"2020-07-18","location":6037,"variable":"hospital_beds_in_use_covid_confirmed","value":2232},
{"dt":"2020-07-18","location":6037,"variable":"hospital_beds_in_use_covid_suspected","value":608},
{"dt":"2020-07-18","location":6037,"variable":"icu_beds_in_use_covid_confirmed","value":585},
{"dt":"2020-07-18","location":6037,"variable":"icu_beds_in_use_covid_total","value":666},
{"dt":"2020-07-18","location":6037,"variable":"icu_beds_in_use_covid_suspected","value":81},
{"dt":"2020-07-18","location":6037,"variable":"hospital_beds_capacity_count","value":23972},
{"dt":"2020-07-18","location":6037,"variable":"hospital_beds_in_use_covid_total","value":2840},
{"dt":"2020-07-18","location":6037,"variable":"deaths_total","value":3836}]%
We would like it to either be the correct value or else be absent so that we don't try to use it.
This may be blocking for us soon as we switch our code to prioritize Valorum over Corona Data Scraper.