tomwhite / covid-19-uk-data

Coronavirus (COVID-19) UK Historical Data
http://tom-e-white.com/covid-19-uk-data/
The Unlicense

Oddball NaN entry for 2020-04-08,Scotland,S08000028,Western Isles #31

Closed timday closed 4 years ago

timday commented 4 years ago

Not sure whether this is just being true to the upstream data or something going wrong somewhere. I don't think I've seen any NaNs in this data before. It's different enough from the other "to be confirmed" / "unknown location" entries to be worth remarking on anyway:

$ grep S08000028 data/covid-19-cases-uk.csv 
2020-03-01,Scotland,S08000028,Western Isles,0
2020-03-02,Scotland,S08000028,Western Isles,0
2020-03-03,Scotland,S08000028,Western Isles,0
2020-03-04,Scotland,S08000028,Western Isles,0
2020-03-05,Scotland,S08000028,Western Isles,0
2020-04-01,Scotland,S08000028,Western Isles,3
2020-04-02,Scotland,S08000028,Western Isles,3
2020-04-03,Scotland,S08000028,Western Isles,3
2020-04-04,Scotland,S08000028,Western Isles,3
2020-04-05,Scotland,S08000028,Western Isles,4
2020-04-06,Scotland,S08000028,Western Isles,4
2020-04-07,Scotland,S08000028,Western Isles,4
2020-04-08,Scotland,S08000028,Western Isles,NaN

Oh, just noticed there's another one too:

$ grep S08000025 data/covid-19-cases-uk.csv 
2020-03-01,Scotland,S08000025,Orkney,0
2020-03-02,Scotland,S08000025,Orkney,0
2020-03-03,Scotland,S08000025,Orkney,0
2020-03-04,Scotland,S08000025,Orkney,0
2020-03-05,Scotland,S08000025,Orkney,0
2020-04-01,Scotland,S08000025,Orkney,2
2020-04-02,Scotland,S08000025,Orkney,2
2020-04-03,Scotland,S08000025,Orkney,2
2020-04-04,Scotland,S08000025,Orkney,4
2020-04-05,Scotland,S08000025,Orkney,4
2020-04-06,Scotland,S08000025,Orkney,4
2020-04-07,Scotland,S08000025,Orkney,4
2020-04-08,Scotland,S08000025,Orkney,NaN

tomwhite commented 4 years ago

It's a new change in the upstream data: NaN here means fewer than 5 cases. See https://www.gov.scot/publications/coronavirus-covid-19-tests-and-cases-in-scotland/

nanjizal commented 4 years ago

Yep, this is breaking my data processing, and it's not ideal to have to fix the feed processor. I have been mapping '1 to 4' to 3. If feasible, I think I would prefer '1 to 5', and then I can add a check and replace it with 3. Alternatively, would it be feasible just to use the previous day's value in these cases? NaN or null is ugly.
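As a rough illustration of the two workarounds described above (substituting a fixed placeholder such as 3, or carrying the previous day's value forward), here is a minimal Python sketch. The helper name `fill_suppressed` is hypothetical, and it assumes rows in the five-column layout shown in the grep output with no header row; a real file may need the header skipped first.

```python
import csv
import io

# Sample rows copied from the grep output earlier in this issue.
SAMPLE = """\
2020-04-06,Scotland,S08000028,Western Isles,4
2020-04-07,Scotland,S08000028,Western Isles,4
2020-04-08,Scotland,S08000028,Western Isles,NaN
"""


def fill_suppressed(lines, placeholder=None):
    """Yield (date, country, code, area, count) with NaN counts replaced.

    Hypothetical helper. With placeholder=None, the last known count for the
    same area code is carried forward (None if there is no earlier value).
    Otherwise the given placeholder is used.
    Assumption: every row has exactly five columns and there is no header.
    """
    last_seen = {}  # area code -> most recent known count
    for date, country, code, area, raw in csv.reader(lines):
        if raw.strip().lower() == "nan":
            if placeholder is not None:
                count = placeholder
            else:
                count = last_seen.get(code)
        else:
            count = int(raw)
            last_seen[code] = count
        yield date, country, code, area, count


carried = list(fill_suppressed(io.StringIO(SAMPLE)))
fixed = list(fill_suppressed(io.StringIO(SAMPLE), placeholder=3))
print(carried[-1])
print(fixed[-1])
```

Either substitution invents a number that the source did not publish, so it may be worth keeping a flag alongside the filled value to record that it was suppressed upstream.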

timday commented 4 years ago

Ah, so it's the equivalent of those "1 to 4" strings that appear in the English data. (But arguably a bit more computer-friendly, at least if your code deals with NaNs sensibly.) I have a vague idea from somewhere that ranges are given for low counts because of data protection, privacy, or anonymisation concerns rather than any real uncertainty about the actual number. Anyway, thanks for the pointer to the gov.scot source. Closing.
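For anyone hitting the same thing, one way of dealing with NaNs "sensibly" is to parse both the `NaN` marker and range strings like "1 to 4" into a floating-point NaN, then skip those entries when aggregating. This is only a sketch: `parse_count` and `total_known` are hypothetical names, and the sample list is made up for illustration.

```python
import math


def parse_count(raw):
    """Return the count as a float, or NaN when the value is suppressed.

    Assumption: suppressed values appear either as the literal "NaN" or as a
    range string containing " to " (for example "1 to 4").
    """
    text = raw.strip()
    if text.lower() == "nan" or " to " in text:
        return math.nan
    return float(text)


def total_known(raw_counts):
    """Sum only the counts that are actual numbers, skipping NaN entries."""
    values = [parse_count(r) for r in raw_counts]
    return sum(v for v in values if not math.isnan(v))


# Illustrative values only, not taken from the data file.
latest = ["4", "NaN", "1 to 4", "12"]
print(total_known(latest))
```

A total computed this way is a lower bound, since each suppressed entry stands for a small but unknown number of cases.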