tomwhite / covid-19-uk-data

Coronavirus (COVID-19) UK Historical Data
http://tom-e-white.com/covid-19-uk-data/
The Unlicense

Dataset: covid-19-uk-data/data/covid-19-cases-uk.csv has non-numerical values in some of the Total Cases fields #29

Closed by Jcamain 4 years ago

Jcamain commented 4 years ago

@tomwhite - Tom, is there any chance you could change the '1 to 4' values in the Total Cases field? Your dataset is the best one I've found (I've been grabbing the NHS England data from the ArcGIS dashboard, but it doesn't include Wales and Scotland), so I'd like to reference yours instead. I work for Qlik and we are helping out where we can; our software allows a huge amount of analysis of the data, and your set is perfect. I'm happy to share any additional mapping fields and data I'm working on, as well as the dashboard.

tomwhite commented 4 years ago

Hi @Jcamain - thanks for raising this, glad you are finding the dataset useful.

What would you change the '1 to 4' values to? This is how PHE published them right at the beginning, before switching to actual case numbers. I agree it's annoying, but the simplest thing is probably to filter the dataset to drop dates up to 2020-03-05 (the last date with '1 to 4'-style values in it) for the cases file.
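A minimal sketch of that filtering step, using only the standard library. The column names (`Date`, `Area`, `TotalCases`) and the sample rows are assumptions made up for illustration, not real figures; adjust them to match the actual file.

```python
import csv
import io
from datetime import date

# Last date with '1 to 4'-style values, as stated above.
CUTOFF = date(2020, 3, 5)


def rows_after_cutoff(csv_file, date_column="Date"):
    """Yield rows dated strictly after CUTOFF.

    Assumes `date_column` holds ISO dates (YYYY-MM-DD); the column
    name is an assumption, not confirmed against the real file.
    """
    for row in csv.DictReader(csv_file):
        if date.fromisoformat(row[date_column]) > CUTOFF:
            yield row


# Made-up sample rows, for illustration only.
SAMPLE = """Date,Area,TotalCases
2020-03-05,Exampleshire,1 to 4
2020-03-06,Exampleshire,7
"""

kept = list(rows_after_cutoff(io.StringIO(SAMPLE)))
```

To run it against the real file, pass an open file handle (for example `open("covid-19-cases-uk.csv", newline="")`) instead of the `io.StringIO` sample.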

timday commented 4 years ago

FWIW, in my (python) code reading the csv file, I map all the cells which I expect to contain numbers through a function which is currently

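def value(s):
    # Early PHE figures report small counts as the range '1 to 4';
    # substitute the midpoint of that range.
    if s == '1 to 4':
        return 2.5
    # Every other cell is expected to hold a plain number.
    return float(s)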

just on the grounds that if 1, 2, 3 and 4 are all equally likely, 2.5 is their average. (It also means that if any more oddities like that turn up and break the conversion to float, I can intercept them there too. Fortunately it's been pure numbers all the way since.)

Pretty much standard operating procedure for real world data IMHO. If it's not something like "1 to 4" there'll be the odd "n/a" or "?" or blank cell for your processing to choke on. You always need a way of dealing with them.
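As a rough sketch of how that kind of interception might be extended, the function below generalises the `value` helper above. The name `parse_count`, the set of missing-value markers, and the choice to return `None` for them are all assumptions for illustration, not part of the dataset or of anyone's actual code.

```python
import re

# Hypothetical markers for "no value"; extend as new oddities turn up.
MISSING = {"", "n/a", "?"}

# Matches range-style cells such as '1 to 4'.
RANGE = re.compile(r"(\d+) to (\d+)")


def parse_count(cell):
    """Convert a CSV cell to a float, or None if it is marked missing.

    Ranges such as '1 to 4' become their midpoint. Anything else that
    is not a number raises ValueError, so new oddities get noticed.
    """
    text = cell.strip()
    if text.lower() in MISSING:
        return None
    match = RANGE.fullmatch(text)
    if match:
        low, high = (int(g) for g in match.groups())
        return (low + high) / 2
    return float(text)
```

Letting unrecognised text raise rather than silently becoming `None` is a deliberate choice here: it keeps unexpected values visible instead of hiding them in the output.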

Jcamain commented 4 years ago

Thanks guys! To be fair, I thought it might cause some issues, but everything is working as it should! Will post up my dashboard when I'm finished, just in case you find it interesting, J