tomwhite / covid-19-uk-data

Coronavirus (COVID-19) UK Historical Data
http://tom-e-white.com/covid-19-uk-data/
The Unlicense
162 stars 79 forks

Add a check for data gaps #14

Closed (tomwhite closed 4 years ago)

tomwhite commented 4 years ago

Check no days are missing
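A minimal sketch of what such a check could look like, assuming the dates of one series have already been parsed into `datetime.date` objects. The function name `missing_days` and the sample dates are illustrative, not taken from the repo.

```python
from datetime import date, timedelta


def missing_days(dates):
    """Return every calendar day between min(dates) and max(dates) that is absent."""
    present = set(dates)
    if not present:
        return []
    start, end = min(present), max(present)
    span = (end - start).days
    all_days = (start + timedelta(days=i) for i in range(span + 1))
    return [d for d in all_days if d not in present]


# Illustrative input only, not taken from the repo's CSV files.
observed = [date(2020, 3, 10), date(2020, 3, 11), date(2020, 3, 13)]
gaps = missing_days(observed)
print(gaps)
```

In a test suite, asserting that `missing_days(...)` returns an empty list for each area would flag a gap as soon as it appears.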

timday commented 4 years ago

Don't know about gaps, but what happened on 2020-03-20? The England case numbers all seem to be exactly the same as they were the day before (2020-03-19), but Scotland's numbers did update on the 20th. Noticed it while plotting this ghastly crime against eyeballs: [image: cases-log]

A glance at PHE's tracker page (https://www.arcgis.com/apps/opsdashboard/index.html#/f94c3c90da5b4e9f9a0b19484dd4bb14) seems to show no hiatus in the curve's progress, so it's not as if there was no England data at all that day (unless they've done what I'm contemplating doing and fudged it with interpolated data for that day).
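A stale file like this could in principle be caught automatically. The sketch below assumes the data has been loaded into a dict mapping each date to a dict of area name to case count. That layout, the name `stale_days`, and the counts are all hypothetical.

```python
from datetime import date


def stale_days(rows):
    """Return days whose per-area counts are all identical to the previous recorded day."""
    days = sorted(rows)
    return [cur for prev, cur in zip(days, days[1:]) if rows[cur] == rows[prev]]


# Made-up counts for illustration; the area names are placeholders.
rows = {
    date(2020, 3, 18): {"Area A": 10, "Area B": 4},
    date(2020, 3, 19): {"Area A": 12, "Area B": 5},
    date(2020, 3, 20): {"Area A": 12, "Area B": 5},  # identical to the 19th
    date(2020, 3, 21): {"Area A": 15, "Area B": 7},
}
stale = stale_days(rows)
print(stale)
```

A single area can legitimately report no change from one day to the next, so this only flags days where every area is unchanged at once, and a hit is a prompt to look rather than proof of an error.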

tomwhite commented 4 years ago

@timday thanks for reporting this - it looks like the file wasn't updated, or I downloaded an old one for that day. The good news is that PHE is publishing historical data now, so I can update it from there.

tomwhite commented 4 years ago

Fixed the issue for 20 March (and a similar one for 12 March) here: c46235f93c42812b7616e148a731e8f10ef29095

timday commented 4 years ago

I note Northern Ireland data has started appearing (which is great). But there's the 26th & 27th, then a gap for the 28th & 29th, and then the 30th is present again. No Northern Ireland data at the weekend? Or just early-days glitches as the reporting operation ramps up?

tomwhite commented 4 years ago

It's for weekdays only - the report is shorter at the weekend.
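For a weekday-only series, a gap check would need to skip Saturdays and Sundays. A rough sketch, assuming the dates are `datetime.date` objects and that the days mentioned above fall in March 2020 (the month is an assumption). `missing_weekdays` is an illustrative name, and bank holidays are not handled.

```python
from datetime import date, timedelta


def missing_weekdays(dates):
    """Return absent Monday-to-Friday days between the first and last date."""
    present = set(dates)
    if not present:
        return []
    start, end = min(present), max(present)
    span = (end - start).days
    all_days = (start + timedelta(days=i) for i in range(span + 1))
    # date.weekday() is 0 for Monday through 6 for Sunday.
    return [d for d in all_days if d.weekday() < 5 and d not in present]


# 26th, 27th and 30th present; the 28th and 29th are a weekend.
observed_ni = [date(2020, 3, 26), date(2020, 3, 27), date(2020, 3, 30)]
weekday_gaps = missing_weekdays(observed_ni)
print(weekday_gaps)
```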

robchallen commented 4 years ago

I notice a couple of days missing in a few regions, e.g. Hartlepool has no data on 30th April and 1st May. This is an upstream issue, and I've reported it to PHE. Hopefully, when they fix it, you can recover it.

https://github.com/PublicHealthEngland/coronavirus-dashboard/issues/153#issue-611739857

tomwhite commented 4 years ago

Thanks for reporting upstream @robchallen. This repo syncs with the complete upstream history every day, so it will be reflected here when it's fixed by PHE.

robchallen commented 4 years ago

PHE has responded with a "won't fix". Their rationale is that missing values imply no change from the previous day, and this is an upstream issue for them too (sigh). I can adapt my pipeline to this, but it will probably impact others:

https://github.com/PublicHealthEngland/coronavirus-dashboard/issues/153#issuecomment-623406300

WDYT?

tomwhite commented 4 years ago

OK, sounds like it's a thing downstream analyses need to handle. This repo should change the data as little as possible.
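For anyone handling this downstream, one possible approach is to forward-fill, following PHE's stated rationale that a missing value means no change from the previous day. This is a minimal sketch, assuming one area's series is held as a dict of date to cumulative count. The name `forward_fill` and the counts are made up for illustration.

```python
from datetime import date, timedelta


def forward_fill(series):
    """Fill each absent day with the most recent earlier value."""
    if not series:
        return {}
    start, end = min(series), max(series)
    filled = {}
    last = None
    day = start
    while day <= end:
        if day in series:
            last = series[day]
        filled[day] = last
        day += timedelta(days=1)
    return filled


# Hypothetical counts; only the missing dates (30 April, 1 May) come from the thread.
hartlepool = {date(2020, 4, 29): 100, date(2020, 5, 2): 105}
filled = forward_fill(hartlepool)
print(filled)
```

Doing the fill in the analysis step rather than in the repo keeps the published data as close to upstream as possible.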