tomwhite / covid-19-uk-data

Coronavirus (COVID-19) UK Historical Data
http://tom-e-white.com/covid-19-uk-data/
The Unlicense
162 stars 79 forks

Deprecate the covid-19-uk-data repo #68

Open tomwhite opened 4 years ago

tomwhite commented 4 years ago

I would like to deprecate this repo and encourage consumers to move to official upstream data sources. I'd like to stop updates in a month's time (1 August 2020).

When I started curating UK COVID-19 data in early March, numbers for people tested, confirmed cases, and deaths were only available on web pages, and did not provide a historical timeseries. That has now changed, with all the UK health agencies (except Northern Ireland, see below) providing machine-readable historical datasets. In fact, most of the datasets are now much richer than the data provided in this repository, including data such as number of hospitalizations and calls to helplines. For that reason, people who are working with COVID-19 data will typically be using the upstream sources anyway, to access this richer data.

As a case in point, the debate over Pillar 2 data has meant that the confirmed case numbers for England have become potentially misleading, so I have stopped providing them from this repository (#67). The data is still available from https://coronavirus.data.gov.uk/downloads/csv/coronavirus-cases_latest.csv, and in the last few days PHE have published weekly case numbers for England that contain Pillar 2 data (see the spreadsheet on this page: https://www.gov.uk/government/publications/national-covid-19-surveillance-reports). The hope is that they will publish this information at daily granularity, but until they do, this illustrates that working with COVID data is messy and necessarily involves multiple sources of data, even with efforts like this one.

The lack of machine-readable data for Northern Ireland is another unfortunate reality. I have been able to work around this problem in the past by using an undocumented backend API to get the case numbers for LGDs, but it recently broke in a way that caused it to report incorrect data. I feel it is wrong to rely on this undocumented API, given how it can silently break, and that people who want machine-readable data should make the case to the NI Department for Health (I was not successful in my request to them, see #63).

The data sources that this repo relies on are documented here: https://github.com/tomwhite/covid-19-uk-data#data-sources. Most consumers of the data should be able to move to these sources fairly easily. Most of them are in CSV or JSON format, at known locations, and with stable formats. There may be some challenges though: URLs that change every day, or parsing XLSX (for Wales) on some platforms, spring to mind. These are the kind of things that I hope can be fixed by the community or the official providers.

nickcotter commented 4 years ago

Many thanks to you and everyone else who contributed to this repo.

gfaggio commented 4 years ago

Many thanks for all the help! Much appreciated. Without this repo, it would have been very hard for me to understand COVID-19 data. Best, Giulia

robchallen commented 4 years ago

Hi Tom.

It makes sense, although it's sad to see it stop, as it's been an island of sanity in the lunacy of our four nations' approaches to reporting data streams.

One thing this repo offers (which the "official" sources don't) is the commit history of the time series. This will be useful in investigating issues in delays in reporting and recreating the data set as it was at particular points in time. For example, I think that delays reporting cases in the early days of the outbreak may have significantly affected the interpretation of the situation, and hence decisions around timing of the lockdown.

Obviously the main use case for the evolution of the historical time series is the early stage, which wouldn't be impacted by winding this up now, but my point is that the official sources do not provide the commit history in the same way and this makes your repository unique in the UK. We may find that the historical data around local outbreaks are similarly interesting in the future.

It's your call, and it will continue to be a useful resource either way.

Cheers, Rob.

tomwhite commented 4 years ago

Hi Rob,

Thanks for your comments. I agree that having a history of changes so people can look back and see how things were reported at the time is valuable. As you said it's especially interesting at the beginning of the pandemic.

I thought about this as a reason for continuing, but the change history is now being published for England, and for Scotland (on GitHub!) at least. Wales publishes a new spreadsheet every day, which may have revised historical figures in it (so it doesn't retain the change history), and NI doesn't publish its data in machine-readable form.

I think it would be fairly easy for someone to write a GH action (or similar) that downloads and archives the Wales data every day. It could also translate it into a set of CSVs to make it easier to consume.
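A minimal sketch of what such a daily archiver could look like, written in Python rather than as a workflow definition. The URL below is a hypothetical placeholder (the real Wales spreadsheet location is not given in this thread), and the function names and the `archive/wales` directory are illustrative assumptions. It only covers the download-and-archive step, not the translation to CSV.

```python
from datetime import date
from pathlib import Path
from typing import Optional
from urllib.request import urlopen

# Hypothetical placeholder: the real spreadsheet location is not given in this
# thread and may change from day to day, so substitute the current upstream URL.
SOURCE_URL = "https://example.org/wales-covid-19.xlsx"


def fetch(url: str) -> bytes:
    """Download the raw bytes at `url`."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def archive_snapshot(content: bytes, archive_dir: Path, day: date,
                     suffix: str = ".xlsx") -> Optional[Path]:
    """Store `content` as <archive_dir>/<YYYY-MM-DD><suffix>.

    Returns the path written, or None when `content` is byte-identical to
    the most recent snapshot already in the archive (nothing new to keep).
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    # ISO dates sort lexicographically, so the last name is the latest day.
    existing = sorted(archive_dir.glob(f"*{suffix}"))
    if existing and existing[-1].read_bytes() == content:
        return None
    target = archive_dir / f"{day.isoformat()}{suffix}"
    target.write_bytes(content)
    return target


def main() -> None:
    """Entry point for a daily scheduled job (cron, a CI workflow, or similar)."""
    written = archive_snapshot(fetch(SOURCE_URL), Path("archive/wales"), date.today())
    print(f"archived {written}" if written else "no change since last snapshot")
```

Running `main()` once a day and committing the `archive/wales` directory would keep one dated file per changed spreadsheet, so revised historical figures remain recoverable.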

Cheers, Tom

Jcamain commented 4 years ago

Thanks so much for all your help and assistance. The ever-changing goalposts in the ways the different countries chose to deal with their data, make it available, and change it every five minutes were a nightmare, and your repository has been a godsend!

gbugmann commented 4 years ago

Hello Tom, thanks for the good work. It gave me the feeling I knew what COVID-19 was doing. Good for my mental health. I looked at the new official data https://coronavirus.data.gov.uk/downloads/csv/coronavirus-cases_latest.csv but they only cover England... I attach my latest visualization. Good luck.

Guido

Case_history.zip

tomwhite commented 4 years ago

Thanks Guido. BTW you can get data for the other nations (except NI) at the links listed here: https://github.com/tomwhite/covid-19-uk-data#data-sources

geeogi commented 4 years ago

Thanks for all your work Tom. Your data enabled us to build our application https://covidlive.co.uk. We'll be maintaining a limited fork of this repo at https://github.com/geeogi/covid-19-uk-data while we migrate to a new service.

Amol-Soneji commented 3 years ago

Hello Tom,

I think it is still possible to keep this project's code relevant by slightly changing its purpose. If, instead of dealing only with the UK, this project dealt with global statistics, it might still be useful. There are many countries that still do not provide easily machine-readable data.