Closed quixote79 closed 4 years ago
Thanks for the kind words and for flagging these issues.
> There's no coordinates information for France. Please check.
I just added France's provinces, but didn't have time to fill the metadata which is a manual process.
> And there is an area in China with the same code. Shaanxi(SN), Shanxi(SN)
I had not noticed the error with Sha(a?)nxi, indeed there is a duplicate code but they are distinct provinces.
Both errors with France and China metadata should be easy enough to fix!
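For what it's worth, a duplicate like Shaanxi/Shanxi could probably be caught automatically. Here is a minimal sketch, assuming metadata rows are already parsed into objects; the `RegionName` field and the sample rows are made up for illustration and may not match the real metadata file:

```javascript
// Hypothetical helper: list (country, region) codes that are shared by more
// than one region name. Field names are assumptions, not the repo's schema.
function findDuplicateRegionCodes(rows) {
  const namesByCode = new Map();
  for (const row of rows) {
    const key = `${row.CountryCode}_${row.RegionCode}`;
    if (!namesByCode.has(key)) namesByCode.set(key, new Set());
    namesByCode.get(key).add(row.RegionName);
  }
  const duplicates = [];
  for (const [key, names] of namesByCode) {
    if (names.size > 1) duplicates.push({ key, names: [...names] });
  }
  return duplicates;
}

// Illustrative sample rows only.
const sampleRegions = [
  { CountryCode: 'CN', RegionCode: 'SN', RegionName: 'Shaanxi' },
  { CountryCode: 'CN', RegionCode: 'SN', RegionName: 'Shanxi' },
  { CountryCode: 'CN', RegionCode: 'HB', RegionName: 'Hubei' },
];
console.log(findDuplicateRegionCodes(sampleRegions));
```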
> It would be even better if you could let us know the update time by data source.
This could be interpreted in a number of ways, I'm assuming you are interested in the "freshness" of the data:
- data.csv -- maybe useful for auditing purposes, but doesn't tell us much about when the data was actually collected and whether it's current
- data.csv was updated -- also useful for auditing purposes, but seems like a bad proxy metric to measure the "freshness" of the data
To make things more complicated, there are a number of issues that have to be accounted for:
Can you help me understand how you are trying to use an "UpdateTime" column, if we were to add one?
I think the "freshness" of the data is sufficient. I mean, I want to know when you update. :)
I too would like to know when the data has been updated.
I am developing a web graphing tool and do not want to keep downloading data if it has not changed.
Currently I am using PHP and get_headers to check "Content-Length" and only load the file if that has changed.
It would be ok for my purpose to have the "UpdateTime" in the headers, but those without access to the file headers do not benefit. A separate file named 'UpdateTime.txt' would be good.
@Mahks you can get the last date that a commit was pushed to the repo using GitHub's API, see this for reference.
@quixote79 let me see if I can add a column "LastUpdated" which will represent when I last fetched the data for that particular row. So, for most rows, it will be updated every time since I'm retrieving historical data daily hoping to catch any potential corrections.
@quixote79 I looked into this, but I ran into a major problem. Currently, even though we are loading all historical data for almost all countries, only a few rows get updated, so you can visually inspect the changes. For example, look at this commit and click "Load diff".
By adding a LastUpdated column, almost every single row gets updated each time. Then the diff of the file would become useless and it's much harder to keep track of potentially bad changes to the whole file. I would have to make major changes and merge the current data with the previous days' data every time, which defeats the purpose of retrieving historical data.
As a proxy, you can look at when the repo was last updated to tell the approximate "freshness" of the data. Currently, only 2 types of data are not being updated every time the repo gets updated: Spain and Italy country-level data for dates prior to March.
To see when the last update was done to the repo, you can use GitHub's API which outputs a JSON file: https://api.github.com/repos/open-covid-19/data/commits. The date will be the "freshness" time:
$.getJSON('https://api.github.com/repos/open-covid-19/data/commits', data => {
  // The most recent commit is listed first
  const lastUpdated = data[0].commit.committer.date;
});
Hopefully this is sufficient for your purposes, let me know if you have any issues with that idea.
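If it helps, here is a rough sketch of how a client might use that date to skip downloads when nothing has changed. The function names and the `cachedDate` value are hypothetical, and `checkForUpdate` assumes a runtime with a global `fetch` (a browser or a recent Node version):

```javascript
// Pure helper: committer date of the newest commit in the API response.
// The commits endpoint lists the most recent commit first.
function latestCommitDate(commits) {
  if (!Array.isArray(commits) || commits.length === 0) return null;
  return commits[0].commit.committer.date;
}

// Decide whether a cached copy is stale. `cachedDate` is whatever ISO date
// string the client stored after its previous download (hypothetical).
function needsRefresh(cachedDate, commits) {
  const latest = latestCommitDate(commits);
  if (latest === null) return false; // nothing to compare against
  if (!cachedDate) return true; // no cache yet, so download
  return new Date(latest).getTime() > new Date(cachedDate).getTime();
}

// Hypothetical usage: only the newest commit is requested.
async function checkForUpdate(cachedDate) {
  const url = 'https://api.github.com/repos/open-covid-19/data/commits?per_page=1';
  const response = await fetch(url);
  if (!response.ok) throw new Error(`GitHub API returned ${response.status}`);
  return needsRefresh(cachedDate, await response.json());
}
```

Keeping the comparison in a pure function means it can be tested without hitting the network. Note that the commit date only says when the repo changed, not which rows changed.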
@owahltinez Thanks for the stackoverflow reference.
Do you want accreditation as the data source / compiler on my page? If so what would you like? Name, link etc.
There is no need for citation / source, but if you want to add one you can say: Open COVID-19 Dataset and link to https://github.com/open-covid-19/data.
If you care to share a link to your page, I can add it to the README file in this repo.
Ok, thanks;
Link to site : http://www.starlords3k.com/covid19.php
@Mahks I'm completely blown away with your site, it is very cool!
I added a call-out to the README, please take a look and let me know if you are not happy with it.
Thanks for the call-out, glad you like it.
I am looking to add an option to display dates of major events, like when a lockdown began. Do you know of any data resource for that?
I would also like to find a data source for country statistics like level of medical care, doctors per population etc.
> I would also like to find a data source for country statistics like level of medical care, doctors per population etc.
That was discussed briefly in this other issue. I think the best source of information for that should be Wikipedia, but the information is quite sparse. A few examples:
I don't think we should add too many columns of metadata to the output data CSV file here, but I wouldn't be opposed to adding the columns to the metadata CSV file as long as they are not expected to change often (like number of hospital beds or physicians).
For something that is expected to change often and has a date associated with it, like government measures or construction of field hospitals, I would love to find a reliable source of information that we can automatically scrape and add to the output data CSV file. Ideally that would be on Wikipedia, which is both easy to scrape and accessible to everyone.
Please review my poor-quality work.
Sorry I did not mean to suggest you add that other data to your file. I wanted it for my own use.
There is this api : https://apps.who.int/gho/athena/public_docs/examples.html#ex2 Lots of data, but does not seem to be real current.
I wondered why you had all the country data repeated in your csv file. Why not have 2 files, one for country, one for data per date? For my site I parse your file into JSON format : http://www.starlords3k.com/covid19_data.json Was only 200Kb until I added the worldwide numbers.
> Please review my poor-quality work
@quixote79 I think that looks great! I don't think I've seen any other implementations using MapBox, everyone else appears to be using ArcGIS. Do you want a call-out in the README of this repo using that link?
> Sorry I did not mean to suggest you add that other data to your file. I wanted it for my own use.
@Mahks well, I think it would be useful to everyone else. My goal for this repo is to serve as a source of well-maintained, high-quality data related to COVID-19.
> There is this api : https://apps.who.int/gho/athena/public_docs/examples.html#ex2 Lots of data, but does not seem to be real current.
Thanks for the link. I'll take a look but I'm not very hopeful since I've been disappointed with the quality of the data coming from WHO.
> I wondered why you had all the country data repeated in your csv file. Why not have 2 files, one for country, one for data per date? For my site I parse your file into JSON format : http://www.starlords3k.com/covid19_data.json Was only 200Kb until I added the worldwide numbers.
Yes, the current solution is not optimal. I think a lot of people are building maps and the latitude/longitude + population did not make the final output unreasonably large but I would not want to add any more fields. The main issue is that performing joins with non-tabulated formats like JSON is a pain, and some of the map tools people are using simply can't do that: you provide a JSON file and then use a GUI to create a map.
What do you think about this idea:
- data.min.csv and data.min.json versions of the dataset with only Key, Confirmed and Deaths columns, where Key is CountryCode for country-level data and ${CountryCode}_${RegionCode} for region-level data.
- Add a Key column to metadata.csv and publish that and a metadata.json in the output folder, and over time add more columns for things that we can get information on for a reasonable number of countries.
It still would not be like the JSON file you linked, because you are pivoting the data away from the record format and using country codes as keys. I would not recommend creating your own JSON files since it will add yet another hop with additional delay -- there are currently 1-2 hops between the data coming from an authoritative source until it gets published in this repo, depending on the data source. I don't yet have it fully automated because some sources have not been 100% stable, which means that there could be multiple hours of delay right now. Once I can trust the data sources a little bit more, I hope to automatically update the data hourly.
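To make the Key idea concrete, here is a small sketch of how the key and a minimal record could be derived from a full record. The function names are hypothetical, and keeping a `Date` field in the minimal record is my assumption (the proposal only lists Key, Confirmed and Deaths):

```javascript
// Build the join key: CountryCode alone for country-level rows,
// `${CountryCode}_${RegionCode}` for region-level rows.
function makeKey(countryCode, regionCode) {
  return regionCode ? `${countryCode}_${regionCode}` : countryCode;
}

// Project a full record down to the minimal columns.
// Assumption: Date is kept so rows remain distinguishable per day.
function toMinimal(record) {
  return {
    Date: record.Date,
    Key: makeKey(record.CountryCode, record.RegionCode),
    Confirmed: record.Confirmed,
    Deaths: record.Deaths,
  };
}
```

Everything that does not change per day (name, coordinates, population) would then live only in the metadata file, keyed by the same value.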
I don't think you have to provide JSON format, as anyone needing that could easily convert from csv. The JSON file above was created just to demonstrate the structure.
I use your csv file and convert with PHP into JSON to send to the browser. So there is no "another hop with additional delay".
What I really meant I guess was, why not have 2 csv files? One with the country data and another with the date data. I don't know if that would be problematic for other uses. Does the data have to be in one file for some reason?
country_data.csv: AF,Afghanistan,,,33.93911,67.709953,38041754 (i.e. code,name,code,name,lat,long,pop)
date_data.csv: 2019-12-31,1,0 (i.e. date,cases,dead)
It does not matter for me as I can parse the data regardless. I mostly wonder why the data is repeated; the file is 400% larger as a result.
> @quixote79 I think that looks great! I don't think I've seen any other implementations using MapBox, everyone else appears to be using ArcGIS. Do you want a call-out in the README of this repo using that link?
Yes, please :)
@quixote79 I added a call-out to your map in the README, please take a look!
@Mahks
> I don't think you have to provide JSON format, as anyone needing that could easily convert from csv.
I don't think it's strictly a need, but the cost to do it is very low and reading JSON is much easier than reading CSV from Javascript applications that have no server-side component.
> I use your csv file and convert with PHP into JSON to send to the browser. So there is no "another hop with additional delay".
Ah, I see. I thought you were converting to JSON on a schedule. That's an interesting architecture, and I'm guessing that you are caching the resulting JSON file (which is why you were asking about last updated time).
> What I really meant I guess was, why not have 2 csv files? One with the country data and another with the date data. I don't know if that would be problematic for other uses. Does the data have to be in one file for some reason?
I will provide what you are asking for in the form of data_minimal.csv and data_minimal.json. I think it's a good idea for those that care about performance; even though data.json is currently 2.6MB, which is manageable, it's only going to get bigger. data.csv is still under 1MB which is much better, but as you point out it's simply not very efficient.
Unfortunately, having the latitude/longitude and population data is necessary for those who want to use a map tool that only accepts a single data source, and may not be very skilled programmers with their own server to do re-processing of the CSV / JSON files provided here. By providing both data.csv and data_minimal.csv we should be able to make everyone happy without incurring too much of a maintenance burden.
@Mahks please take a look at the new metadata and data_minimal CSV files, I think they should correspond to what you were suggesting. To join the datasets, simply use the column Key -- use an outer join since there will be multiple keys in the data_minimal dataset, one for each date where data is available.
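For anyone joining the two files client-side, a minimal sketch might look like the following. It assumes both files are already parsed into arrays of objects that share a `Key` field, and it keeps every data row whether or not metadata is found (a left outer join from the data side); the function name and the sample rows are made up for illustration:

```javascript
// Attach metadata columns to each dated data row using the shared Key column.
// Rows without matching metadata are kept as-is.
function joinOnKey(dataRows, metadataRows) {
  const metadataByKey = new Map(metadataRows.map(row => [row.Key, row]));
  return dataRows.map(row => ({ ...(metadataByKey.get(row.Key) || {}), ...row }));
}

// Illustrative rows only; the case counts are not real data.
const joinedSample = joinOnKey(
  [
    { Key: 'AF', Date: '2020-03-31', Confirmed: 1, Deaths: 0 },
    { Key: 'AF', Date: '2020-04-01', Confirmed: 2, Deaths: 0 },
  ],
  [{ Key: 'AF', CountryName: 'Afghanistan', Population: 38041754 }]
);
console.log(joinedSample);
```

Since there is one metadata row per key and many data rows per key, indexing the metadata in a `Map` first keeps the join to a single pass over the data.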
I think your link to data_minimal.csv is pointing to the wrong file...
metadata.csv looks great
On Wed, 1 Apr 2020 at 02:48, Oscar Wahltinez notifications@github.com wrote:
@Mahks https://github.com/Mahks please take a look at the new metadata https://open-covid-19.github.io/data/metadata.csv and data_minimal https://open-covid-19.github.io/data/data_minimal.csv CSV files, I think they should correspond to what you were suggesting. To join the datasets, simply use the column Key -- use an outer join since there will be multiple keys in the data_minimal dataset, one for each date where data is available.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/open-covid-19/data/issues/23#issuecomment-606837141, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACQ5P2IUH7Y4T36C2DIUVVTRKJCH5ANCNFSM4LUDGQMA .
You changed the current data format!!!
It killed my site :(
Uh oh. What are you seeing differently? I'm looking into it right now
Edit: as far as I can tell, the format looks the same. The only difference is that I have added a column called Key which should be a non-breaking change. Were you assuming the order of the columns by any chance?
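In case column order does turn out to matter for anyone, reading fields by header name rather than by position makes an added column harmless. A minimal sketch, assuming a simple CSV with no quoted fields or embedded commas (the function name is hypothetical):

```javascript
// Parse CSV text into records keyed by header name, so consumers do not
// depend on column position. Assumes no quoting or embedded commas.
function parseCsvByHeader(text) {
  const lines = text.trim().split(/\r?\n/);
  const header = lines[0].split(',');
  return lines.slice(1).map(line => {
    const cells = line.split(',');
    const record = {};
    header.forEach((name, i) => {
      record[name] = cells[i] ?? ''; // missing trailing cells become empty strings
    });
    return record;
  });
}
```

With this approach, `record.Confirmed` resolves to the same value before and after a column such as Key is inserted. A real parser would be needed for files that contain quoted values.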
https://raw.githubusercontent.com/open-covid-19/data/master/output/data.csv
Arrghh
That is the file I use to get the data. I can't access the other with PHP (so far as I have found)
OK, I understand the problem. You are using the raw github output. Please use instead this link: https://open-covid-19.github.io/data/data.csv
do you use discord? https://discord.gg/TxUkaqy
Closing this issue, since all the problems listed here have been resolved. Feel free to open new issues to provide feedback or ask questions!
First of all, I would like to thank you for providing the data. I think it's the best quality of all.
There's no coordinates information for France. Please check.
And there is an area in China with the same code. Shaanxi(SN), Shanxi(SN) I'd appreciate it if you could check this as well.
It would be even better if you could let us know the update time by data source.