Closed liuyanguu closed 2 years ago
Thanks for reporting, will investigate asap!
An update on this: there was a processing issue on our side between March and September 2022, and we've had to delete those dates (not yet reflected in OSF). Since we failed to capture snapshots of the original HTML tables, we are going to try to request a special data extract from the data provider in South Korea. We do not know if this will work. The Wayback Machine only has about 3 snapshots of the webpage during this period...
May I ask a quick question? (I understand people are gradually getting less interested in COVID now.) How often do we plan to update the datasets on OSF? There seem to have been no more updates after August.
And do we plan to update the database forever or will it stop at some point...?
Hi @liuyanguu, we will now aim for monthly updates on OSF, but you can get more frequent updates from the new website, which now has dates through Oct 26, 2022. However, OSF is the only place allowing a bulk download of the entire DB. Since you're only using national data, I think you could download it from the website, since that won't be too big. We're pretty close to figuring out how to make the request and download it from an R session, but I don't have a handy code chunk to share just yet. The issue with OSF upload frequency is that we're hitting the storage ceiling. We could of course create a new project or component on OSF and keep up weekly updates, but then the URLs to the files would change, so we're not entirely sure how to move forward with OSF.
@manalkamal can you chime in here regarding the initial issue on Korea data? We may be able to close this issue now.
@liuyanguu I've located a piece of test code for downloading from the API. Here's an example that downloads deaths and cases in 5-year age groups for national populations only, from 1 January 2020 until 26 October 2022. If you set the last date to lubridate::today() then that'll work too. The parameters are:
- dho (original data): 0 or 1
- dh5 (harmonized 5 year age groups): 0 or 1
- dh10 (harmonized 10 year age groups): 0 or 1
- md (deaths): 0 or 1
- mc (cases): 0 or 1
- mt (tests): 0 or 1
- mv (vaccinations): 0 or 1 (only if dho = 1)
- c: either an &-separated list of country names, using %20 for spaces, or all for all-available
- csn (subnational data): 0 or 1
- st (total sex; in these files we use t instead of b): 0 or 1
- sf (females; in these files we use t instead of b): 0 or 1
- sm (males; in these files we use t instead of b): 0 or 1
- d1: date 1
- d2: date 2
- tsv: do you want a tsv file? 1 for yes, 0 for a csv file.
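# Request: deaths and cases, harmonized 5-year age groups, national level, all countries
test_url <- "https://www.coverage-db.org/home/api/?dho=0&dh5=1&dh10=0&md=1&mc=1&mt=0&mv=0&c=all&csn=0&st=1&sf=1&sm=1&d1=2020-01-01&d2=2022-10-26&tsv=0"
library(httr)
# Send the request and write the response body to download.zip
GET(test_url, write_disk("download.zip"))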
Sometime soon we'll write a wrapper function to do this, so that parameters can be requested in a function call, and we'll update the getting-started page as well as the R package. I suspect using this approach for your work would be lighter weight moving forward, no?
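In the meantime, a rough sketch of what such a wrapper might look like is given here. It only assembles the request URL from the parameters listed above. The function name build_coverage_url and its argument names are hypothetical and not part of the R package, and the handling of multiple country names simply follows the description in this thread, so treat it as an untested assumption.

```r
# Hypothetical helper, NOT part of any released package: assembles a request
# URL from the parameters described in this thread.
build_coverage_url <- function(countries = "all",
                               date_from = "2020-01-01",
                               date_to = "2022-10-26",
                               dho = 0, dh5 = 1, dh10 = 0,
                               md = 1, mc = 1, mt = 0, mv = 0,
                               csn = 0, st = 1, sf = 1, sm = 1,
                               tsv = 0,
                               base_url = "https://www.coverage-db.org/home/api/") {
  # All switches are documented as 0 or 1
  flags <- c(dho = dho, dh5 = dh5, dh10 = dh10, md = md, mc = mc, mt = mt,
             mv = mv, csn = csn, st = st, sf = sf, sm = sm, tsv = tsv)
  if (!all(flags %in% c(0, 1))) stop("flag parameters must be 0 or 1")

  # Vaccinations are documented as available only with original data
  if (mv == 1 && dho != 1) stop("mv = 1 is only available when dho = 1")

  # Country names: spaces become %20; several names are joined with "&"
  # (assumption based on the parameter description, not verified)
  c_value <- paste(gsub(" ", "%20", countries, fixed = TRUE), collapse = "&")

  # Same parameter order as the example URL in this thread
  params <- c(dho = dho, dh5 = dh5, dh10 = dh10, md = md, mc = mc, mt = mt,
              mv = mv, c = c_value, csn = csn, st = st, sf = sf, sm = sm,
              d1 = format(as.Date(date_from)),
              d2 = format(as.Date(date_to)),
              tsv = tsv)

  paste0(base_url, "?", paste0(names(params), "=", params, collapse = "&"))
}
```

With the defaults this reproduces the test URL above. The result could then be passed to the same download call, for instance httr::GET(build_coverage_url(date_to = Sys.Date()), httr::write_disk("download.zip", overwrite = TRUE)).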
Hello @timriffe & @liuyanguu,
Yes, the initial problem is resolved. The data are good now.
Thanks Liu for raising this issue.
Sorry, I forgot to say your solution is very helpful and works well. Many thanks! And the problem is also solved. Many thanks, @manalkamal ! Happy new year!
Please note that the age distribution for the South Korea cases doesn't look right. The case numbers cluster in the old age groups.