timriffe / covid_age

COVerAGE-DB: COVID-19 cases, deaths, and tests by age and sex

Issue with South Korea cases #161

Closed liuyanguu closed 2 years ago

liuyanguu commented 2 years ago

Please note that the age distribution for the South Korea cases doesn't look right: the case numbers cluster in the older age groups.

library("covidAgeData")
library("data.table")
dt5_ori <- download_covid(data = "Output_5", temp = TRUE,
                          verbose = FALSE, progress = FALSE, return = "data.table")
# South Korea, both sexes, single date (no filtering by measure)
dt5_ori[Country == "South Korea" & Sex == "b" & Date == "07.08.2022", ]

timriffe commented 2 years ago

Thanks for reporting, will investigate asap!

timriffe commented 2 years ago

An update on this: there was a processing issue on our side between March and September 2022, and we've had to delete those dates (not yet reflected in OSF). Since we failed to capture snapshots of the original HTML tables, we are going to try to request a special data extract from the data provider in South Korea. We do not know if this will work. The Wayback Machine only has about 3 snapshots of the webpage during this period...

liuyanguu commented 2 years ago

May I ask a quick question? (I understand people are gradually getting less interested in COVID now.) How often do we plan to update the datasets on OSF? There seem to be no more updates after August.

And do we plan to update the database forever or will it stop at some point...?

timriffe commented 2 years ago

Hi @liuyanguu, we will now aim for monthly updates on OSF, but you can get more frequent updates from the new website, which now has dates through Oct 26, 2022. However, OSF is the only place allowing a bulk download of the entire DB. Since you're only using national data, I think you could download it from the website, since that won't be too big. We're pretty close to figuring out how to make the request and download it from an R session, but I don't have a handy code chunk to share just yet. The issue with OSF upload frequency is that we're hitting the storage ceiling. We could of course create a new project or component on OSF and keep up weekly updates, but then the URLs to the files would change, so we're not entirely sure how to move forward with OSF.

timriffe commented 2 years ago

@manalkamal can you chime in here regarding the initial issue on Korea data? We may be able to close this issue now.

timriffe commented 2 years ago

@liuyanguu I've located a piece of test code for downloading from the API. Here's an example that downloads deaths and cases in 5-year age groups for national populations only, from 1 January 2020 until 26 October 2022. If you set the last date to lubridate::today(), that'll work too. The parameters are:

  • dho (original data) 0 or 1
  • dh5 (harmonized 5 year age groups) 0 or 1
  • dh10 (harmonized 10 year age groups) 0 or 1
  • md deaths 0 or 1
  • mc cases 0 or 1
  • mt tests 0 or 1
  • mv vaccinations 0 or 1 (only if dho = 1)
  • c either an &-separated list of country names, using %20 for spaces, or all for all-available
  • csn (subnational data) 0 or 1
  • st total sex (in these files we use t instead of b) 0 or 1
  • sf females (in these files we use t instead of b) 0 or 1
  • sm males (in these files we use t instead of b) 0 or 1
  • d1 start date (YYYY-MM-DD)
  • d2 end date (YYYY-MM-DD)
  • tsv do you want a tsv file? 1 for yes, 0 for a csv file.

library(httr)
# Query matching the description above; the response is written to download.zip
test_url <- "https://www.coverage-db.org/home/api/?dho=0&dh5=1&dh10=0&md=1&mc=1&mt=0&mv=0&c=all&csn=0&st=1&sf=1&sm=1&d1=2020-01-01&d2=2022-10-26&tsv=0"
GET(test_url, write_disk("download.zip"))

Sometime soon we'll write a wrapper function to do this, so that parameters can be requested in a function call, and we'll update the getting-started page as well as the R package. I suspect using this approach for your work would be lighter weight moving forward, no?

manalkamal commented 2 years ago

Hello @timriffe & @liuyanguu,

Yes, the initial problem is resolved. The data are good now.

Thanks Liu for raising this issue.

liuyanguu commented 1 year ago

Sorry, I forgot to say your solution is very helpful and works well. Many thanks! And the problem is also solved. Many thanks, @manalkamal! Happy new year!
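
As a rough illustration of the wrapper idea mentioned in this thread, the API parameters listed here could be assembled by a small helper. This is only a sketch under stated assumptions: coverage_api_url() and its argument names are hypothetical and not part of covidAgeData, the base URL and parameter names are copied from the test URL in the thread, and the &-separated country list follows the parameter description as written, without having been checked against the live API.

```r
# Hypothetical helper (not part of covidAgeData): build a COVerAGE-DB API URL
# from the parameters described in this thread.
coverage_api_url <- function(countries = "all",
                             date_from = "2020-01-01",
                             date_to = "2022-10-26",
                             original = FALSE, five_year = TRUE,
                             ten_year = FALSE,
                             deaths = TRUE, cases = TRUE, tests = FALSE,
                             vaccinations = FALSE, subnational = FALSE,
                             total = TRUE, females = TRUE, males = TRUE,
                             tsv = FALSE) {
  # The thread states that mv is only available when dho = 1
  if (vaccinations && !original) {
    stop("vaccinations (mv = 1) requires original data (dho = 1)")
  }

  # Convert TRUE/FALSE switches into the 1/0 flags the API expects
  flag <- function(x) as.integer(isTRUE(x))

  # Country names: %20 for spaces, joined with "&" as described in the thread
  country_arg <- paste(gsub(" ", "%20", countries, fixed = TRUE),
                       collapse = "&")

  # Same parameter order as the test URL in the thread
  params <- c(
    dho = flag(original), dh5 = flag(five_year), dh10 = flag(ten_year),
    md = flag(deaths), mc = flag(cases), mt = flag(tests),
    mv = flag(vaccinations),
    c = country_arg, csn = flag(subnational),
    st = flag(total), sf = flag(females), sm = flag(males),
    d1 = format(as.Date(date_from)), d2 = format(as.Date(date_to)),
    tsv = flag(tsv)
  )

  query <- paste0(names(params), "=", params, collapse = "&")
  paste0("https://www.coverage-db.org/home/api/?", query)
}

# Example: national data for South Korea up to today
url <- coverage_api_url(countries = "South Korea", date_to = Sys.Date())
# httr::GET(url, httr::write_disk("download.zip", overwrite = TRUE))
```

With its defaults the helper reproduces the test URL from the thread; the download call is left commented out so that the sketch can be run without network access.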