neherlab / covid19_scenarios

Models of COVID-19 outbreak trajectories and hospital demand
https://covid19-scenarios.org
MIT License
1.36k stars, 354 forks

new data for Canada #502

Open nataliadgepi opened 4 years ago

nataliadgepi commented 4 years ago

Hi, I'm not sure where to put this. The government of Canada has just updated its database and made hospitalization, critical care, and related data public for all COVID-positive cases: https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1310076601

I was wondering whether this would be useful for updating your database.

Thank you! Natalia

nnoll commented 4 years ago

It looks like the Canadian government does provide an API for developers to download data (see here). I haven't been able to track down the specifics of how to fetch just this CSV.

noleti commented 4 years ago

The API is a bit difficult to work with. Since the data is provided per case (https://www150.statcan.gc.ca/n1/tbl/csv/14100287-eng.zip), the .csv for all cases gets very big (~1 GB before zipping), so we likely don't want to download and parse the full data set in our scripts. There is an API to query data for date ranges by 'vector', but each patient seems to be an individual 'vector' in the DB, and I don't see how to predict the IDs of new patients.

I can likely write a parser for this, if we are fine with one that downloads the entire data set each time. Is that what we want?
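A minimal sketch of what the parsing half of such a script could look like, assuming the zip has already been downloaded to disk. It streams the CSV out of the archive row by row, so the ~1 GB file is never extracted or held in memory. The function name `count_rows_by_column` and the column name `status` are hypothetical; the real column headers in the StatCan file have not been checked here, and the archive may contain more than one CSV, in which case `member` should be passed explicitly.

```python
import csv
import io
import zipfile
from collections import Counter


def count_rows_by_column(zip_source, column, member=None):
    """Count rows per distinct value of `column` in a zipped CSV.

    `zip_source` is a path or a seekable binary file object.
    `column` is a header name; the caller must supply the real one.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        # Assumption: use the first .csv member unless `member` is given.
        name = member or next(
            n for n in archive.namelist() if n.lower().endswith(".csv")
        )
        with archive.open(name) as raw:
            # Decode lazily; utf-8-sig drops a byte-order mark if present.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            for row in csv.DictReader(text):
                counts[row[column]] += 1
    return counts


# Hypothetical usage (file name and column name are placeholders):
# counts = count_rows_by_column("cases.zip", "status")
```

This only avoids the memory cost of parsing. It does not address the download size, which is the part the 'vector' API would need to solve.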

nnoll commented 4 years ago

It wouldn't be ideal to download 1 GB every time we update & fetch data. The best case would be to find a source that has this further synthesized for us. I would say hold off for now and we can revisit in a few days.

(Sent by email in reply to Nils Ole Tippenhauer's comment of Sun, Apr 12, 2020, 11:10 AM.)