rji-futures-lab / mo-covid-vaccine-data

A pipeline for collecting and archiving data describing Covid-19 vaccinations in Missouri.
MIT License

Collect URLs to archive #1

Open gordonje opened 3 years ago

gordonje commented 3 years ago

It appears that the URLs for the CSV files we want to download vary from one user session to the next. For example, here is the URL for the county-level data:

https://results.mo.gov/vizql/t/COVID19/w/VaccinationsDashboard/v/Vaccinations/vudcsv/sessions/8F1775BDA25F490382E8A97C99DD6E6A-2:0/views/3135269236547538553_17356141584422777008?summary=true
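To make the varying part easier to see, here is a minimal sketch that splits the example URL into its pieces. It assumes the segment after `/sessions/` is the per-session ID and the segment after `/views/` identifies the view; that reading comes only from the shape of the URL and has not been confirmed. The names `split_csv_url` and `CSV_PATH` are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# The county-level URL quoted above, split across lines for readability.
EXAMPLE_URL = (
    "https://results.mo.gov/vizql/t/COVID19/w/VaccinationsDashboard/v/Vaccinations"
    "/vudcsv/sessions/8F1775BDA25F490382E8A97C99DD6E6A-2:0"
    "/views/3135269236547538553_17356141584422777008?summary=true"
)

# Assumption: the path always has the form .../vudcsv/sessions/<id>/views/<id>.
CSV_PATH = re.compile(
    r"/vudcsv/sessions/(?P<session_id>[^/]+)/views/(?P<view_id>[^/?]+)"
)


def split_csv_url(url):
    """Return the presumed session ID, view ID and query string of a CSV URL."""
    parts = urlsplit(url)
    match = CSV_PATH.search(parts.path)
    if match is None:
        raise ValueError("URL does not look like a vudcsv download link")
    return {
        "session_id": match.group("session_id"),
        "view_id": match.group("view_id"),
        "query": parts.query,
    }


if __name__ == "__main__":
    print(split_csv_url(EXAMPLE_URL))
```

If the assumption holds, only `session_id` should differ between two browser sessions, which is easy to check by running this against URLs copied from two separate visits.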

We need to figure out how to make HTTP requests to either the dashboard homepage or the embedded Tableau workbook to get the URLs for the CSV files.

The session ID and other relevant info might be stored in cookies.
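One way to test the cookie idea is to load a page with a cookie-aware client and look at what comes back. The sketch below uses only the standard library; `make_cookie_opener`, `cookies_as_dict` and `fetch_cookies` are hypothetical helper names, and the page URL is left as a parameter because it is not yet clear which page (homepage or embedded workbook) sets anything useful. Whether the session ID shows up in a cookie at all is still an open question.

```python
import urllib.request
from http.cookiejar import CookieJar


def make_cookie_opener():
    """Build a URL opener that records any cookies the server sets."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def cookies_as_dict(jar):
    """Flatten a cookie jar into a {name: value} mapping for inspection."""
    return {cookie.name: cookie.value for cookie in jar}


def fetch_cookies(page_url, timeout=30):
    """Request page_url once and return the cookies set along the way."""
    opener, jar = make_cookie_opener()
    with opener.open(page_url, timeout=timeout) as response:
        response.read()  # consume the body so the request completes
    return cookies_as_dict(jar)
```

Calling `fetch_cookies(...)` with the dashboard URL and comparing the values to the session segment of a CSV URL from the same visit would show whether the two are related.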

Let's start with the county-level stats CSV, then make a list of the other CSV files we could archive.

gordonje commented 3 years ago

Could be useful: https://stackoverflow.com/a/62106733

gordonje commented 3 years ago

I wasn't able to implement anything like the Stack Overflow solution. The code from that attempt is saved in this gist.

However, I did find a package perfectly tailored to this problem. It worked out of the box, but I'm not sure it will work when deployed on AWS Lambda, because it requires pandas.