ffinger closed this issue 4 years ago
Data extraction from PDF to CSV with Python: https://github.com/vitorbaptista/google-covid19-mobility-reports
These mobility data are generally far more useful as a time series, especially going backwards, so that modelers can calibrate their models to the change in mobility that occurred over the past month(s). So the task includes either
If this becomes a Google API wrapping project, I might be of use. Unfortunately, it currently looks like a PDF parsing project, at least re: data ingest.
@jennybc yeah, I've heard there is 0% chance of Google opening an API to this, unfortunately. @noamross as far as I can tell, the headline figure accompanying each graph is the latest figure on the time axis, so it would just be a case of extracting this every day to start building a time series.
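A minimal sketch of that daily accumulation, in Python for illustration. It assumes the headline figure has already been extracted from the PDF by some other means; the function name `record_headline` and the values are hypothetical, not taken from any of the linked repos.

```python
from datetime import date


def record_headline(series, report_date, value):
    """Store one day's headline figure, keyed by date so re-runs do not duplicate."""
    series[report_date] = value
    # Return the observations as (date, value) pairs in chronological order.
    return sorted(series.items())


# Hypothetical headline figures (percent change from baseline), made up for illustration.
series = {}
record_headline(series, date(2020, 4, 5), -45.0)
record_headline(series, date(2020, 3, 29), -52.0)
# Re-running the extraction for a day that is already stored overwrites it.
timeline = record_headline(series, date(2020, 4, 5), -45.0)
```

Keying by report date means a scheduled job can run more than once per report without creating duplicate rows.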
I note there's an R workflow for parsing the PDF using pdftools as well, but only for the headline figures: https://github.com/mattkerlogue/google-covid-mobility-scrape
Full extraction, including trend lines, in R! https://github.com/nacnudus/google-location-coronavirus
+1 for this great repo and initiative! While searching for whether Google's API is public, I stumbled upon https://github.com/pastelsky/covid-19-mobility-tracker which might be useful, though I see @PaulC91 has a nice working version of it in R 👏 awesome!
If someone could extend @nacnudus's repo/script/data above to include the county-level U.S. data it would be an enormous help.
See also this one by the UK Office for National Statistics: https://github.com/datasciencecampus/mobility-report-data-extractor
Data problems have been fixed :crossed_fingers: https://github.com/nacnudus/google-location-coronavirus
The solution proposed by @nacnudus seems to solve the issue. Thanks a lot!
Remaining:
Anything else?
US counties are now in the "region" file, though it hasn't been checked extensively, so take care: https://github.com/nacnudus/google-location-coronavirus/blob/master/2020-03-29-region.tsv
Great, thanks a lot @nacnudus.
@nacnudus, is your solution still working with the new mobility reports published yesterday? I haven't seen any movement on your repo. If so I will assume this issue to be solved.
@ffinger, it seems to have run fine on the new reports.
Brilliant, will close the issue then.
Google has started providing mobility statistics derived from smartphone users' mobility data: https://www.google.com/covid19/mobility/
Description
For public health officials and researchers it will be crucial to be able to follow the evolution of those mobility indicators through time and to integrate them in automated analyses. A means to automatically access those indicators (and their evolution) from within R code is thus needed.
Output
The proposed output is a data frame in long format with columns for country, date, indicator and values.
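To make the long format concrete, here is a minimal sketch in Python (standard library only) that reshapes one-column-per-indicator rows into one row per country, date and indicator. The indicator names, country codes and values are hypothetical and serve only to show the shape; an R implementation would do the same reshaping on a data frame.

```python
from datetime import date

# Hypothetical wide-format rows: one row per country and date, one column per
# indicator. All values are made up for illustration.
wide_rows = [
    {"country": "AA", "date": date(2020, 3, 29), "retail_recreation": -50.0, "workplaces": -30.0},
    {"country": "BB", "date": date(2020, 3, 29), "retail_recreation": -20.0, "workplaces": -10.0},
]

ID_COLUMNS = ("country", "date")


def to_long(rows):
    """Reshape wide rows into long format: country, date, indicator, value."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue  # identifier columns are repeated on every output row
            long_rows.append({
                "country": row["country"],
                "date": row["date"],
                "indicator": column,
                "value": value,
            })
    return long_rows


long_rows = to_long(wide_rows)
```

Each input row yields one output row per indicator, so new indicators or new report dates add rows rather than columns.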
Impact
This will make it possible to follow mobility indicators automatically from within R code, enabling analysis of (for example) the impact of mobility reduction on transmissibility.
Proposed Timeline
First version of package available on Apr 10.
Focal Point
@ffinger
Links
Could be integrated into https://github.com/epiforecasts/NCoVUtils or live as a separate package.