reconhub / covid19hub

Community-driven COVID-19 analytics in R

Processing and accessing mobility data #2

Closed ffinger closed 4 years ago

ffinger commented 4 years ago

Google has started publishing mobility statistics derived from data on smartphone users: https://www.google.com/covid19/mobility/.

Description

For public health officials and researchers it will be crucial to be able to follow the evolution of those mobility indicators through time and to integrate them in automated analyses. A means to automatically access those indicators (and their evolution) from within R code is thus needed.

  1. See if Google provides an API to access the indicators automatically
    • If yes: go to 2.
    • If no: examine the PDF reports to see if the indicators can be extracted automatically
  2. Provide an R package that makes the indicators available from within R in tabular form, either through the API or by extracting values from the PDF reports

Output

The proposed output is a data frame in long format with columns for country, date, indicator and values.
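A minimal sketch of what that long format could look like, using base R only. The country codes, indicator names and numbers below are made-up placeholders (not real Google figures), and the column name `value` is an assumption:

```r
# Hypothetical wide table: one row per country and date, one column per indicator.
# All values are placeholders for illustration only.
wide <- data.frame(
  country = c("AA", "AA", "BB", "BB"),
  date = as.Date(c("2020-03-28", "2020-03-29", "2020-03-28", "2020-03-29")),
  retail_recreation = c(-10, -12, -30, -35),
  workplaces = c(-5, -8, -20, -22),
  stringsAsFactors = FALSE
)

# Every column that is not an identifier is treated as an indicator
indicator_cols <- setdiff(names(wide), c("country", "date"))

# Stack one block of rows per indicator to obtain the long format
long <- do.call(rbind, lapply(indicator_cols, function(ind) {
  data.frame(
    country = wide$country,
    date = wide$date,
    indicator = ind,
    value = wide[[ind]],
    stringsAsFactors = FALSE
  )
}))

print(long)
```

With one row per country, date and indicator, new indicators can be added without changing the table's columns.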

Impact

This will make it possible to follow mobility indicators automatically from within R code, enabling analysis of (for example) the impact of mobility reduction on transmissibility.

Proposed Timeline

First version of package available on Apr 10.

Focal Point

@ffinger

Links

Could be integrated into https://github.com/epiforecasts/NCoVUtils or live as a separate package.

PaulC91 commented 4 years ago

Data extraction from PDF to CSV with Python: https://github.com/vitorbaptista/google-covid19-mobility-reports

noamross commented 4 years ago

These mobility data are generally far more useful as a time series, especially going backwards, so that modelers can calibrate their models to the change in mobility that occurred over the past month(s). So the task includes either

jennybc commented 4 years ago

If this becomes a Google API wrapping project, I might be of use. Unfortunately, it currently looks like a PDF parsing project, at least re: data ingest.

PaulC91 commented 4 years ago

@jennybc yeah, I've heard there is 0% chance of Google opening an API to this, unfortunately. @noamross as far as I can tell, the headline figure accompanying each graph is the latest figure on the time axis, so it would just be a case of extracting this every day to start building a time series.
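A rough sketch of how that accumulation could work in base R, assuming each extraction yields a long-format data frame with the columns `country`, `date`, `indicator` and `value`. The function name `append_snapshot` and all values are hypothetical:

```r
# Append a newly extracted snapshot to the stored history.
# If a (country, date, indicator) combination already exists,
# the most recently added row wins, so revised figures overwrite old ones.
append_snapshot <- function(history, snapshot) {
  combined <- rbind(history, snapshot)
  key <- paste(combined$country, combined$date, combined$indicator, sep = "|")
  combined <- combined[!duplicated(key, fromLast = TRUE), , drop = FALSE]
  combined <- combined[order(combined$country, combined$indicator, combined$date), ]
  rownames(combined) <- NULL
  combined
}

# Placeholder snapshots (not real figures)
first_extract <- data.frame(
  country = "AA", date = as.Date("2020-03-29"),
  indicator = "workplaces", value = -8,
  stringsAsFactors = FALSE
)
second_extract <- data.frame(
  country = c("AA", "AA"),
  date = as.Date(c("2020-03-29", "2020-04-05")),
  indicator = c("workplaces", "workplaces"),
  value = c(-9, -15),
  stringsAsFactors = FALSE
)

history <- append_snapshot(first_extract, second_extract)
print(history)
```

Keying on country, date and indicator keeps the history free of duplicates when the same report is processed twice.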

noamross commented 4 years ago

I note there's an R workflow for parsing the PDF using pdftools, as well, but only for the headline figures: https://github.com/mattkerlogue/google-covid-mobility-scrape
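For illustration only, a sketch of the text-parsing half of such a workflow (not the code from the linked repo). It assumes the page text has already been read, for example with `pdftools::pdf_text()`, and that headline figures appear as signed percentages such as `-52%`. The sample text layout and the helper name `extract_percentages` are assumptions:

```r
# Pull every signed percentage (e.g. "-52%" or "+3%") out of a page of text
# and return the figures as numbers.
extract_percentages <- function(page_text) {
  matches <- regmatches(page_text, gregexpr("[+-][0-9]+%", page_text))[[1]]
  as.numeric(sub("%", "", matches, fixed = TRUE))
}

# In a real run the text would come from the PDF, e.g.:
#   pages <- pdftools::pdf_text("report.pdf")  # hypothetical file name
#   figures <- extract_percentages(pages[1])

# Placeholder text standing in for one page of a report (assumed layout)
sample_page <- paste(
  "Retail & recreation", "-52%", "compared to baseline",
  "Workplaces", "+3%", "compared to baseline",
  sep = "\n"
)

figures <- extract_percentages(sample_page)
print(figures)
```

Matching each figure to its category would need extra logic that depends on the actual layout of the reports.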

PaulC91 commented 4 years ago

full extraction including trend lines in R! https://github.com/nacnudus/google-location-coronavirus

david-jankoski commented 4 years ago

+1 for this great repo and initiative! While checking whether Google's API is public, I stumbled upon https://github.com/pastelsky/covid-19-mobility-tracker, which might be useful, though I see @PaulC91 has a nice working version of it in R 👏 awesome!

noamross commented 4 years ago

If someone could extend @nacnudus's repo/script/data above to include the county-level U.S. data it would be an enormous help.

nacnudus commented 4 years ago

See also this one by the UK Office for National Statistics. https://github.com/datasciencecampus/mobility-report-data-extractor

nacnudus commented 4 years ago

Data problems have been fixed 🤞 https://github.com/nacnudus/google-location-coronavirus

ffinger commented 4 years ago

The solution proposed by @nacnudus seems to solve the issue. Thanks a lot!

Remaining:

Anything else?

nacnudus commented 4 years ago

US counties now in the "region" file https://github.com/nacnudus/google-location-coronavirus/blob/master/2020-03-29-region.tsv. It hasn't been checked extensively though, so take care.

ffinger commented 4 years ago

Great, thanks a lot @nacnudus.

ffinger commented 4 years ago

@nacnudus, is your solution still working with the new mobility reports published yesterday? I haven't seen any movement on your repo. If so I will assume this issue to be solved.

nacnudus commented 4 years ago

@ffinger, it seems to have run fine on the new reports.

ffinger commented 4 years ago

Brilliant, will close the issue then.