Issue extracting data from Google's PDF

alexandrsanches commented 4 years ago

I'm trying to use the package to extract data from Brazil's report with the code below. I'm using Windows 10 and R 3.6.3.

#remotes::install_github("joachim-gassen/tidycovid19")

library(tidycovid19)
library(tidyverse)
library(pdftools)
library(png)

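# Google Community Mobility Report for Brazil (2020-04-05)
pdf_url <- "https://www.gstatic.com/covid19/mobility/2020-04-05_BR_Mobility_Report_en.pdf"
# Render page 1 as a PNG for visual inspection
pdf_convert(pdf_url, pages = 1, filenames = "google_cmr_br_p1.png", verbose = FALSE)

# Internal (unexported) helper, accessed with :::, extracts the line-graph bitmaps from page 1
bitmaps <- tidycovid19:::extract_line_graph_bitmaps(pdf_url, 1)
# Save the first bitmap to a temporary PNG file
png_file <- tempfile("bitmap_", fileext = ".png")
writePNG(bitmaps[[1]][[1]], png_file)

# Parse the first bitmap into a data frame
df <- tidycovid19:::parse_line_graph_bitmap(bitmaps[[1]][[1]])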

On the line "bitmaps <- tidycovid19:::extract_line_graph_bitmaps(pdf_url, 1)" it returns the following error:

Error

joachim-gassen commented 4 years ago

Hi there: I apologize for my lack of Portuguese (?), but most likely this is because you have installed a newer version of the package that no longer contains the PDF scraping code (Google now makes its data available in CSV format).
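If you are not sure which version you have, you can check whether the installed package still contains the internal PDF helper before reinstalling. The sketch below is a hypothetical helper (`has_pdf_scraper()` is not part of tidycovid19). It only assumes that the helper is named `extract_line_graph_bitmaps`, as in the code above, and uses base R's `requireNamespace()`, `asNamespace()` and `exists()`:

```r
# Hypothetical helper, NOT part of tidycovid19: reports whether the installed
# package still contains the internal PDF-scraping function.
has_pdf_scraper <- function(pkg = "tidycovid19",
                            fun = "extract_line_graph_bitmaps") {
  # Package not installed at all: the function cannot be there
  if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)
  # Look inside the package namespace (unexported objects included),
  # without searching parent environments
  exists(fun, envir = asNamespace(pkg), inherits = FALSE)
}

if (!has_pdf_scraper()) {
  message("This tidycovid19 version has no PDF scraping code. Reinstall with:\n",
          'remotes::install_github("joachim-gassen/tidycovid19", ref = "0990bc6")')
}
```

Because `:::` reaches into the namespace, checking the namespace directly mirrors what the failing call does.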

If you are interested in the PDF scraping code per se, you have to install an old version of the package that still contains it. In a fresh R session (with the package not attached), run

remotes::install_github("joachim-gassen/tidycovid19", ref = "0990bc6")

and the code should run. Does it?

alexandrsanches commented 4 years ago

Thank you for answering so fast. It worked perfectly.

joachim-gassen / tidycovid19

Issue extracting data from Google's PDF #12