ropensci / rb3

A bunch of downloaders and parsers for data delivered from B3
https://docs.ropensci.org/rb3/

Scrape PDF for IBOVESPA historical composition #40

Closed: wilsonfreitas closed this issue 1 year ago

wilsonfreitas commented 2 years ago

This PDF

https://www.b3.com.br/data/files/48/56/93/D5/96E615107623A41592D828A8/SERIE-RETROATIVA-DO-IBOV-METODOLOGIA-VALIDA-A-PARTIR-09-2013.pdf

has the historical IBOVESPA composition from Jan/2003 to Jan/2014, but the data has to be extracted from the PDF.

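library(tidyverse)
library(janitor)

url_pdf <- "https://www.b3.com.br/data/files/48/56/93/D5/96E615107623A41592D828A8/SERIE-RETROATIVA-DO-IBOV-METODOLOGIA-VALIDA-A-PARTIR-09-2013.pdf"

# Read the tables on pages 62 to 77; extract_tables() returns a list
# with one character matrix per table it detects
pdf_tables <- tabulizer::extract_tables(url_pdf, pages = seq(62, 77))

# Tidy the first table: row 1 holds the headers, and every column whose
# cleaned name contains "_20" (presumably a month/year) becomes a row
pdf_tables[[1]] %>%
  as_tibble(.name_repair = "unique") %>%
  row_to_names(row_number = 1) %>%
  clean_names() %>%
  pivot_longer(tidyselect::contains("_20"),
    names_to = "mes"
  )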

This code seems to handle it, at least for the first extracted table.
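Since pages 62 to 77 yield many tables, the same steps can be wrapped in a function and applied to every element of the list, stacking the results. A minimal sketch follows; the two small matrices are hypothetical stand-ins for the output of `tabulizer::extract_tables()`, and the helper name `tidy_table()` and the column headers (`codigo`, `jan_2003`, ...) are illustrative assumptions, not the real headers of the B3 PDF.

```r
library(tidyverse)
library(janitor)

# Hypothetical stand-ins for tabulizer::extract_tables() output:
# character matrices whose first row holds the column headers.
# The real headers and values in the B3 PDF are NOT reproduced here.
pdf_tables <- list(
  matrix(c("codigo", "jan_2003", "fev_2003",
           "PETR4",  "10.5",     "10.7",
           "VALE5",  "8.1",      "8.0"),
         ncol = 3, byrow = TRUE),
  matrix(c("codigo", "mar_2003",
           "PETR4",  "10.9"),
         ncol = 2, byrow = TRUE)
)

# Hypothetical helper: the same steps as the snippet in the issue,
# wrapped so they can be applied to any single extracted table
tidy_table <- function(tbl) {
  tbl %>%
    as_tibble(.name_repair = "unique") %>%
    row_to_names(row_number = 1) %>%
    clean_names() %>%
    pivot_longer(tidyselect::contains("_20"), names_to = "mes")
}

# Tidy every extracted table (not only the first) and stack them
composition <- pdf_tables %>%
  map(tidy_table) %>%
  bind_rows()

print(composition)
```

Because each table is reshaped to the long format before stacking, tables covering different months can be combined with `bind_rows()` even though their month columns differ.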

wilsonfreitas commented 2 years ago

Another alternative is to buy the data from UP2Data.