ropensci / openalexR

Getting bibliographic records from OpenAlex
https://docs.ropensci.org/openalexR/

Issue with oa_fetch function when converting to dataframe/tibble #154

Closed: qplantec closed this issue 1 year ago

qplantec commented 1 year ago

Dear OpenAlexR team,

I am facing an issue when extracting OpenAlex data with the oa_fetch function. When querying the database, the "download" step seems to work well. However, the "converting" step fails, and I receive an error message.

It works correctly when using output = "list". However, it fails when using "tibble" or "dataframe" as output. I guess it might be due to an update of the data format (or an issue with the dependencies).
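For now I am falling back on the list output, which skips the failing conversion step entirely; a minimal sketch (using the same source ID as in the query below):

```r
library(openalexR)

# Workaround: request the raw list output so the tibble/dataframe
# conversion (where the error occurs) is never run
works_list <- oa_fetch(
  entity = "works",
  best_oa_location.source.id = "S111155417",
  output = "list",
  verbose = TRUE
)

# Individual records can then be inspected as plain R lists
works_list[[1]]$id
```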

For example, if I am doing a query like this one:

works_from_dois <- oa_fetch(
  entity = "works",
  best_oa_location.source.id = source,
  verbose = TRUE
)

I get the following response:

Requesting url: https://api.openalex.org/works?filter=best_oa_location.source.id%3AS111155417
Getting 27 pages of results with a total of 5369 records...
  OpenAlex downloading [=====================] 100% eta:  0s
  converting [-------------------------------]   1% eta: 48s
Error in (function (..., deparse.level = 1, make.row.names = TRUE, stringsAsFactors = FALSE,  :
  Incorrect argument list: all variables must have the same length.
trangdata commented 1 year ago

Thank you so much @qplantec for using the package and for reporting this. 🌈 You're right: the new field lineage under institutions caused issues when we try to rbind them. I submitted a fix. Hope to merge soon! 🤞🏽
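For anyone curious about the failure mode: base R's rbind.data.frame() stops when a list argument has variables of differing lengths, which is what a nested field like lineage produces. A hypothetical illustration (not openalexR code, field names invented):

```r
# An existing row with flat, length-1 fields
df <- data.frame(id = "W1", title = "paper one")

# A record whose "title" field has length 2, mimicking how a nested
# field of varying length breaks row-binding
rec <- list(id = "W2", title = c("a", "b"))

# rbind() stops with an error along the lines of
# "all variables should have the same length"
try(rbind(df, rec))
```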

With this fix, the following call works:

library(openalexR)
works_from_dois <- oa_fetch(
  entity = "works",
  best_oa_location.source.id = "S111155417",
  cited_by_count = ">200",
  verbose = TRUE
)
#> Requesting url: https://api.openalex.org/works?filter=best_oa_location.source.id%3AS111155417%2Ccited_by_count%3A%3E200
#> Getting 2 pages of results with a total of 244 records...
works_from_dois
#> # A tibble: 244 × 36
#>    id                     display_name author ab    publication_date so    so_id
#>    <chr>                  <chr>        <list> <chr> <chr>            <chr> <chr>
#>  1 https://openalex.org/… Nanostructu… <df>   "Nan… 2013-06-13       Jour… http…
#>  2 https://openalex.org/… Tuning the … <df>   "We … 2015-08-06       Jour… http…
#>  3 https://openalex.org/… Cobalt–Iron… <df>   "Cob… 2015-03-04       Jour… http…
#>  4 https://openalex.org/… Simple Chem… <df>   "Lig… 2009-01-21       Jour… http…
#>  5 https://openalex.org/… Efficient V… <df>   "Eve… 2015-05-03       Jour… http…
#>  6 https://openalex.org/… Lanthanide … <df>   "ADV… 1979-01-01       Jour… http…
#>  7 https://openalex.org/… Transformat… <df>   "Hum… 2015-01-27       Jour… http…
#>  8 https://openalex.org/… Contributio… <df>   "Mol… 2016-06-21       Jour… http…
#>  9 https://openalex.org/… Origin of t… <df>   "The… 2014-03-11       Jour… http…
#> 10 https://openalex.org/… High Electr… <df>   "Hyd… 2016-12-06       Jour… http…
#> # ℹ 234 more rows
#> # ℹ 29 more variables: host_organization <chr>, issn_l <chr>, url <chr>,
#> #   pdf_url <chr>, license <chr>, version <chr>, first_page <chr>,
#> #   last_page <chr>, volume <chr>, issue <chr>, is_oa <lgl>,
#> #   is_oa_anywhere <lgl>, oa_status <chr>, oa_url <chr>,
#> #   any_repository_has_fulltext <lgl>, language <chr>, grants <list>,
#> #   cited_by_count <int>, counts_by_year <list>, publication_year <int>, …

Created on 2023-08-28 with reprex v2.0.2

qplantec commented 1 year ago

Thank you so much for your quick answer and for fixing the issue so quickly :)! Do you know when I will be able to reinstall the package in R to get the version with the fix? Many thanks again for your work on this! Best

trangdata commented 1 year ago

@qplantec Pending my collaborator's approval, but for now could you try

remotes::install_github("ropensci/openalexR@institution-lineage")
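After installing from the branch, restarting R before reloading the package is usually needed; packageVersion() can then confirm that the development install took effect:

```r
# Install the branch containing the fix (requires the remotes package)
# install.packages("remotes")  # if not already installed
remotes::install_github("ropensci/openalexR@institution-lineage")

# Restart R, then check which version is now installed
packageVersion("openalexR")
```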
qplantec commented 1 year ago

It works perfectly, thank you so much :)

andtheWings commented 1 year ago

Thank you, I was also having this issue and installing your bug fix from Github fixed it!