ropensci / openalexR

Getting bibliographic records from OpenAlex
https://docs.ropensci.org/openalexR/

oa_fetch for works by author.id returns "lexical error: invalid char in json text." #204

Closed amacanovic closed 5 months ago

amacanovic commented 5 months ago

Hi all,

I've run into what seems to be a parsing problem when fetching works for some author IDs using "oa_fetch".

Here is an example that returns an error:

prof_all_works_oa <- oa_fetch(
     entity = "works", 
     author.id = "A5022158724")

I get the following error:

Error: lexical error: invalid char in json text.
                                       <?xml version="1.0" encoding="u
                     (right here) ------^

I've tried setting output = "list", as recommended in other tickets, but this did not solve the issue. Does anyone have suggestions on how to tackle this?

Thank you!

trangdata commented 5 months ago

What an intriguing error. Thank you for reporting, @amacanovic. Somehow, with the default per_page = 200, the API returns a 503 error (Response object too large):

https://api.openalex.org/works?filter=author.id%3AA5022158724&per-page=200
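The "lexical error" in the original report is consistent with this: the text shown in the error starts with `<?xml`, so the JSON parser appears to have been handed an XML error page rather than JSON. As a rough illustration, here is a minimal base R sketch of a check on the first character of a response body. `looks_like_json` is a hypothetical helper written for this thread, not part of openalexR, and the example strings are made up.

```r
# Hypothetical helper (not part of openalexR): guess whether a response
# body is JSON by looking at its first non-whitespace character.
looks_like_json <- function(body) {
  first_char <- substr(trimws(body, which = "left"), 1, 1)
  first_char %in% c("{", "[")
}

# Made-up example bodies: a JSON object and the start of an XML error page.
looks_like_json('{"meta": {"count": 201}}')
looks_like_json('<?xml version="1.0" encoding="utf-8"?>')
```

The first call returns TRUE and the second FALSE. A check like this only flags obviously non-JSON bodies; it does not validate the JSON itself.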

I'll investigate, but for now, you can set a smaller value for per_page:

library(openalexR)
prof_all_works_oa <- oa_fetch(
  entity = "works", 
  author.id = "A5022158724",
  per_page = 100,
  verbose = TRUE
)
#> Requesting url: https://api.openalex.org/works?filter=author.id%3AA5022158724
#> Getting 3 pages of results with a total of 201 records...
prof_all_works_oa
#> # A tibble: 201 × 36
#>    id                     display_name author ab    publication_date so    so_id
#>    <chr>                  <chr>        <list> <chr> <chr>            <chr> <chr>
#>  1 https://openalex.org/… Subtyping o… <df>   Immu… 2010-05-25       PLOS… http…
#>  2 https://openalex.org/… Association… <df>   Prev… 2010-12-29       JNCI… http…
#>  3 https://openalex.org/… A common co… <df>   <NA>  2007-02-11       Natu… http…
#>  4 https://openalex.org/… Newly disco… <df>   Doug… 2009-03-29       Natu… http…
#>  5 https://openalex.org/… Prediction … <df>   Data… 2015-04-02       JNCI… http…
#>  6 https://openalex.org/… The 70-gene… <df>   The … 2008-07-27       Brea… http…
#>  7 https://openalex.org/… Somatic mut… <df>   Muta… 2011-11-23       Huma… http…
#>  8 https://openalex.org/… Functional … <df>   Anal… 2013-04-01       The … http…
#>  9 https://openalex.org/… Age- and Tu… <df>   CHEK… 2016-08-10       Jour… http…
#> 10 https://openalex.org/… Gene–enviro… <df>   Here… 2016-01-01       Brit… http…
#> # ℹ 191 more rows
#> # ℹ 29 more variables: host_organization <chr>, issn_l <chr>, url <chr>,
#> #   pdf_url <chr>, license <chr>, version <chr>, first_page <chr>,
#> #   last_page <chr>, volume <chr>, issue <chr>, is_oa <lgl>,
#> #   is_oa_anywhere <lgl>, oa_status <chr>, oa_url <chr>,
#> #   any_repository_has_fulltext <lgl>, language <chr>, grants <list>,
#> #   cited_by_count <int>, counts_by_year <list>, publication_year <int>, …

Created on 2024-02-11 with reprex v2.0.2
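If it is not known in advance which page size will work, the same workaround could be automated by retrying with progressively smaller values. The following is only a sketch under stated assumptions: `fetch_with_fallback` and its `page_sizes` default are hypothetical names chosen for illustration (they are not part of openalexR), and the helper simply calls whatever function it is given with a `per_page` argument.

```r
# Hypothetical helper (not part of openalexR): call `fetch_fun` with
# progressively smaller page sizes until one call succeeds.
fetch_with_fallback <- function(fetch_fun, page_sizes = c(200, 100, 50, 25)) {
  stopifnot(length(page_sizes) > 0)
  for (size in page_sizes) {
    # Wrap the result so that a legitimate NULL return is not mistaken
    # for a failure.
    attempt <- tryCatch(
      list(ok = TRUE, value = fetch_fun(per_page = size)),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (attempt$ok) {
      return(attempt$value)
    }
    message("per_page = ", size, " failed: ", conditionMessage(attempt$error))
  }
  stop("All page sizes failed; last error: ", conditionMessage(attempt$error))
}

# Usage sketch with the call from this thread (requires network access):
# works <- fetch_with_fallback(function(per_page) {
#   openalexR::oa_fetch(
#     entity = "works",
#     author.id = "A5022158724",
#     per_page = per_page
#   )
# })
```

Passing the fetch call as a function keeps the retry logic independent of openalexR, so it can be tried out with a stand-in function before touching the API.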

rkrug commented 5 months ago

I would suggest reporting this to OpenAlex, as I think they mention 200 as the upper limit (https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/paging), so I would expect it to work on their side.