ropensci / openalexR

Getting bibliographic records from OpenAlex
https://docs.ropensci.org/openalexR/

works entity - count discrepancy #151

Closed hp0404 closed 1 year ago

hp0404 commented 1 year ago

I'm trying to get publications from this URL: https://api.openalex.org/works?filter=primary_location.source.id:S157188123

When you open this page in the browser, the meta > count field has a value of 1379, but when I run this query using openalexR (from Google Colab), it says: Getting 1215574 pages of results with a total of 243114633 records...

What am I doing wrong?

Code:

install.packages("openalexR")
library(openalexR)

query <- oa_query(
  entity = "works",
  filter = "primary_location.source.id:S157188123",
  endpoint = "https://api.openalex.org",
)

res <- oa_request(
  query_url = query,
  count_only = FALSE,
  verbose = TRUE,
  mailto = "myemail@gmail.com"
)

df <- oa2df(res, entity = "works")

M <- oa2bibliometrix(df)

trangdata commented 1 year ago

Hi @hp0404, thank you for using the package!

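A slight tweak to make in your oa_query call: pass the filter as a named list rather than a single string, i.e. filter = list(primary_location.source.id = "S157188123"). The total of 243114633 records suggests the string version was not applied as a filter at all.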
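Applied to the original oa_query call, the tweak would look roughly like this. This is a minimal sketch that reuses only the arguments already shown in this thread; it builds the query without requesting any records, so the URL can be checked first.

```r
library(openalexR)

# Pass the filter as a named list instead of a single "key:value" string.
query <- oa_query(
  entity = "works",
  filter = list(primary_location.source.id = "S157188123"),
  endpoint = "https://api.openalex.org"
)

# Inspect the URL before requesting anything; it should contain the filter.
print(query)
```

The resulting query can then be passed to oa_request() as in the original code.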

Alternatively, you can do this all in one step:

library(openalexR)

w <- oa_fetch(
  entity = "works",
  primary_location.source.id = "S157188123",
  verbose = TRUE,
  mailto = "myemail@gmail.com",
)
#> Requesting url: https://api.openalex.org/works?filter=primary_location.source.id%3AS157188123
#> Getting 7 pages of results with a total of 1379 records...
w
#> # A tibble: 1,379 × 36
#>    id                     display_name author ab    publication_date so    so_id
#>    <chr>                  <chr>        <list> <chr> <chr>            <chr> <chr>
#>  1 https://openalex.org/… Russian Ful… <df>   "Thi… 2015-01-02       Jour… http…
#>  2 https://openalex.org/… Russia's Re… <df>   "Ref… 2004-06-01       Jour… http…
#>  3 https://openalex.org/… NATO's Vise… <df>   "At … 2000-06-01       Jour… http…
#>  4 https://openalex.org/… The ‘KGB St… <df>   "Thi… 2017-04-03       Jour… http…
#>  5 https://openalex.org/… The campaig… <df>   "The… 2001-09-01       Jour… http…
#>  6 https://openalex.org/… The militar… <df>   ""    1994-09-01       Jour… http…
#>  7 https://openalex.org/… Soviet poli… <df>   "The… 1999-12-01       Jour… http…
#>  8 https://openalex.org/… The KGB and… <df>   "The… 2016-01-02       Jour… http…
#>  9 https://openalex.org/… The Gray Wa… <df>   "The… 2018-07-03       Jour… http…
#> 10 https://openalex.org/… The Politic… <df>   "The… 2014-07-03       Jour… http…
#> # ℹ 1,369 more rows
#> # ℹ 29 more variables: host_organization <chr>, issn_l <chr>, url <chr>,
#> #   pdf_url <chr>, license <chr>, version <chr>, first_page <chr>,
#> #   last_page <chr>, volume <chr>, issue <chr>, is_oa <lgl>,
#> #   is_oa_anywhere <lgl>, oa_status <chr>, oa_url <chr>,
#> #   any_repository_has_fulltext <lgl>, language <chr>, grants <list>,
#> #   cited_by_count <int>, counts_by_year <list>, publication_year <int>, …

Created on 2023-08-22 with reprex v2.0.2
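As a quick sanity check, the page counts in both messages can be reproduced from the record counts. The sketch below assumes 200 records per page; that figure is not stated in this thread and is inferred from the numbers reported here. pages_needed is a hypothetical helper, not part of openalexR.

```r
# Assumption: 200 records per page, inferred from the messages in this thread.
per_page <- 200

# Hypothetical helper: number of pages needed to hold n_records.
pages_needed <- function(n_records, per_page) {
  ceiling(n_records / per_page)
}

filtered_pages <- pages_needed(1379, per_page)         # filtered query
unfiltered_pages <- pages_needed(243114633, per_page)  # query from the original post

filtered_pages
#> [1] 7
unfiltered_pages
#> [1] 1215574
```

A page count in the millions for a single journal is a sign that the filter did not reach the API.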

hp0404 commented 1 year ago

Thanks!