ropensci / openalexR

Getting bibliographic records from OpenAlex
https://docs.ropensci.org/openalexR/

oa_fetch() works: where is "is_corresponding" field? #205

Closed: yhan818 closed this 5 months ago

yhan818 commented 5 months ago

I am using oa_fetch for entity "works" (see code below).

# This is for author Bekir Tanriover: https://openalex.org/authors/a5016874418
author_id <- "a5016874418"

author_works <- oa_fetch(
  entity = "works",
  author.id = "a5016874418",
  publication_year = 2023,
  verbose = TRUE
)

The author_works object has 36 columns. One is "author", which contains "author_position", but I cannot find an "is_corresponding" field (I have tested a few authors).

OpenAlex's documentation for the authorship object does list "is_corresponding": https://docs.openalex.org/api-entities/works/work-object/authorship-object

Did I miss anything?

yjunechoe commented 5 months ago

As the documentation also says, this looks like a field that they added recently. As a general rule, we need to manually update {openalexR} to handle each new field as it is introduced, and we simply haven't caught this one. So you didn't miss anything. Thanks for pointing us to it!

It looks like the is_corresponding field is still under development, but if we're confident that it'll stay Boolean type I guess we can go ahead and add support for this.

rkrug commented 5 months ago

Would it be possible to add `output = "raw"`, which would save the JSON as returned, without further processing? This would make it possible to implement these conversions for experimental changes in OpenAlex.

yjunechoe commented 5 months ago

Yes, there are two options.

  1. With oa_fetch(..., output = "list"), you get the JSON object as an R list. I'd probably start here if you want to explore new fields or contribute a PR for it.
  2. You can use oa_query() to get the query URL and GET the JSON string directly with, e.g., {httr}, but this doesn't handle some features we built into oa_fetch(), like paging.

Speaking more directly to output = "raw": option (2) would be closest to that. But if the query is large, the JSON comes in pieces, so even a "raw JSON" option would need post-processing before output = "raw" could return a single JSON.
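To illustrate the post-processing mentioned above, here is a minimal sketch of stitching paged responses into a single JSON string. It assumes each page follows OpenAlex's `meta` / `results` envelope. The two page strings and their work IDs are made up for illustration, and this is not how {openalexR} handles paging internally.

```r
library(jsonlite)

# Two hypothetical page responses (placeholder IDs, not real works).
page1 <- '{"meta": {"page": 1}, "results": [{"id": "W1"}, {"id": "W2"}]}'
page2 <- '{"meta": {"page": 2}, "results": [{"id": "W3"}]}'

# Parse each page without simplification so the nesting is preserved.
pages <- lapply(list(page1, page2), fromJSON, simplifyVector = FALSE)

# Concatenate the per-page `results` arrays into one list of records.
all_results <- unlist(lapply(pages, function(p) p$results), recursive = FALSE)

# Serialise the combined records back into a single JSON string.
combined_json <- toJSON(all_results, auto_unbox = TRUE)
```

Note that the result is re-serialised JSON rather than the byte-for-byte server response, which is part of why a true "raw" output is not straightforward.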

rkrug commented 5 months ago

True - I didn't check, but if it simply converts the JSON to a list, without any assumptions, then output = "list" should be fine. Thanks.

yjunechoe commented 5 months ago

Yep - it's just the output of jsonlite::fromJSON(), as is!
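As a rough sketch of exploring a new field from the list output: the `work` object below is a hand-built stand-in for one element of `oa_fetch(..., output = "list")`, assuming it mirrors the authorship object in the OpenAlex documentation. The ID and author names are placeholders, not real records.

```r
# Hypothetical stand-in for one work from oa_fetch(..., output = "list").
work <- list(
  id = "https://openalex.org/W0000000000",  # placeholder ID
  authorships = list(
    list(author_position = "first",
         author = list(display_name = "Author A"),
         is_corresponding = FALSE),
    list(author_position = "last",
         author = list(display_name = "Author B"),
         is_corresponding = TRUE)
  )
)

# Flatten each authorship into one row of a data frame.
authors <- do.call(rbind, lapply(work$authorships, function(a) {
  data.frame(
    au_display_name = a$author$display_name,
    author_position = a$author_position,
    # Treat a missing field as NA rather than assuming FALSE.
    is_corresponding = if (is.null(a$is_corresponding)) NA else a$is_corresponding,
    stringsAsFactors = FALSE
  )
}))
```

Mapping an absent field to `NA` keeps older records, which may lack `is_corresponding`, distinguishable from authors explicitly marked as non-corresponding.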

trangdata commented 5 months ago

Thanks @yhan818 — this is indeed a useful field. I have added an is_corresponding column in #208.

If you install that branch, you can do:

library(openalexR)
author_works <- oa_fetch(
  entity = "works",
  author.id = "a5016874418",
  publication_year = 2023,
  verbose = TRUE
)
#> Requesting url: https://api.openalex.org/works?filter=author.id%3Aa5016874418%2Cpublication_year%3A2023
#> Getting 1 page of results with a total of 9 records...

tail(author_works[[3]][[1]][,1:5])
#>                              au_id    au_display_name
#> 1 https://openalex.org/A5045990380  Mohanad Al-Obaidi
#> 2 https://openalex.org/A5011527972    Ahmet B. Gungor
#> 3 https://openalex.org/A5070604106      Sandra Kurtin
#> 4 https://openalex.org/A5027594045     Ann E. Mathias
#> 5 https://openalex.org/A5016874418    Bekir Tanrıöver
#> 6 https://openalex.org/A5076993071 Tirdad T. Zangeneh
#>                                au_orcid author_position is_corresponding
#> 1 https://orcid.org/0000-0002-5189-6594           first            FALSE
#> 2                                  <NA>          middle            FALSE
#> 3                                  <NA>          middle            FALSE
#> 4                                  <NA>          middle            FALSE
#> 5 https://orcid.org/0000-0002-2378-9302          middle            FALSE
#> 6 https://orcid.org/0000-0002-9035-2952            last             TRUE

Created on 2024-02-14 with reprex v2.0.2
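With the new column in place, picking out the corresponding author is a plain subset. The sketch below rebuilds a small data frame from the names and flags printed above, so it runs without calling the API. The real `author` data frame has more columns.

```r
# Values copied from the printed output above; other columns omitted.
authors <- data.frame(
  au_display_name = c(
    "Mohanad Al-Obaidi", "Ahmet B. Gungor", "Sandra Kurtin",
    "Ann E. Mathias", "Bekir Tanrıöver", "Tirdad T. Zangeneh"
  ),
  author_position = c("first", rep("middle", 4), "last"),
  is_corresponding = c(rep(FALSE, 5), TRUE),
  stringsAsFactors = FALSE
)

# which() drops NA flags, so rows with unknown status are not selected.
corresponding <- authors[which(authors$is_corresponding), ]
```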