ropensci / openalexR

Getting bibliographic records from OpenAlex
https://docs.ropensci.org/openalexR/

On GitHub version, `works2df()` drops the title column #218

Closed rkrug closed 3 months ago

rkrug commented 3 months ago

I am making the following request with a `select` option:

oa_query(
  title_and_abstract.search = compact(search),
  publication_year = y,
  options = list(
    select = c("id", "doi", "authorships", "publication_year", "title", "abstract_inverted_index", "topics")
  ),
  verbose = FALSE
) |>
  IPBES.R::oa_request_IPBES(
    count_only = FALSE,
    output_path = output_path,
    verbose = TRUE
  )

and I am getting the following result, which you can download here: https://www.dropbox.com/scl/fi/9mr5yw1h8sdrd78ya9llt/page_1.rds?rlkey=2ndmv223avd0gtxslicfmf2t7&dl=0

Now when using

p <- readRDS('page_1.rds')

p[[2]][[1]] |> names()

openalexR::works2df(p$results, verbose = FALSE)

I get the following tibble without title:

# A tibble: 17 × 6
   id                               author ab    publication_year doi   topics  
   <chr>                            <list> <chr>            <int> <chr> <list>  
 1 https://openalex.org/W2605327467 <df>   NA                1998 NA    <tibble>
 2 https://openalex.org/W2003516363 <df>   Abst…             1998 http… <tibble>
 3 https://openalex.org/W2019185270 <df>   Basi…             1998 http… <tibble>
 4 https://openalex.org/W2167846710 <df>   Prev…             1998 http… <tibble>
 5 https://openalex.org/W2045014930 <df>   Diff…             1998 http… <tibble>
 6 https://openalex.org/W1544940298 <df>   Stee…             1998 http… <tibble>
 7 https://openalex.org/W2013710807 <df>   The …             1998 http… <tibble>
 8 https://openalex.org/W2302413636 <df>   The …             1998 NA    <tibble>
 9 https://openalex.org/W4214626304 <lgl>  Cons…             1998 http… <tibble>
10 https://openalex.org/W1515552660 <df>   Abst…             1998 http… <tibble>
11 https://openalex.org/W1532831267 <df>   Abst…             1998 NA    <tibble>
12 https://openalex.org/W2129272042 <df>   Iden…             1998 NA    <tibble>
13 https://openalex.org/W2741805792 <df>   Rese…             1998 NA    <tibble>
14 https://openalex.org/W2054587757 <df>   Urba…             1998 http… <tibble>
15 https://openalex.org/W4240234799 <lgl>  Fish…             1998 http… <tibble>
16 https://openalex.org/W2788998128 <df>   The …             1998 NA    <tibble>
17 https://openalex.org/W4205102822 <lgl>  Cons…             1998 http… <tibble>

yjunechoe commented 3 months ago

In works2df, the title column is actually derived from the display_name field. So I believe if you changed the select option to this it'd work:

select = c("id", "doi", "authorships", "publication_year", "display_name", "abstract_inverted_index", "topics")

Given that they're identical, I suppose it'd be sensible to fall back on the other when one of them is not available.
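
The fallback idea could look roughly like the sketch below. This is only an illustration in base R, not the actual works2df code: `coalesce_title()` is a hypothetical helper, and the toy records merely mimic the shape of API results returned under different `select` values.

# Hypothetical helper (not part of openalexR): return display_name if present,
# otherwise title, otherwise NA. `work` is one record as a named list.
coalesce_title <- function(work) {
  if (!is.null(work[["display_name"]])) {
    return(work[["display_name"]])
  }
  if (!is.null(work[["title"]])) {
    return(work[["title"]])
  }
  NA_character_
}

# Toy records: one with display_name only, one with title only, one with neither
works <- list(
  list(id = "W1", display_name = "A paper"),
  list(id = "W2", title = "Another paper"),
  list(id = "W3")
)

# One title per record, regardless of which field was selected
titles <- vapply(works, coalesce_title, character(1))

With something like this, a request that selects only title and one that selects only display_name would both end up with a populated title column.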

rkrug commented 3 months ago

That is exactly what I did.

I agree with your point about using title if display_name is not there, and vice versa.

yjunechoe commented 3 months ago

Note to self - minimal reprex to debug:

oa_fetch(
  identifier = "W2605327467",
  options = list(select = "title")
)
#> data frame with 0 columns and 0 rows