Closed trangdata closed 10 months ago
Re #127: the user can now set the environment variable openalexR.print
to the maximum number of characters of the query URL to print, which shortens very long URLs in the verbose output:
library(openalexR)
w <- function() {
  oa_fetch(
    entity = "works",
    title.search = c("bibliometric analysis", "science mapping"),
    cited_by_count = ">50",
    options = list(select = "id"),
    from_publication_date = "2021-01-01",
    to_publication_date = "2021-12-31",
    verbose = TRUE
  )
}
w0 <- w()
#> Requesting url: https://api.openalex.org/works?filter=title.search%3Abibliometric%20analysis%7Cscience%20mapping%2Ccited_by_count%3A%3E50%2Cfrom_publication_date%3A2021-01-01%2Cto_publication_date%3A2021-12-31&select=id
#> Getting 1 page of results with a total of 63 records...
Sys.setenv(openalexR.print = 70)
w1 <- w()
#> Requesting url: https://api.openalex.org/works?filter=title.search%3Abibliometric%20an...
#> Getting 1 page of results with a total of 63 records...
Sys.unsetenv("openalexR.print")
w2 <- w()
#> Requesting url: https://api.openalex.org/works?filter=title.search%3Abibliometric%20analysis%7Cscience%20mapping%2Ccited_by_count%3A%3E50%2Cfrom_publication_date%3A2021-01-01%2Cto_publication_date%3A2021-12-31&select=id
#> Getting 1 page of results with a total of 63 records...
Created on 2023-10-24 with reprex v2.0.2
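For readers curious how this kind of truncation can work, here is a minimal sketch. It is not openalexR's actual implementation: `truncate_url` is a hypothetical helper, and the only things taken from the output above are the `openalexR.print` environment variable and the trailing `...` after the first 70 characters.

```r
# Hypothetical helper for illustration only; not part of openalexR.
truncate_url <- function(url, env_var = "openalexR.print") {
  # Sys.getenv() returns a string; with unset = NA it is NA when not set.
  n <- suppressWarnings(as.integer(Sys.getenv(env_var, unset = NA)))
  # Return the URL unchanged if no valid limit is set or it is short enough.
  if (is.na(n) || n < 1 || nchar(url) <= n) {
    return(url)
  }
  # Keep the first n characters and mark the cut with an ellipsis.
  paste0(substr(url, 1, n), "...")
}

url <- "https://api.openalex.org/works?filter=title.search%3Abibliometric%20analysis&select=id"

Sys.setenv(openalexR.print = 70)
message("Requesting url: ", truncate_url(url))

Sys.unsetenv("openalexR.print")
message("Requesting url: ", truncate_url(url))
```

Reading the limit at print time rather than at load time is what lets `Sys.setenv()` and `Sys.unsetenv()` take effect between calls, as in the reprex above.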
Re #129: Previously, oa_snowball could take a long time.
This refactor removes the use of simple_rapply
and improves speed.
Previously:
library(openalexR)
packageVersion("openalexR")
#> [1] '1.2.2'
w <- oa_fetch("works", options = list(sample = 20, seed = 1, select = "id"))
myids <- openalexR:::shorten_oaid(w$id)
system.time({
  ilk_snowball <- oa_snowball(
    identifier = myids,
    verbose = TRUE
  )
})
#> Requesting url: https://api.openalex.org/works?filter=openalex%3AW2752822653%7CW2057540892%7CW2071641039%7CW2528237503%7CW4255644834%7CW2039776320%7CW1998173837%7CW2894916677%7CW4205808956%7CW4292916519%7CW2210922255%7CW2123690481%7CW2074469351%7CW4378553964%7CW2321856033%7CW2439084087%7CW2294799430%7CW2966056779%7CW1424334985%7CW2425037722
#> Getting 1 page of results with a total of 20 records...
#> Collecting all documents citing the target papers...
#> Requesting url: https://api.openalex.org/works?filter=cites%3AW2071641039%7CW1998173837%7CW2057540892%7CW2321856033%7CW4205808956%7CW2210922255%7CW2294799430%7CW2425037722%7CW2123690481%7CW1424334985%7CW2039776320%7CW2074469351%7CW2439084087%7CW2528237503%7CW2752822653%7CW2894916677%7CW2966056779%7CW4255644834%7CW4292916519%7CW4378553964
#> Getting 2 pages of results with a total of 324 records...
#> Collecting all documents cited by the target papers...
#> Requesting url: https://api.openalex.org/works?filter=cited_by%3AW2071641039%7CW1998173837%7CW2057540892%7CW2321856033%7CW4205808956%7CW2210922255%7CW2294799430%7CW2425037722%7CW2123690481%7CW1424334985%7CW2039776320%7CW2074469351%7CW2439084087%7CW2528237503%7CW2752822653%7CW2894916677%7CW2966056779%7CW4255644834%7CW4292916519%7CW4378553964
#> Getting 1 page of results with a total of 135 records...
#> user system elapsed
#> 3.672 0.060 11.042
Now:
library(openalexR)
packageVersion("openalexR")
#> [1] '1.2.2.9999'
Sys.setenv(openalexR.print = 70)
w <- oa_fetch("works", options = list(sample = 20, seed = 1, select = "id"))
myids <- openalexR:::shorten_oaid(w$id)
system.time({
  ilk_snowball <- oa_snowball(
    identifier = myids,
    verbose = TRUE
  )
})
#> Requesting url: https://api.openalex.org/works?filter=openalex%3AW2752822653%7CW205754...
#> Getting 1 page of results with a total of 20 records...
#> Collecting all documents citing the target papers...
#> Requesting url: https://api.openalex.org/works?filter=cites%3AW2071641039%7CW199817383...
#> Getting 2 pages of results with a total of 324 records...
#> Collecting all documents cited by the target papers...
#> Requesting url: https://api.openalex.org/works?filter=cited_by%3AW2071641039%7CW199817...
#> Getting 1 page of results with a total of 135 records...
#> user system elapsed
#> 2.089 0.049 4.103
We can also make it a little faster by specifying the fields we want in oa_snowball
with options = list(select = c("id", "display_name", "authorships", "referenced_works")).
Note that the newest implementation allows different options
for the core papers, the citing papers and the cited_by papers, so these options need to be specified
separately, like so:
library(openalexR)
packageVersion("openalexR")
#> [1] '1.2.2.9999'
Sys.setenv(openalexR.print = 70)
w <- oa_fetch("works", options = list(sample = 20, seed = 1, select = "id"))
myids <- openalexR:::shorten_oaid(w$id)
my_opts <- list(select = c("id", "display_name", "authorships", "referenced_works"))
system.time({
  ilk_snowball <- oa_snowball(
    identifier = myids,
    options = my_opts,
    citing_params = list(options = my_opts),
    cited_by_params = list(options = my_opts),
    verbose = TRUE
  )
})
#> Requesting url: https://api.openalex.org/works?filter=openalex%3AW2752822653%7CW205754...
#> Getting 1 page of results with a total of 20 records...
#> Collecting all documents citing the target papers...
#> Requesting url: https://api.openalex.org/works?filter=cites%3AW2071641039%7CW199817383...
#> Getting 2 pages of results with a total of 324 records...
#> Collecting all documents cited by the target papers...
#> Requesting url: https://api.openalex.org/works?filter=cited_by%3AW2071641039%7CW199817...
#> Getting 1 page of results with a total of 135 records...
#> user system elapsed
#> 0.898 0.016 2.075
Created on 2023-10-24 with reprex v2.0.2
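Since the same `select` list is passed three times above, a small wrapper can cut the repetition. This is only a sketch, assuming the `options`, `citing_params` and `cited_by_params` arguments behave as in the reprex above; `snowball_args` is a hypothetical helper and not part of openalexR.

```r
# Hypothetical helper for illustration only; not part of openalexR.
# Builds the three option sets used in the reprex above from one field vector.
snowball_args <- function(fields) {
  opts <- list(select = fields)
  list(
    options = opts,                         # core papers
    citing_params = list(options = opts),   # citing papers
    cited_by_params = list(options = opts)  # cited_by papers
  )
}

sb_args <- snowball_args(c("id", "display_name", "authorships", "referenced_works"))
str(sb_args)

# Usage (needs network access and the development version shown above):
# ilk_snowball <- do.call(
#   oa_snowball,
#   c(list(identifier = myids, verbose = TRUE), sb_args)
# )
```

Keeping the three entries separate in the returned list means any one of them can still be overridden before the call, for example to request extra fields for the core papers only.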
The specification of the fields seems to make a huge difference. Great.
one test threw a warning
~Oh man did OpenAlex change its author IDs again? I'll check. All tests ran fine two days ago so I'm not sure why A2208157607 and A923435168 are no longer valid author ids.~ Hmm... so I think what happened is that I wasn't thorough enough in my update of author IDs in #167. Will update these IDs now.
This will need more extensive testing...
Some cleanup and optimization so far:
- removed the use of simple_rapply
- options as new argument to oa_snowball
Related: #129