Closed rkrug closed 11 months ago
OK - solved. The ? should be an '&'.
r$> q <- oa_query(search = "transformative change")
r$> oa_request(paste0(q), verbose = TRUE, count_only = TRUE) |> unlist()
count db_response_time_ms page per_page
7619126 1467 1 1
r$> oa_request(paste0(q, "&select=id"), verbose = TRUE, count_only = TRUE) |> unlist()
count db_response_time_ms page per_page
7619118 1531 1 1
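The fix follows from how URL query strings are built: the first parameter is introduced by ? and every further one is joined with &. A minimal sketch of that rule, using a hypothetical helper append_param() that is not part of openalexR:

```r
# Hypothetical helper (an assumption for illustration, not an openalexR function):
# append one query parameter to a URL, using "?" only when the URL has no
# query string yet and "&" otherwise.
append_param <- function(url, name, value) {
  sep <- if (grepl("?", url, fixed = TRUE)) "&" else "?"
  paste0(url, sep, name, "=", utils::URLencode(as.character(value), reserved = TRUE))
}

# This URL already contains "?search=...", so select must be joined with "&".
base <- "https://api.openalex.org/works?search=transformative%20change"
with_select <- append_param(base, "select", "id")
print(with_select)

# A URL without any query string gets "?" instead.
bare <- append_param("https://api.openalex.org/works", "select", "id")
print(bare)
```

Pasting ?select=id onto a query that already contains ?search=... puts a second ? into the URL, so the API no longer sees select as a separate parameter.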
Hey @rkrug, glad you figured it out! Another way to set select = "id" is to plug it into options. Below, paste0(q, "&select=id") and q2 are the same (if you use oa_query). Or you can feed it directly into oa_fetch:
library(openalexR)
q <- oa_query(search = "transformative change")
q2 <- oa_query(
  search = "transformative change",
  options = list(select = "id")
)
identical(paste0(q, "&select=id"), q2)
#> [1] TRUE
oa_request(q2, verbose = TRUE, count_only = TRUE)
#> $count
#> [1] 7619312
#>
#> $db_response_time_ms
#> [1] 1264
#>
#> $page
#> [1] 1
#>
#> $per_page
#> [1] 1
# or
oa_fetch(
  search = "transformative change",
  options = list(select = "id"),
  verbose = TRUE,
  count_only = TRUE
)
#> Requesting url: https://api.openalex.org/works?search=transformative%20change&select=id
#> count db_response_time_ms page per_page
#> [1,] 7619312 1264 1 1
Created on 2023-10-20 with reprex v2.0.2
Thanks @trangdata. Yes, that solved my issue, and I now also understand that I have to use the ampersand & instead of the ?.
Although I think the select deserves a spot along the same line as search, as it is not an option as the others are? But probably I am just biased...
Although I think the select deserves a spot along the same line as search, as it is not an option as the others are?
You're probably right. We implemented search very early on, but it should probably be in options as well. We will need a major refactor of the package for this. Other parameters that I think should go in options include per_page and group_by(s). CC'ing @yjunechoe and @massimoaria if you have other thoughts!
I do not think it should be in options; rather, I think select should be a top-level argument. options should hold arguments of secondary importance, e.g., as you say, per_page or paging. In other words: options is for power users, and normal users do not have to go there.
@rkrug Then I think we need to think carefully about which of these parameters are "of secondary importance". Currently, I'm thinking all of the parameters should be treated equally and moved to options (with the exception of filter). Take this query for example: https://api.openalex.org/works?selector=id (a deliberately invalid parameter, just to see what all the valid parameters are). We get this:
<selector is not a valid parameter. Valid parameters are: cursor, filter, format, group_by, group-by, group_bys, group-bys, mailto, page, per_page, per-page, q, sample, seed, search, select, sort.>
So, which of these should be in options?
Another factor to consider is the number of arguments in oa_fetch. I'm not sure if there is a recommended style somewhere, but I personally don't want oa_fetch to have too many arguments (the call would already be very long with lots of filters). But I could be convinced otherwise.
What about using the approach the rgrass package is taking? It has a similar problem: it interfaces with GRASS (a GIS program), where each command has many different arguments.
So it is a two-step process: the function accepts ... and collects it with parameters <- list(...), which can then be checked for validity (by the names of the arguments) and then processed or passed on to OpenAlex (see https://github.com/rsbivand/rgrass/blob/6611c3d304d91c3c7c918e72696b4bf1c2ce2904/R/xml1.R#L165 for their implementation). This is not messy, it is flexible, the help pages can mention the most important and relevant parameters and how they can be used, and it is future-proof, as an unknown parameter can simply be passed on to OpenAlex (with a warning), etc.
I think that would give the best of both worlds.
The parameters of oa_fetch would therefore become:
entity = if (is.null(identifier)) NULL else id_type(shorten_oaid(identifier[[1]])),
...,
output = c("tibble", "dataframe", "list"),
abstract = TRUE,
endpoint = "https://api.openalex.org",
count_only = FALSE,
mailto = oa_email(),
api_key = oa_apikey(),
verbose = FALSE
and the help page would mention the arguments which are hidden behind the ....
And oa_query could use the same approach, i.e. the handling of the ... would be delegated to oa_query.
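The two-step idea above (accept ..., collect it with list(...), check the names, and pass unknown ones on with a warning) could look roughly like the sketch below. collect_params() is a hypothetical name and not part of openalexR or rgrass; the known names are copied from the OpenAlex error message quoted earlier in the thread.

```r
# Hypothetical sketch, not the openalexR or rgrass implementation.
# Step 1: accept arbitrary named arguments through `...`.
# Step 2: collect them in a list, check the names, and warn about (but still
# pass on) anything that is not a known query parameter.
collect_params <- function(...,
                           known = c("cursor", "filter", "format", "group_by",
                                     "group_bys", "mailto", "page", "per_page",
                                     "q", "sample", "seed", "search", "select",
                                     "sort")) {
  parameters <- list(...)
  unknown <- setdiff(names(parameters), known)
  if (length(unknown) > 0) {
    warning("Passing on unknown parameter(s): ", paste(unknown, collapse = ", "))
  }
  parameters
}

# "select" is known; "selector" is not, so it triggers a warning but is kept.
res <- withCallingHandlers(
  collect_params(select = "id", selector = "id"),
  warning = function(w) {
    message("warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(names(res))
```

Keeping unknown names instead of dropping them is what makes the approach tolerant of parameters the API might add later, at the cost of only learning about typos from the warning.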
@rkrug Unfortunately, the ellipses ... are already reserved for the different filter parameters. This was a design choice early on to simplify the levels of nesting; the rationale was that the user would often use doi = ... and similar identifiers in oa_fetch, and making them wrap these inside a potential filter argument would be a little too cumbersome.
https://github.com/ropensci/openalexR/blob/51340446b85290d3ec83564f4321099ec03031eb/R/oa_fetch.R#L62
Because of this, I chose to put other parameters like select and sort in options = list().
Makes sense.
You know the structures much better than I do, but shouldn't it be possible to use the ellipses for both? Are there any combinations which would lead to collisions? I do not think any of the filter keywords appear in this list of parameters:
cursor, filter, format, group_by, group-by, group_bys, group-bys, mailto, page, per_page, per-page, q, sample, seed, search, select, sort
So all named arguments in the ellipsis which are in this list will be interpreted as parameters, and all others will be wrapped in the filter as they are now.
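A minimal sketch of that partitioning, assuming a hypothetical helper split_dots() (not part of openalexR). The parameter names come from the error message quoted earlier; publication_year is only an illustrative filter name.

```r
# Parameter names copied from the OpenAlex error message quoted in this thread.
oa_params <- c("cursor", "filter", "format", "group_by", "group-by",
               "group_bys", "group-bys", "mailto", "page", "per_page",
               "per-page", "q", "sample", "seed", "search", "select", "sort")

# Hypothetical helper (an assumption for illustration): split the arguments in
# `...` into top-level query parameters and everything else, which would be
# treated as filters.
split_dots <- function(...) {
  args <- list(...)
  is_param <- names(args) %in% oa_params
  list(params = args[is_param], filter = args[!is_param])
}

parts <- split_dots(
  search = "transformative change",
  select = "id",
  publication_year = 2020  # illustrative filter name
)
print(names(parts$params))
print(names(parts$filter))
```

A collision would only arise if a filter key ever shared a name with one of the entries in oa_params, which is the case the question above is asking about.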
Theoretically, yes, we could do this. However, I don't know if it's best practice to try to combine different levels of parameters (filter and higher-level ones) into one .... I have tried to be fancy like this many times in the past and it never came out well...
I do think we need to revise how we're implementing these arguments to the query and reorganize re what should go in options and what should be moved out. I will create a new issue for this.
This is definitely the best approach for this.
I am trying to get only the OpenAlex ids of the works resulting from a search. As I did not find this option in openalexR, I just pasted ?select=id to the query (see https://docs.openalex.org/how-to-use-the-api/get-single-entities/select-fields). But something strange happens: the number of results changes.
Am I doing something wrong or misunderstanding something, or do I have to get in touch with OpenAlex?