ropensci / openalexR

Getting bibliographic records from OpenAlex
https://docs.ropensci.org/openalexR/
Other
97 stars 21 forks source link

Refactor oa_fetch parameters #182

Open trangdata opened 11 months ago

trangdata commented 11 months ago

Moved from #179. Suggested by @rkrug.

Which of the following parameters to the OpenAlex query should be moved into/out of options?

format, group_by, group_bys, mailto, q, sample, seed, search, select, sort

Other special parameters that I think should be kept outside of options: filter, page, per_page, cursor (related to paging)

Something else to consider is the number of arguments in oa_fetch. I'm not sure if there is a recommended style somewhere, but I personally don't want oa_fetch to have too many arguments. But I could be convinced otherwise. Note that the the ellipses are already reserved for different filter parameters.

rkrug commented 11 months ago

Discussion point.

I think this discussion is linked to the question, how power user arguments should be handled (pages, cursor, per_page, ...) while at the same time keep the usage for the non-power user as simple (probably even simpler?) as it is.

The question of separating the arguments into power user arguments in oa_query() and oa_request() was raised in #183. All additions (like the useful paging control, or per-page saving) are power user features and, if left in oa_fetch() irritate normal users, even simply seeing them in the help file. IMHO oa_fetch() should cover 80-90% of all needs (which it does at the moment), and additional power user features can be added to oa_query() and oa_request(). A wrapper does not need to expose all parameters and features of the underlying functions.

Argument provision

I like the simplicity to supply OpenAlex parameter using the ellipsis ..., but see that this contradicts the current implementation of using ... for filter elements. This can be a problem, but does not need to be. I do not think there are any collisions between having filter options and OpenAlex parameter in .... One could "expose" some arguments by specifying them directly in oa_fetch() for the non-power user, and forward them to oa_query() and oa_request().

yjunechoe commented 11 months ago

Assuming that we should move some parameters out, and those parameters are a significant chunk, one could imagine an interface where we move "power user" related options into a single argument, where the argument can take a constructor function that specifies those power user settings in bulk. This is an alternative to passing everything down ... and disentangling the dots internally - this is very error prone and we interrupt promises by intercepting and reading the contents of the dots, which can cause unexpected problems down the line.

For what it's worth, the construction function design is how things are usually implemented in the modelling world. For example, lme4::lmer fits a linear mixed effects model using a small set of options relevant for general usecases:

args(lme4::lmer)
#> function (formula, data = NULL, REML = TRUE, control = lmerControl(), 
#>     start = NULL, verbose = 0L, subset, weights, na.action, offset, 
#>     contrasts = NULL, devFunOnly = FALSE) 

But crucially, it also has a control argument that receives a specification object lmerControl(), which stores all "power user" features:

args(lme4::lmerControl)
#> function (optimizer = "nloptwrap", restart_edge = TRUE, boundary.tol = 1e-05, 
#>     calc.derivs = TRUE, use.last.params = FALSE, sparseX = FALSE, 
#>     standardize.X = FALSE, check.nobs.vs.rankZ = "ignore", check.nobs.vs.nlev = "stop", 
#>     check.nlev.gtreq.5 = "ignore", check.nlev.gtr.1 = "stop", 
#>     check.nobs.vs.nRE = "stop", check.rankX = c("message+drop.cols", 
#>         "silent.drop.cols", "warn+drop.cols", "stop.deficient", 
#>         "ignore"), check.scaleX = c("warning", "stop", "silent.rescale", 
#>         "message+rescale", "warn+rescale", "ignore"), check.formula.LHS = "stop", 
#>     check.conv.grad = .makeCC("warning", tol = 0.002, relTol = NULL), 
#>     check.conv.singular = .makeCC(action = "message", tol = formals(isSingular)$tol), 
#>     check.conv.hess = .makeCC(action = "warning", tol = 1e-06), 
#>     optCtrl = list(), mod.type = "lmer") 
rkrug commented 11 months ago

Yes - and these can be saved in using options(). one could also provide the same for filters, so that a set of complex filters can be defined and re-used easily.

yjunechoe commented 11 months ago

Yep! Though admittedly, I'm not myself a power user of openalexR 😅 so I don't have those kind of good intuitions about usability (which makes your expertise extra appreciated!)

I'll let others get a chance to chime in (re: what should be kept in/moved out) and come back for the implementation part 👀

rkrug commented 11 months ago

I can definitely look at implementation, as I have used similar approaches in other projects, but we should first agree on going in this direction, and which parameters should be dealt with first.

yjunechoe commented 11 months ago

I'm personally a little cautious of exposing a feature that lets users inject code in the middle of a function executing queries. setHooks()/getHooks() also requires a persistent package state that can store closures, which feels too bulky to maintain for a light API wrapper package like openalexR.

For what it's worth, a power user could implement a hook-like behavior from the outside by trace()-ing the function to inject code. Or, paging can be more precisely controlled with something like {chromote}, which does have that kind of hook ("callback") argument that you're describing. Generally, I think it'd be nice to know how many users would benefit from such a power user feature given the availability of more flexible alternatives (of writing their own version of oa_request() building off of oa_query(), or even just writing http/curl directly) before we push further in this direction (so I could be convinced otherwise!).

trangdata commented 11 months ago

Hi @rkrug, for clarity and tractability, could we keep the issue focused on the initial point raised and open a new issue if it's a different topic? 🙏🏽 Thank you! I went ahead and responded to your point in #185.

rkrug commented 11 months ago

@trangdata Agreed. I will delete the comment here.