Closed yjunechoe closed 5 days ago
I would suggest to make doi
an argument of the function to make clear that either the OA id or the doi can be used to do the snowball search. Now it is somehow hidden in the oa_snowball()
call.
It's as hidden in oa_snowball()
as it is in oa_fetch()
and everywhere else in {openalexR}
, since doi
is technically not an identifier for an OpenAlex object but a filter, like author.orcid
, ids.mag
, etc. Since it touches on other aspects of the package design, promoting doi
as its own argument might need to be discussed in a separate issue.
IMO I think this is just a documentation problem - might be worth revisiting https://github.com/ropensci/openalexR/issues/30 while we're at it, to make it clear that oa_fetch()
and oa_snowball()
share the ...
(namely, the query filters)
Technically correct. But looking from the user perspective, who knows DOIs, and is not really aware of the inner workings (and does not have to be) of OpenAlex (and also openalexR), to see a DOI as an argument in oa_snowball()
and oa_fetch()
(as you mention it - did not cross my mind - but it is the same) would make this much easier to use and to start using it. The help file can specify the shortcomings of using DOIs.
I think it's great that it didn't cross your mind for oa_fetch()
: it means the documentation was sufficient but we simply fell short on doing the same for oa_snowball()
!
I do see your point about the convenience of having doi
"right there" in the function signature to see, so I'll keep this open for a bit for others to comment, but do note that the ...
issue will still persist in oa_snowball()
anyways, since we want to keep the design of forwarding ...
into oa_fetch()
, which is the main workhorse function.
In other words, even if we promote doi
into its own argument, only oa_fetch()
will get that in its function signature, and oa_snowball()
will still "hide" it in the dots.
I think it's great that it didn't cross your mind for oa_fetch(): it means the documentation was sufficient but we simply fell short on doing that the oa_snowball()!
Actually, it means that I started with a search term, and the DOI stuff came later. But with a snowball search, one starts with list of papers, so in most cases one has a DOI. It took me quite some searching to learn how I can search based on DOIs.
-- Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Orcid ID: 0000-0002-7490-0066
Department of Evolutionary Biology and Environmental Studies University of Zürich Office Y19-M-72 Winterthurerstrasse 190 8075 Zürich Switzerland
Office: +41 (0)44 635 47 64 Cell: +41 (0)78 630 66 57 email: @. @.
PGP: 0x0F52F982
Thanks, I see the point about "one starts with the papers first". Though IMO the best practice is to convert to OpenAlex IDs first before snowballing (the way you've been doing it already!), since some papers might not exist / might not be discoverable from DOI in OpenAlex, - that should be an input sanitization issue you ideally resolve before you move forward with a snowball.
Anyways, I'm more concerned about what this means for the broader design. Here's my thought process.
Q: Leave oa_snowball()
aside for the moment - is the way users already search papers via doi
in oa_fetch()
not clear?
A:1 If it is clear, then no need to change anything about how oa_snowball()
operates. We just need to make it clear to users that oa_snowball()
work the same with oa_fetch()
in terms of search filters. We maintain a single mental model about filters, and avoid any confusion by e.g., having oa_snowball()
have an explicit doi argument, but oa_fetch()
not (or, on the other extreme, committing to all current and future functions powered by oa_fetch()
to redundantly specify a doi argument to meet that expectation)
A2: If it's not clear, we need to revisit the way we introduce ...
and query filters entirely, and change the function signature for oa_fetch()
as a last resort, as it introduces the most change. After oa_fetch()
business is settled, we then revisit whether oa_snowball()
, essentially a wrapper to oa_fetch()
, should continue to pass the ...
or take an explicit doi
argument there as well.
@rkrug I don't support promoting doi
as its own argument because other arguments like author.orcid
would need the same treatment. I hope the new oa_snowball
examples June added are clear enough.
A2: If it's not clear, we need to revisit the way we introduce ... and query filters entirely, and change the function signature for oa_fetch() as a last resort, as it introduces the most change. After oa_fetch() business is settled, we then revisit whether oa_snowball(), essentially a wrapper to oa_fetch(), should continue to pass the ... or take an explicit doi argument there as well.
I agree with this point, but we should also keep in mind that this would be a major refactoring that will likely introduce new bugs. We currently have filter = list(...)
as an alternative place for putting all the filters. However, I have really enjoyed putting the filters as direct arguments to oa_fetch
via the ellipses.
Let's go ahead and merge this PR and table this discussion for now. Refactoring will need to be in a different issue.
Thanks, I see the point about "one starts with the papers first". Though IMO the best practice is to convert to OpenAlex IDs first before snowballing (the way you've been doing it already!), since some papers might not exist / might not be discoverable from DOI in OpenAlex, and that should be an input sanitization issue you should resolve before you snowball.
Fully agreed. I would suggest to add a Vignette about snowballing and this issue - otherwise the help becomes to long.
Thanks both for the helpful discussion! Let's move what we discussed into a longer-form vignette in a separate issue/PR. I'll go ahead and merge this as it addresses #259 (a Q of whether doi search is possible at all)
Let's go ahead and merge this PR and table this discussion for now. Refactoring will need to be in a different issue.
Agreed. Not only refactoring, but also other points raised (e.g. httr2 transition). If you could please start the discussion I would be happy to contribute to it.
Ref: #259
(the package .Rd file got updated from running
document()
and reflects work of a prev PR)