ropensci / openalexR

Getting bibliographic records from OpenAlex
https://docs.ropensci.org/openalexR/
Other
89 stars 19 forks source link

Show search by DOI in `oa_snowball()` examples #260

Closed yjunechoe closed 5 days ago

yjunechoe commented 5 days ago

Ref: #259

(the package .Rd file got updated from running document() and reflects work of a prev PR)

rkrug commented 5 days ago

I would suggest to make doi an argument of the function to make clear that either the OA id or the doi can be used to do the snowball search. Now it is somehow hidden in the oa_snowball() call.

yjunechoe commented 5 days ago

It's as hidden in oa_snowball() as it is in oa_fetch() and everywhere else in {openalexR}, since doi is technically not an identifier for an OpenAlex object but a filter, like author.orcid, ids.mag, etc. Since it touches on other aspects of the package design, promoting doi as its own argument might need to be discussed in a separate issue.

IMO I think this is just a documentation problem - might be worth revisiting https://github.com/ropensci/openalexR/issues/30 while we're at it, to make it clear that oa_fetch() and oa_snowball() share the ... (namely, the query filters)

rkrug commented 5 days ago

Technically correct. But looking from the user perspective, who knows DOIs, and is not really aware of the inner workings (and does not have to be) of OpenAlex (and also openalexR), to see a DOI as an argument in oa_snowball() and oa_fetch() (as you mention it - did not cross my mind - but it is the same) would make this much easier to use and to start using it. The help file can specify the shortcomings of using DOIs.

yjunechoe commented 5 days ago

I think it's great that it didn't cross your mind for oa_fetch(): it means the documentation was sufficient but we simply fell short on doing the same for oa_snowball()!

I do see your point about the convenience of having doi "right there" in the function signature to see, so I'll keep this open for a bit for others to comment, but do note that the ... issue will still persist in oa_snowball() anyways, since we want to keep the design of forwarding ... into oa_fetch(), which is the main workhorse function.

In other words, even if we promote doi into its own argument, only oa_fetch() will get that in its function signature, and oa_snowball() will still "hide" it in the dots.

rkrug commented 5 days ago

I think it's great that it didn't cross your mind for oa_fetch(): it means the documentation was sufficient but we simply fell short on doing that the oa_snowball()!

Actually, it means that I started with a search term, and the DOI stuff came later. But with a snowball search, one starts with list of papers, so in most cases one has a DOI. It took me quite some searching to learn how I can search based on DOIs.

-- Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)

Orcid ID: 0000-0002-7490-0066

Department of Evolutionary Biology and Environmental Studies University of Zürich Office Y19-M-72 Winterthurerstrasse 190 8075 Zürich Switzerland

Office: +41 (0)44 635 47 64 Cell: +41 (0)78 630 66 57 email: @. @.

PGP: 0x0F52F982

yjunechoe commented 5 days ago

Thanks, I see the point about "one starts with the papers first". Though IMO the best practice is to convert to OpenAlex IDs first before snowballing (the way you've been doing it already!), since some papers might not exist / might not be discoverable from DOI in OpenAlex, - that should be an input sanitization issue you ideally resolve before you move forward with a snowball.

Anyways, I'm more concerned about what this means for the broader design. Here's my thought process.

Q: Leave oa_snowball() aside for the moment - is the way users already search papers via doi in oa_fetch() not clear?

A:1 If it is clear, then no need to change anything about how oa_snowball() operates. We just need to make it clear to users that oa_snowball() work the same with oa_fetch() in terms of search filters. We maintain a single mental model about filters, and avoid any confusion by e.g., having oa_snowball() have an explicit doi argument, but oa_fetch() not (or, on the other extreme, committing to all current and future functions powered by oa_fetch() to redundantly specify a doi argument to meet that expectation)

A2: If it's not clear, we need to revisit the way we introduce ... and query filters entirely, and change the function signature for oa_fetch() as a last resort, as it introduces the most change. After oa_fetch() business is settled, we then revisit whether oa_snowball(), essentially a wrapper to oa_fetch(), should continue to pass the ... or take an explicit doi argument there as well.

trangdata commented 5 days ago

@rkrug I don't support promoting doi as its own argument because other arguments like author.orcid would need the same treatment. I hope the new oa_snowball examples June added are clear enough.

A2: If it's not clear, we need to revisit the way we introduce ... and query filters entirely, and change the function signature for oa_fetch() as a last resort, as it introduces the most change. After oa_fetch() business is settled, we then revisit whether oa_snowball(), essentially a wrapper to oa_fetch(), should continue to pass the ... or take an explicit doi argument there as well.

I agree with this point, but we should also keep in mind that this would be a major refactoring that will likely introduce new bugs. We currently have filter = list(...) as an alternative place for putting all the filters. However, I have really enjoyed putting the filters as direct arguments to oa_fetch via the ellipses.

Let's go ahead and merge this PR and table this discussion for now. Refactoring will need to be in a different issue.

rkrug commented 5 days ago

Thanks, I see the point about "one starts with the papers first". Though IMO the best practice is to convert to OpenAlex IDs first before snowballing (the way you've been doing it already!), since some papers might not exist / might not be discoverable from DOI in OpenAlex, and that should be an input sanitization issue you should resolve before you snowball.

Fully agreed. I would suggest to add a Vignette about snowballing and this issue - otherwise the help becomes to long.

yjunechoe commented 5 days ago

Thanks both for the helpful discussion! Let's move what we discussed into a longer-form vignette in a separate issue/PR. I'll go ahead and merge this as it addresses #259 (a Q of whether doi search is possible at all)

rkrug commented 5 days ago

Let's go ahead and merge this PR and table this discussion for now. Refactoring will need to be in a different issue.

Agreed. Not only refactoring, but also other points raised (e.g. httr2 transition). If you could please start the discussion I would be happy to contribute to it.