sckott / habanero

client for Crossref search API
https://habanero.readthedocs.io
MIT License
207 stars 30 forks source link

works: Warning for one bad id pollutes entire response #112

Closed holub008 closed 1 year ago

holub008 commented 1 year ago

Python version: 3.10.9 habanero version: 1.2.2

Problem

My application fetches records by user-input DOIs, which sometimes are mistyped & not a valid DOI. Per the docs, warn=True is the solution to ignore invalid DOIs and other error conditions. As expected, an invalid DOI will return a None result; however, every id after it in the input list will also be None (even if valid). Is this expected behavior?

https://github.com/sckott/habanero/blob/183af9e8b8bbae72a9d810bcc674667e491f00b8/habanero/request.py#L158-L159

Example

from habanero import Crossref
cr = Crossref(mailto="...", ua_string="...")

bad_doi = '10.1200/jco.2021.39.15-suppl.e12560'
good_dois = ['10.1200/jco.2022.40.6_suppl.306', '10.1016/s1368-8375(21)00377-8', '10.1177/01945998211030908d', '10.1097/ju.0000000000000857.09']

responses = cr.works(ids=good_dois, warn=True)
len([r for r in responses if r])
# 4

responses = cr.works(ids=[bad_doi] + good_dois, warn=True)
len([r for r in responses if r])
# 0

My thoughts

This behavior seems unideal for all use cases. Either a caller wants an invalid DOI to fail the entire set of requests (which warn=False already handles), or the caller wants to salvage as much data as possible, ignoring invalid results. By returning Nones with positional dependency, neither of these use cases are satisfied.

My expectation/preference would be for warn=True to return None for any id resulting in a non-200 query, and the standard response for any 200 query.

sckott commented 1 year ago

Thanks @holub008 for the issue.

I'll have a look and get back to you soon

holub008 commented 1 year ago

Let me know if you need a hand making the code changes. From my read, it looks like the should_warn variable can be dropped entirely and conditionals consolidated. But I might be ignoring some intended functionality.

sckott commented 1 year ago

Here's where this thing came from https://github.com/sckott/habanero/issues/69 It doesn't have the use case described unfortunately b/c we chatted about it beforehand in another venue.

I've been experimenting and my first thought is what I've done on the warn-fix branch here is a good fix. I updated the tests that use warn=True to make sure using that arg works as it should have.

Thoughts?

holub008 commented 1 year ago

Nice! That change is more or less what I would have. Always appreciate your fast responses.

sckott commented 1 year ago

Great. Did you test it to make sure it works for you yet?

I think this now supports use cases:

holub008 commented 1 year ago

Just pulled warn-fix and ran my example:

>>> from habanero import Crossref
>>> cr = Crossref(mailto="...", ua_string="...")
>>> 
>>> bad_doi = '10.1200/jco.2021.39.15-suppl.e12560'
>>> good_dois = ['10.1200/jco.2022.40.6_suppl.306', '10.1016/s1368-8375(21)00377-8', '10.1177/01945998211030908d', '10.1097/ju.0000000000000857.09']
>>> 
>>> responses = cr.works(ids=good_dois, warn=True)
>>> len([r for r in responses if r])
4
>>> responses = cr.works(ids=[bad_doi] + good_dois, warn=True)
/Users/kholub/habanero/habanero/request.py:135: UserWarning: 404 on 10.1200/jco.2021.39.15-suppl.e12560: Not Found
  warnings.warn(mssg)
len([r for r in responses if r])
>>> len([r for r in responses if r])
4

That's the behavior I'd expect!

sckott commented 1 year ago

Great, i'll get this merged, and pushed to pypi soon

sckott commented 1 year ago

released a new version on pypi

holub008 commented 1 year ago

Confirmed on my target application. Thanks again!

sckott commented 1 year ago

great