sckott / habanero

client for Crossref search API
https://habanero.readthedocs.io
MIT License

UnboundLocalError in request_class.py #123

Closed vgreg closed 10 months ago

vgreg commented 1 year ago

Python 3.11 Habanero 1.2.3

I'm getting the following error on line 164 of request_class.py: UnboundLocalError: cannot access local variable 'r' where it is not associated with a value.

It seems that you can reach that line (check_json(r)) with r undefined if requests.get() raises a RequestException before returning. Because the exception is caught and only printed, the code continues with r still undefined.

https://github.com/sckott/habanero/blob/5228483aa101214c4c945c72073d3c8b4d60101e/habanero/request_class.py#L143-L165
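A minimal, self-contained sketch of the pattern described above. This is not habanero's actual code: FakeRequestException, flaky_get, and check_json are hypothetical stand-ins for the requests exception, requests.get(), and habanero's response check. When the call that would assign r raises, the except block only prints, and the next line reads a local that was never assigned.

```python
class FakeRequestException(Exception):
    """Hypothetical stand-in for requests.exceptions.RequestException."""


def flaky_get(url):
    # Simulates requests.get() failing before it can return a response.
    raise FakeRequestException(f"connection to {url} timed out")


def check_json(r):
    # Hypothetical placeholder for habanero's response check.
    return r


def fetch(url):
    try:
        r = flaky_get(url)
    except FakeRequestException as e:
        print(e)  # the error is printed and swallowed...
    return check_json(r)  # ...so r is unbound when this line runs


try:
    fetch("https://api.crossref.org/works")
except UnboundLocalError as err:
    caught = err
    print(type(err).__name__, err)
```

The wording of the message depends on the Python version: 3.11 and later say "cannot access local variable 'r' where it is not associated with a value", while earlier versions say "local variable 'r' referenced before assignment", which is why both forms appear in this thread.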

sckott commented 1 year ago

Thank you for the report @vgreg

Can you please share a reproducible example that caused this error? It seems like the except clauses aren't catching whatever error is thrown, and then we don't have r defined. It'd be nice to have an example to figure out what that error is.

vgreg commented 1 year ago

I am still looking for an example that will consistently reproduce the error. I was retrieving all articles for a set of about 150 journals and the error occurred for two of them, but I have been able to re-run the request for both with no issue the second time.

Here is a simplified version of a request that failed once but has been working every other time:

from habanero import Crossref
cr = Crossref()
query = {"issn": "0028-3932"}

responses = cr.works(
    filter=query, cursor="*", cursor_max=12000
)

cursor_max is set to slightly more than the number of DOIs for the journal.
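For context on what cursor="*" and cursor_max do: Crossref deep paging starts with cursor=* and then passes the next-cursor value from each response's message until a page comes back with no items, and habanero's cursor_max caps the total number of records retrieved. The sketch below mimics that loop against an in-memory fake; PAGES, fake_api, and page_through are hypothetical names, not habanero internals. It also shows why one failed request part-way through matters: every later page depends on the cursor returned by the previous one.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for the API: cursor -> (items, next_cursor).
PAGES: Dict[str, Tuple[List[str], str]] = {
    "*": (["doi1", "doi2"], "c1"),
    "c1": (["doi3", "doi4"], "c2"),
    "c2": (["doi5"], "c3"),
    "c3": ([], "c4"),
}


def fake_api(cursor: str) -> dict:
    items, next_cursor = PAGES[cursor]
    return {"message": {"items": items, "next-cursor": next_cursor}}


def page_through(cursor_max: int) -> List[str]:
    """Follow next-cursor until the results run out or cursor_max is reached."""
    results: List[str] = []
    cursor = "*"
    while len(results) < cursor_max:
        message = fake_api(cursor)["message"]
        items = message["items"]
        if not items:  # an empty page means the result set is exhausted
            break
        results.extend(items)
        # Each request depends on the cursor from the previous response.
        cursor = message["next-cursor"]
    return results[:cursor_max]


print(page_through(12000))  # ['doi1', 'doi2', 'doi3', 'doi4', 'doi5']
print(page_through(3))      # ['doi1', 'doi2', 'doi3']
```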

sckott commented 1 year ago

Thanks - I'll see if I can get that to fail

sckott commented 1 year ago

This may be difficult to track down; the fact that it doesn't happen consistently suggests it's an intermittent problem with the Crossref API.
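If the failures really are intermittent, a common client-side mitigation is to retry a few times with exponential backoff and, once the retries are exhausted, let the real error propagate instead of swallowing it. This is a generic sketch, not something habanero is stated to do; retry_with_backoff and the simulated flaky call are hypothetical.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    exceptions: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call func(), retrying on `exceptions` with delays of base_delay * 2**attempt."""
    for attempt in range(retries + 1):
        try:
            return func()
        except exceptions:
            if attempt == retries:
                raise  # out of retries: surface the real error
            time.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Simulated endpoint that times out twice, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("connect timeout")
    return "ok"


print(retry_with_backoff(flaky, base_delay=0.01))  # "ok" on the third call
```

With requests, you would pass exceptions=(requests.exceptions.ConnectionError, requests.exceptions.Timeout) instead of the defaults, because requests' ConnectionError is its own class derived from RequestException, not the built-in ConnectionError.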

vgreg commented 1 year ago

I was able to reproduce a similar error and see what gets printed on line 163. Here is the exception:

HTTPSConnectionPool(host='api.crossref.org', port=443): Max retries exceeded with url: /works?filter=issn%3A0028-3932&cursor=DnF1ZXJ5VGhlbkZldGNoBgAAAAAFuuH-Fmx3VDZUUHY5VHlhdThmaGVtbFhBOVEAAAAABcXycBZPY3FES3VMU1R5R3JIWHlwQUZBcktnAAAAAAXd460WTUpsaGN0RGFRbS1yN0ZYWTJ3MG5pUQAAAAAGAprKFlVTQUNpdVFEVHZLdWVZQWxVZEJDUUEAAAAABblxaxY0bldDU3pmSlJZeWhaSGk2VHVVdHh3AAAAAAW0yJAWaUpOMms5em5SUmVMR2JjT2VGdEFtdw%3D%3D (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0xffff33f68e90>, 'Connection to api.crossref.org timed out. (connect timeout=None)'))

It seems that ConnectTimeout is derived from RequestException, so it is caught on line 162: https://requests.readthedocs.io/en/latest/api/#requests.ConnectionError

However, the code continues to line 164 with r still undefined.
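One way to close the gap described above, sketched with the same kind of hypothetical stand-ins rather than habanero's real code (the thread doesn't show the fix that was eventually made): re-raise inside the except, and move the follow-up call into an else clause so it only runs when the request actually returned.

```python
class FakeRequestException(Exception):
    """Hypothetical stand-in for requests.exceptions.RequestException."""


def flaky_get(url):
    # Simulates requests.get() raising before it returns.
    raise FakeRequestException(f"connection to {url} timed out")


def check_json(r):
    # Hypothetical placeholder for habanero's response check.
    return r


def fetch(url):
    try:
        r = flaky_get(url)
    except FakeRequestException as e:
        print(f"request failed: {e}")
        raise  # propagate the original exception instead of continuing without r
    else:
        return check_json(r)  # only runs when flaky_get() returned normally


try:
    fetch("https://api.crossref.org/works")
except FakeRequestException as err:
    caught = err  # the caller now sees the real cause, not UnboundLocalError
```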

sckott commented 1 year ago

Okay, thanks for this. I'll try to get to this soon

sdspieg commented 1 year ago

Having the same issue...

HTTPSConnectionPool(host='api.crossref.org', port=443): Max retries exceeded with url: /works?query=author%3AMONAGHAN+A%2BAND%2Btitle%3A%E2%80%98CALMLY+CRITICAL%E2%80%99%3A+EVOLVING+RUSSIAN+VIEWS+OF+US+HEGEMONY%2BAND%2Byear%3A2006%2BAND%2Bjournal%3AJOURNAL+OF+STRATEGIC+STUDIES&rows=1 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f3fec6f7f70>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
An error occurred: local variable 'r' referenced before assignment

Let me know if I can help with debugging (but the run continues despite the error...)
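For long batch jobs like the roughly 150-journal run described earlier in the thread, once the error propagates instead of being printed, the caller can record which items failed and retry only those. A hypothetical sketch: fake_fetch stands in for a cr.works(...) call per ISSN, and 1234-5678 is a made-up ISSN used only for illustration.

```python
from typing import Callable, Dict, List, Tuple


def run_batch(
    issns: List[str], fetch_journal: Callable[[str], list]
) -> Tuple[Dict[str, list], List[str]]:
    """Fetch each ISSN, collecting results and the ISSNs whose request raised."""
    results: Dict[str, list] = {}
    failed: List[str] = []
    for issn in issns:
        try:
            results[issn] = fetch_journal(issn)
        except Exception as e:  # broad on purpose: one bad journal shouldn't stop the run
            print(f"{issn}: {e}")
            failed.append(issn)
    return results, failed


# Simulated fetcher that fails the first time it sees one ISSN.
seen = set()


def fake_fetch(issn: str) -> list:
    if issn == "0028-3932" and issn not in seen:
        seen.add(issn)
        raise ConnectionError("connect timeout")
    return [f"{issn}-doi"]


results, failed = run_batch(["0028-3932", "1234-5678"], fake_fetch)
retried, still_failed = run_batch(failed, fake_fetch)  # second pass over failures only
results.update(retried)
print(sorted(results), still_failed)
```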

sckott commented 1 year ago

thanks for your report @sdspieg ! Sorry about the issue. I started working on this, but I just haven't had time to finish it off. I'll let you know if I could use any help.

sckott commented 1 year ago

@vgreg @sdspieg Can both of you reinstall from GitHub and try again?

sckott commented 10 months ago

Closing for now; if it pops up again, ping here.