Closed raffaem closed 2 years ago
Would you be able to test this with a different Python version? I don't have Python 3.10 at hand right now.
do you think it's due to the Python version?
I think it's due to KEYS being empty.
The script had run for some time and it probably exhausted all the keys?
But it should give a different error.
Anyway "index out of range" doesn't look specific to Python 3.10.
I don't have another Python version less than that at the moment.
Yes, Py3.10 might be the reason, because this package is not (yet) released for Py3.10: https://img.shields.io/pypi/pyversions/pybliometrics.svg
It might be that your download broke. And besides, what works in Python 3.7 might not work in Python 3.10. Some things do change from version to version.
Uhm it's happening again.
Why does the `get_content` function of `utils.py`, at line 77, catch `IndexError`:
```python
try:
    KEYS.pop(0)  # Remove current key
    shuffle(KEYS)
    header['X-ELS-APIKey'] = KEYS[0].strip()
    resp = requests.get(url, headers=header, proxies=proxies,
                        params=params)
except IndexError:  # All keys depleted
    break
```
while at line 58 it just assumes that `KEYS` is non-empty:

```python
header = {'X-ELS-APIKey': KEYS[0],
          'Accept': 'application/json',
          'User-Agent': user_agent}
```

?
I will try with 3.9, although I would have preferred to avoid maintaining different Python versions on the same system.
Michael, here is the code to reproduce the bug:
```python
import requests
import io
from pybliometrics.scopus.exception import Scopus429Error

def myget(s, headers, proxies, params):
    print("Called myget!")
    resp = requests.models.Response()
    resp.raw = io.BytesIO("{'service-error':{'status':{'statusText': 'boh'}}}".encode("utf8"))
    resp.encoding = "utf-8"
    resp.status_code = 429
    return resp

requests.get = myget

from pybliometrics.scopus import AuthorRetrieval

for i in range(2):
    try:
        auth = AuthorRetrieval("1234")
    except Scopus429Error:
        print("Detected 429 error. Continue")
        continue
```
Actually that's how it's supposed to work. `resp = requests.get(url, headers=header, proxies=proxies, params=params)` raises an `IndexError` when all keys are depleted, which subsequently breaks the while-loop. Then in https://github.com/pybliometrics-dev/pybliometrics/blob/master/pybliometrics/scopus/utils/get_content.py#L89 it raises the Scopus429Error. That's intentional, because it tells the user (you) that all keys are depleted.
The only problem I see is your very code snippet, where you only got the IndexError. In your second code snippet (same message) everything is fine. The only explanation I have for this is that your keys were empty to begin with. Otherwise the traceback would include this line https://github.com/pybliometrics-dev/pybliometrics/blob/master/pybliometrics/scopus/utils/get_content.py#L81, which it doesn't.
> Actually that's how it's supposed to work. `resp = requests.get(url, headers=header, proxies=proxies, params=params)` raises an `IndexError` when all keys are depleted,
I was under the impression that `requests.get` just returns a response with status code 429 when all keys are depleted, without raising any exception.
> which subsequently breaks the while-loop. Then in https://github.com/pybliometrics-dev/pybliometrics/blob/master/pybliometrics/scopus/utils/get_content.py#L89 it raises the Scopus429Error. That's intentional, because it tells the user (you) that all keys are depleted.
I understand that.
> The only problem I see is your very code snippet, where you only got the IndexError.
I get the Scopus429Error the first time, and the IndexError the second time.
The problem is that, when the last key is depleted, this line raises the IndexError, which gets caught here and re-raised here as Scopus429Error. But if at this point you call pybliometrics again, it will expect some elements in `KEYS`, which aren't there, since all of them have been popped out.
Ah-ha! So when there are no keys left, and one continues to make calls, then the `IndexError` kicks in.
If this is correct, then a `Scopus429Error` would be more informative. However, the result is the same: people have to restart pybliometrics.
> Ah-ha! So when there are no keys left, and one continues to make calls, then the IndexError kicks in.
Yes, that was what I was trying to say with this code.
> If this is correct, then a Scopus429Error would be more informative.
I thought so and proposed a PR to implement that.
> However, the result is the same: People have to restart pybliometrics.
First, they would have to get new API keys. Then, they would have to restart pybliometrics.
Right?
Or do the API key quotas get replenished after a reasonable amount of time? I think I read somewhere that they reset after one week, if I'm not mistaken.
I was planning to code a scraper that would get new API keys automatically.
But if it is sufficient to just wait a reasonable amount of time and then restart pybliometrics, that won't be necessary.
Then I totally misunderstood you. But yes, keys reset one week after first usage: https://pybliometrics.readthedocs.io/en/stable/access.html#api-key-quotas-and-429-error
In practice, one rarely hits the quota because with 10 keys you get 200k calls in the Scopus Search API per week.
> If this is correct, then a Scopus429Error would be more informative. However, the result is the same: People have to restart pybliometrics.
Should I reopen the PR that implements this, if you agree?
Well, if you still need this "problem" solved, can you come up with a fix based on try-except? An if-statement would negatively affect everybody in terms of computational burden, when in reality the issue affects just 1% of users.
Submitted.
In my script I get an IndexError exception when I try to download an author using the `AuthorRetrieval` interface. The code is:
The traceback is:
But when I try to reproduce it in the command line, the error is different, as I get a Scopus429Error:
It happened before with another Scopus ID, but I was not able to reproduce it in the command line, so I restarted the script, it worked and it moved on.
The problem is, I don't want the script to stop.