Open bnewbold opened 4 years ago
@bnewbold is this a constant error, or is it sporadic?
My intention is to find out whether this occurs in all processing. I need to know if you are unable to get SciELO metadata at all, so that we can classify and prioritize this demand.
@jamilatta Thank you for your rapid reply!
This error occurred on my first attempt, after iterating through about 19,700 identifiers. Here is the script I am writing:
https://gist.github.com/bnewbold/9918634282f6013e13174badbce64a93
I am running it a second time now and have gotten past 50,000 identifiers, so this is probably sporadic. I'll note that I almost immediately get requests.exceptions.ReadTimeout errors (in both cases, trying from two separate machines). The complete failure happens when:
fail retrieving data from (http://articlemeta.scielo.org/api/v1/article/identifiers) attempt(1/10)
... all the attempts fail. I assume this is due to rate limiting, as mentioned in the source. Perhaps there should be an extra delay by default to prevent these timeouts?
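As a sketch of the kind of delay I mean, a generic retry wrapper with exponential backoff and jitter could look like the following (the names here are illustrative, not the library's actual API):

```python
import random
import time

def fetch_with_backoff(fetch, max_attempts=10, base_delay=1.0):
    """Call fetch(), retrying with exponential backoff plus jitter.

    `fetch` is any zero-argument callable that raises on failure;
    this is a generic sketch, not the articlemetaapi internals.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Sleep base, 2*base, 4*base, ... plus a little jitter
            # so parallel clients don't all retry in lockstep.
            time.sleep(base_delay * (2 ** attempt)
                       + random.uniform(0, base_delay))
```

Even a small default base delay like this would probably avoid most of the ReadTimeout bursts I'm seeing.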
As some context, I am hoping to extract the full metadata for all ~900k to 1 million articles as a JSON snapshot, to archive and include in https://fatcat.wiki, particularly articles which do not have a DOI. If there is a more efficient way to achieve this, please let me know!
Thank you for maintaining articlemetaapi.
@bnewbold I will think of a way to avoid having all the attempts fail.
Let me talk it over with coworkers, and I will get back to you soon.
Thanks.
Python version: 3.7; articlemetaapi version: 1.26.6
This error happens after many timeouts, maybe due to HTTP 429 back-off responses? Perhaps the
self._do_request(url, params=params)
statement should be called first, and the response status checked before retrying.
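To make the idea concrete, checking the status and honoring a 429's Retry-After header might look roughly like this (a sketch using requests directly; I have not verified the internals of _do_request, and the function name here is made up):

```python
import time
import requests

def get_with_429_backoff(url, params=None, max_attempts=10):
    """GET `url`, backing off when the server answers HTTP 429.

    Honors the Retry-After header when present, otherwise doubles
    the wait each attempt. A sketch, not the library's actual code.
    """
    delay = 1.0
    for attempt in range(max_attempts):
        resp = requests.get(url, params=params, timeout=30)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp
        # Prefer the server-advertised wait over our own guess.
        wait = float(resp.headers.get("Retry-After", delay))
        time.sleep(wait)
        delay *= 2
    raise RuntimeError(f"gave up after {max_attempts} attempts: {url}")
```

That way a rate-limited response is distinguished from a genuine failure before an attempt is counted as exhausted.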