scholarly-python-package / scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
https://scholarly.readthedocs.io/
The Unlicense
1.3k stars, 292 forks

Added ability to insert delay between page requests and corrected pub_year to year #433

Closed: stanleyrhodes closed this 1 year ago

stanleyrhodes commented 2 years ago

Fixes: https://github.com/scholarly-python-package/scholarly/issues/431

Description

Scholarly did not seem to offer a way to reduce the rate of page requests. Tools such as Publish or Perish limit this rate to avoid being locked out by Google Scholar, so it seemed a reasonable option to add. The delay can be given as a fixed number of seconds, or as a custom function that returns the number of seconds. The two can be combined, for example to get a constant delay plus a little random padding from a function.
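The idea of combining a fixed delay with a function-generated one can be sketched roughly as follows. This is a minimal illustration, not the code in this PR and not scholarly's API: the names `compute_delay`, `fetch_pages`, `fixed`, and `extra` are hypothetical, and `fetch` stands in for whatever actually requests a page.

```python
import random
import time
from typing import Callable, Iterable, List, Optional, Union

Number = Union[int, float]


def compute_delay(fixed: Number = 0,
                  extra: Optional[Callable[[], Number]] = None) -> float:
    """Return the seconds to wait: a constant plus an optional callable's result."""
    total = float(fixed)
    if extra is not None:
        total += float(extra())
    # Never return a negative delay, even if the callable misbehaves.
    return max(total, 0.0)


def fetch_pages(urls: Iterable[str],
                fetch: Callable[[str], str],
                fixed: Number = 0,
                extra: Optional[Callable[[], Number]] = None,
                sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Fetch each URL in turn, pausing between consecutive requests.

    `fetch` and `sleep` are injected (hypothetical parameters) so the
    sketch can be exercised without network access or real waiting.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:  # no need to wait before the first request
            sleep(compute_delay(fixed, extra))
        results.append(fetch(url))
    return results
```

Under these assumptions, a call such as `fetch_pages(urls, fetch, fixed=5, extra=lambda: random.uniform(0, 2))` would wait between 5 and 7 seconds before every request after the first.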

In addition, BibTeX generation seemed to mistakenly produce a field called pub_year, which needs to be year to be valid BibTeX (e.g., Zotero does not pick up the year from a pub_year field, but it does from year). There was no obvious benefit to naming this field pub_year, and there was evidence that a previous contributor meant for it to be year, so I changed it to year in all cases.
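To illustrate the effect of the rename on the generated entry, here is a small standalone sketch. It does not reproduce scholarly's BibTeX code; `to_bibtex` and its arguments are hypothetical, and the field values are made-up placeholders.

```python
from typing import Dict


def to_bibtex(entry_type: str, cite_key: str, fields: Dict[str, str]) -> str:
    """Render a BibTeX entry, emitting the year under the standard `year` key."""
    fields = dict(fields)  # copy so the caller's dict is left untouched
    # Hypothetical normalization: move a non-standard `pub_year` to `year`,
    # keeping an existing `year` value if both are present.
    pub_year = fields.pop("pub_year", None)
    if pub_year is not None and "year" not in fields:
        fields["year"] = pub_year
    lines = [f"@{entry_type}{{{cite_key},"]
    for key, value in fields.items():
        lines.append(f"  {key} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


entry = to_bibtex("article", "doe2020example",
                  {"title": "An Example Title", "pub_year": "2020"})
print(entry)
```

The printed entry carries `year = {2020}` and no `pub_year` field, which is the form the description above says Zotero recognizes.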


arunkannawadi commented 1 year ago

I merged this into a new branch so I could clean up the commits, then merged that branch into develop.