scholarly-python-package / scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
https://scholarly.readthedocs.io/
The Unlicense
1.36k stars, 298 forks

Cache query results #518

Open tigerjack opened 12 months ago

tigerjack commented 12 months ago

What feature would you like to request? It would be great to have a cache of the results, similar to what pybliometrics already does.

arunkannawadi commented 11 months ago

Could you expand a bit more on the use case? Are you suggesting that if you repeat a query, scholarly should return the cached results instead of running the query again? Or that if a query returns results that were already returned (and, say, filled) by a different query, scholarly should fill in the information from the cached results?

tigerjack commented 11 months ago

> Could you expand a bit more on the use case? Are you suggesting that if you repeat a query, scholarly should return the cached results instead of running the query again? Or that if a query returns results that were already returned (and, say, filled) by a different query, scholarly should fill in the information from the cached results?

I guess both of them are valid ideas, but the first seems more immediate. pybliometrics, for example, takes the results from the cache unless the user specifically forces an update. I guess there's also an expiration date for the cached results, but I am not sure about it.

As for the use case, even testing some user scripts often takes a lot of time, since the results have to be fetched from the web every time.

arunkannawadi commented 11 months ago

OK, having something like a 24-hour expiration for cached results would make sense. I don't have a timeline for this feature, but would welcome a PR from the community.
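To make the idea concrete, here is a minimal sketch of a query cache with a 24-hour expiration and an explicit "force update" switch. It is not part of scholarly; `TTLCache`, `get_or_fetch`, and the `refresh` flag are hypothetical names chosen for illustration, and the cache lives in memory only.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """In-memory cache whose entries expire after `ttl_seconds` (hypothetical helper)."""

    def __init__(
        self,
        ttl_seconds: float = 24 * 60 * 60,  # 24 hours, as suggested above
        clock: Callable[[], float] = time.time,  # injectable clock, useful for tests
    ) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_fetch(
        self, key: Hashable, fetch: Callable[[], Any], refresh: bool = False
    ) -> Any:
        """Return the cached value for `key`, calling `fetch` only if needed.

        `fetch` is called when there is no entry, the entry has expired,
        or the caller passes refresh=True to force an update.
        """
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and not refresh:
            stored_at, value = entry
            if now - stored_at < self.ttl_seconds:
                return value
        value = fetch()
        self._entries[key] = (now, value)
        return value
```

A caller would wrap each query in `cache.get_or_fetch(query_string, lambda: run_query(query_string))`, where `run_query` stands for whatever function actually performs the lookup. Results that come back as lazy iterators would have to be turned into a list before being stored, and a persistent cache would additionally need to serialize the entries to disk.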

ltalirz commented 10 months ago

It's actually very easy to do this with requests_cache (no changes needed in scholarly).

See the "patching" approach, where requests_cache simply monkeypatches all calls to the requests library:

https://requests-cache.readthedocs.io/en/stable/#patching
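A minimal sketch of that patching approach, assuming the third-party `requests-cache` package is installed. The helper name `enable_http_cache`, the cache name `scholarly_cache`, and the 24-hour default are illustrative choices, not anything scholarly defines. Note that the patch only affects traffic sent through `requests`; this thread does not confirm that every scholarly request goes through that library, so the cache may have no visible effect.

```python
from datetime import timedelta


def enable_http_cache(cache_name: str = "scholarly_cache", hours: int = 24) -> None:
    """Globally patch `requests` so that responses are cached on disk.

    Hypothetical helper: call it once, before running any queries.
    """
    # Imported here so this helper can be defined even when the optional
    # third-party dependency (pip install requests-cache) is missing.
    import requests_cache

    # install_cache() monkeypatches requests.Session; expire_after sets
    # how long a cached response is considered fresh.
    requests_cache.install_cache(
        cache_name,
        backend="sqlite",
        expire_after=timedelta(hours=hours),
    )
```

After calling `enable_http_cache()`, repeating the same query within the expiration window should be served from the local SQLite file for any request that `requests` handles. `requests_cache.uninstall_cache()` removes the patch again.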

anapaulagomes commented 10 months ago

I did this @ltalirz but it didn't work. The cache is installed but no URLs are returned. Did you try this out or have an example to show? Thank you!

Update: I just saw where the requests are happening :)

ltalirz commented 10 months ago

Cheers, so it does work for you as well?

anapaulagomes commented 10 months ago

No, it doesn't. I haven't looked into it further.