scholarly-python-package / scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
https://scholarly.readthedocs.io/
The Unlicense

`scholarly.search_pubs` runs forever #463

Closed kostrykin closed 1 year ago

kostrykin commented 1 year ago

Describe the bug: `scholarly.search_pubs` runs forever and does not return.

To Reproduce

from scholarly import scholarly, ProxyGenerator

pg = ProxyGenerator()
assert pg.FreeProxies()
scholarly.use_proxy(pg)

print('searching')
search_query = scholarly.search_pubs('10.1007/978-3-031-09037-0_20')
pub = next(search_query)
scholarly.pprint(pub)

Expected behavior: Data associated with the publication should be printed. This was working a month ago (I used an older version of scholarly back then and did not use proxies).


arunkannawadi commented 1 year ago

This is likely a transient issue caused by a lack of reliable proxies. If you try again without proxies (not recommended as a regular practice), it should work. Alternatively, try running the code as it is again after some time.
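As a rough illustration of the "try again after some time" advice, here is a minimal sketch of a retry helper with a growing wait between attempts. The name `call_with_retries` and its parameters are hypothetical and not part of scholarly. It only helps when the failing call raises an exception; it cannot interrupt a call that never returns.

```python
import time


def call_with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    Hypothetical helper, not part of scholarly. The wait doubles after
    each failed attempt: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as error:  # free proxies can fail in many ways
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


# Possible usage with the snippet from this issue (requires network access):
# from scholarly import scholarly
# pub = call_with_retries(
#     lambda: next(scholarly.search_pubs('10.1007/978-3-031-09037-0_20')),
#     attempts=5,
#     base_delay=30.0,
# )
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` is fine.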

arunkannawadi commented 1 year ago

And whenever possible, try fetching a paper via the profile of one of its authors. In this instance, you could use the search_author_id routine to look up the author 9TqkClQAAAAJ and iterate through the publication list.
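A minimal sketch of that author-profile route might look like the following. It assumes the profile's publication entries carry their title under `pub['bib']['title']`, and it matches by title because the profile list is not expected to expose DOIs. The helper names and the title fragment in the usage comment are hypothetical.

```python
def find_publications_by_title(publications, title_fragment):
    """Return the entries whose title contains title_fragment (case-insensitive).

    Hypothetical helper; assumes each entry looks like {'bib': {'title': ...}}.
    """
    needle = title_fragment.casefold()
    return [
        pub for pub in publications
        if needle in pub.get("bib", {}).get("title", "").casefold()
    ]


def fetch_author_publications(author_id):
    """Fetch the publication list from an author's profile (needs network access)."""
    # Imported lazily so the filtering helper above works without scholarly.
    from scholarly import scholarly

    author = scholarly.search_author_id(author_id)
    author = scholarly.fill(author, sections=["publications"])
    return author["publications"]


# Possible usage (the title fragment is a placeholder, not the real title):
# pubs = fetch_author_publications("9TqkClQAAAAJ")
# for pub in find_publications_by_title(pubs, "some words from the title"):
#     print(pub["bib"]["title"])
```

Entries found this way are typically only partially filled, so a further `scholarly.fill(pub)` call may be needed to get the full record.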

kostrykin commented 1 year ago

> This is likely a transient issue caused by a lack of reliable proxies. If you try again without proxies (not recommended as a regular practice), it should work. Alternatively, try running the code as it is again after some time.

Yes, running the code without the proxies works.

> And whenever possible, try fetching a paper via the profile of one of its authors. In this instance, you could use the search_author_id routine to look up the author 9TqkClQAAAAJ and iterate through the publication list.

Why is that? I am looking for a specific paper, which is uniquely identified by its DOI. Searching for the author instead of the unique DOI and then filtering the results seems like a very circuitous approach.

arunkannawadi commented 1 year ago

Google Scholar actively tries to block programmatic queries that search its publication database, but it allows queries that search the author database. Especially if you want to get information about a specific publication many times over a period (say, to regularly track its citation count), I'd recommend going through the author's profile page.

kostrykin commented 1 year ago

> Google Scholar actively tries to block programmatic queries that search its publication database, but it allows queries that search the author database. Especially if you want to get information about a specific publication many times over a period (say, to regularly track its citation count), I'd recommend going through the author's profile page.

Thanks for pointing this out!

arunkannawadi commented 1 year ago

I tried running your snippet again with FreeProxies and after a few attempts, it did successfully print the paper details. This was likely due to #465 which has now been fixed. Closing this issue as completed.