scholarly-python-package / scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
https://scholarly.readthedocs.io/
The Unlicense

Query fails to collect data for some keywords, which results in StopIteration in the next() #437

Closed p-veloso closed 1 year ago

p-veloso commented 2 years ago

Describe the bug

The query works with some keywords but raises StopIteration at next(search_query) with others, even when the same keyword returns results in a manual search on Google Scholar.

To Reproduce

from scholarly import scholarly, ProxyGenerator

# pg = ProxyGenerator()
# success = pg.FreeProxies()
# scholarly.use_proxy(pg)

test = ["reinforcement", "reinforcement learning"]  # it works with "reinforcement" but not with "reinforcement learning"
search_query = scholarly.search_keyword(test[1])

# Retrieve the first result from the iterator
first_author_result = next(search_query)  # this line raises the StopIteration error
scholarly.pprint(first_author_result)

# Retrieve all the details for the author
author = scholarly.fill(first_author_result)
scholarly.pprint(author)

# Take a closer look at the first publication
first_publication = author['publications'][0]
first_publication_filled = scholarly.fill(first_publication)
scholarly.pprint(first_publication_filled)

# Print the titles of the author's publications
publication_titles = [pub['bib']['title'] for pub in author['publications']]
print(publication_titles)

# Which papers cited that publication?
citations = [citation['bib']['title'] for citation in scholarly.citedby(first_publication_filled)]
print(citations)

Expected behavior

I expected the code to work with any keyword, including combinations of exact keywords in quotation marks, as in a manual search on Google Scholar.
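As a side note on the failure mode: next() raises StopIteration whenever the iterator yields nothing, so an empty result set surfaces as an exception rather than an empty value. A minimal sketch of guarding against that is below. The helper name first_or_none is hypothetical, and iter([]) is only a stand-in for the iterator returned by scholarly.search_keyword; this does not fix the underlying bug, it only makes the empty case explicit.

```python
from typing import Iterator, Optional, TypeVar

T = TypeVar("T")


def first_or_none(results: Iterator[T]) -> Optional[T]:
    """Return the first item of an iterator, or None if it is empty.

    Hypothetical helper for illustration; not part of scholarly.
    """
    # The second argument to next() is returned instead of raising StopIteration.
    return next(results, None)


# Stand-in for an empty search result (assumption: the real call would be
# scholarly.search_keyword(...), which is not invoked here).
empty_results = iter([])
first = first_or_none(empty_results)

if first is None:
    print("No results returned for this keyword")
```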


arunkannawadi commented 1 year ago

Thanks for the bug report. This is indeed a bug: when the URL is generated internally, the space between the keywords is not replaced by a +. This will be fixed in v1.7.3.
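To illustrate the kind of encoding described above, here is a minimal sketch using the standard library's urllib.parse.quote_plus, which replaces spaces with + when building a query string. The URL template and the function name build_keyword_url are assumptions made for illustration; they are not taken from scholarly's source, and the actual fix in the library may look different.

```python
from urllib.parse import quote_plus

# Hypothetical URL template, for illustration only (not scholarly's real one).
SEARCH_URL_TEMPLATE = "/citations?view_op=search_authors&mauthors=label:{}"


def build_keyword_url(keyword: str) -> str:
    """Build a search URL with the keyword safely encoded.

    quote_plus() turns spaces into '+' and percent-encodes other
    characters that are not safe inside a query string.
    """
    return SEARCH_URL_TEMPLATE.format(quote_plus(keyword))


# A single-word keyword is left unchanged; a multi-word keyword has its
# space replaced by '+', so the URL no longer contains a raw space.
print(build_keyword_url("reinforcement"))
print(build_keyword_url("reinforcement learning"))
```

Without this step, a keyword containing a space would produce a URL with a raw space in it, which is consistent with the single-word query working while the two-word query returns nothing.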