scholarly-python-package / scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
https://scholarly.readthedocs.io/
The Unlicense
1.36k stars, 298 forks

Unexpected behavior of `start_index` in `search_citedby` results in an empty generator #531

Open · opened by HSILA 8 months ago

Describe the bug

When `search_citedby` from the scholarly package is called with a non-zero `start_index`, the returned generator is expected to skip that many articles and return the remaining articles that cite the given `PUBLICATION_ID`. However, whenever `start_index` is greater than 0, the generator contains 0 items. This happens even though the corresponding Google Scholar URI (`cited_by._url`, i.e. `/scholar?hl=en&cites=16837829726140559426&as_ylo=2023&as_yhi=2023&as_vis=0&as_sdt=0,33&start=270`) shows the correct page of results when opened directly in a browser.
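To make the report concrete, here is a minimal sketch of how `start_index`, `year_low`, and `year_high` appear to map onto the query string quoted above. `build_citedby_url` is a hypothetical helper written only for illustration. It is not scholarly's internal URL builder, and the parameter mapping is inferred solely from the URI in this report.

```python
from urllib.parse import urlencode


def build_citedby_url(publication_id, start_index=0, year_low=None, year_high=None):
    """Hypothetical helper: rebuild the cited-by query string seen in this issue.

    Parameter names (hl, cites, as_ylo, as_yhi, as_vis, as_sdt, start) are copied
    from the URI quoted in the report; this is NOT scholarly's own implementation.
    """
    params = {"hl": "en", "cites": str(publication_id)}
    if year_low is not None:
        params["as_ylo"] = str(year_low)
    if year_high is not None:
        params["as_yhi"] = str(year_high)
    params["as_vis"] = "0"
    params["as_sdt"] = "0,33"
    if start_index:
        # Google Scholar pages results with an offset: start=270 skips the first 270 hits
        params["start"] = str(start_index)
    # Keep the comma in "0,33" unescaped so the output matches the browser URI
    return "/scholar?" + urlencode(params, safe=",")


url = build_citedby_url(16837829726140559426, start_index=270, year_low=2023, year_high=2023)
print(url)
# /scholar?hl=en&cites=16837829726140559426&as_ylo=2023&as_yhi=2023&as_vis=0&as_sdt=0,33&start=270
```

If this mapping is right, the request itself matches what works in a browser, which points the investigation at how the response page is parsed rather than how the URL is built.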

To Reproduce

```python
from scholarly import scholarly

PUBLICATION_ID = 16837829726140559426

# This should skip the first 270 articles and return the rest from the year 2023
cited_by = scholarly.search_citedby(PUBLICATION_ID, start_index=270, year_low=2023, year_high=2023)

total_results = cited_by._get_total_results()
print(f"Total results: {total_results}")  # prints 0
```

With `start_index=0`, this snippet prints `Total results: 3340`, but with any positive `start_index` it prints `Total results: 0`.

Expected behavior

The generator should skip the first `start_index` articles and return the remaining articles from 2023 that cite `PUBLICATION_ID`. The snippet should therefore print `Total results: 3070` (3340 − 270).
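The expected behavior can be sketched as a client-side skip. This is a possible workaround, not a fix: it assumes the generator returned with `start_index=0` iterates normally (as the 3340-result run suggests), and `skip_results` is a hypothetical helper built on the standard `itertools.islice`. Integers stand in for publication entries so the example runs offline.

```python
from itertools import islice


def skip_results(results, start_index):
    """Hypothetical workaround: return an iterator over `results` that drops
    the first `start_index` items locally, instead of asking Google Scholar
    to skip them via the start= parameter."""
    if start_index < 0:
        raise ValueError("start_index must be non-negative")
    return islice(results, start_index, None)


# Stand-in for the 3340 citing articles reported with start_index=0
fake_citations = iter(range(3340))
remaining = list(skip_results(fake_citations, 270))
print(len(remaining))  # 3070
print(remaining[0])    # 270
```

With scholarly this would be called as `skip_results(scholarly.search_citedby(PUBLICATION_ID, year_low=2023, year_high=2023), 270)`. The trade-off is that the skipped pages are still fetched from Google Scholar, so it costs extra requests compared with a working `start_index`.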

Screenshots

Not applicable, since this is a code execution issue; the maintainers can reproduce it with the snippet above.

Desktop:

Do you plan on contributing? Your response below will clarify whether the maintainers can expect you to fix the bug you reported.