thelondonsimon closed this issue 1 year ago.
With no proxy and with FreeProxies, I could not fetch this (see #465), but when I tried with ScraperAPI, I got a 404 (not found) response for scholar_id = 'oMaIg8sAAAAJ' and success with scholar_id = 'PEJ42J0AAAAJ'. It is true that the code does not handle 302 redirects, but I did not encounter one. Is this a consistent issue?
Yes, I found it was a recurring issue. I had a script iterating through a list of scholar_ids and if it encountered one which effectively had a 302 redirect (when viewed in a browser), it would get stuck in the kind of loop identified in the logging referenced in my original post.
Redirection should now be handled in recent versions (>= 1.7.8).
However, I'd still recommend not passing an outdated scholar_id, because handling redirection is of limited use. Google Scholar redirects only the author's main page. scholarly constructs specific URLs from the given scholar_id to fill in all the relevant information, and those URLs get a 404 response instead of a 302.
For example, the publication information for the outdated ID would be fetched from https://scholar.google.com/citations?view_op=view_citation&hl=en&user=oMaIg8sAAAAJ&citation_for_view=oMaIg8sAAAAJ:M3ejUd6NZC8C, which returns 404, whereas https://scholar.google.com/citations?user=oMaIg8sAAAAJ&hl=en gets a 302.
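To make the difference concrete, here is a small sketch that rebuilds the two URL shapes quoted above from a scholar_id. The helper names profile_url and citation_url are hypothetical and are not scholarly's internal functions; the publication part M3ejUd6NZC8C is taken from the example URL. The point is that the per-publication URL embeds the ID twice, so a stale ID keeps producing 404s there even though the main profile URL redirects.

```python
# Hypothetical helpers that mirror the URL shapes quoted in this thread;
# they are not scholarly's internal functions.
BASE = "https://scholar.google.com/citations"


def profile_url(scholar_id: str) -> str:
    """Main author page: Google Scholar answers a stale ID here with a 302."""
    return f"{BASE}?user={scholar_id}&hl=en"


def citation_url(scholar_id: str, pub_id: str) -> str:
    """Per-publication page: the ID appears twice, and a stale ID yields a 404."""
    return (
        f"{BASE}?view_op=view_citation&hl=en"
        f"&user={scholar_id}&citation_for_view={scholar_id}:{pub_id}"
    )


if __name__ == "__main__":
    print(profile_url("oMaIg8sAAAAJ"))
    print(citation_url("oMaIg8sAAAAJ", "M3ejUd6NZC8C"))
```

Because the stale ID is baked into every derived URL, following the redirect on the profile page alone is not enough; the derived URLs would have to be rebuilt with the updated ID.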
OK! scholarly v1.7.11 is smart enough to update the scholar_id and allow all the methods that would normally be allowed. I just learnt that scholar_id values that differ only slightly point to the same user. For example, https://scholar.google.com/citations?user=PEJ42J0AAAAR and https://scholar.google.com/citations?user=PEJ42J0AAABJ both point to the same profile as PEJ42J0AAAAJ. Google Scholar is just weird.
Anyhow, the redirection appears to work with no proxy and with FreeProxies, but not with ScraperAPI, despite turning on the relevant API parameters. However, this shouldn't be an issue, since scholarly uses FreeProxies to fetch this information even if you have set up ScraperAPI (unless you also use ScraperAPI as the secondary proxy, which is generally a bad idea).
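For reference, a minimal configuration sketch of the setup described above, assuming the ProxyGenerator API of scholarly 1.7.x: ScraperAPI as the primary proxy and FreeProxies as the secondary one, which per the comment above is what scholarly uses when no secondary generator is given. SCRAPER_API_KEY is a placeholder, and both generator calls contact the network when run.

```python
from scholarly import ProxyGenerator, scholarly

SCRAPER_API_KEY = "YOUR_SCRAPER_API_KEY"  # placeholder, not a real key

# Primary proxy: ScraperAPI handles the regular scraping requests.
primary = ProxyGenerator()
primary.ScraperAPI(SCRAPER_API_KEY)

# Secondary proxy: FreeProxies, stated explicitly here. Passing a
# ScraperAPI generator in this slot as well is generally a bad idea.
secondary = ProxyGenerator()
secondary.FreeProxies()

scholarly.use_proxy(primary, secondary)
```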
Describe the bug
When calling fill() on an author record whose scholar_id has a 302 redirect, scholarly gets stuck in a loop on the original URL.
To Reproduce
Results in logging such as:
Expected behavior
The 302 redirect is followed, and the results are the same as when searching for scholar_id = 'PEJ42J0AAAAJ'.
Desktop (please complete the following information):