scholarly-python-package / scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
https://scholarly.readthedocs.io/
The Unlicense

Error with running scholarly with Tor server in the background #488

Closed: hp0404 closed this issue 1 year ago

hp0404 commented 1 year ago

Description

I am trying to run the scholarly package with a Tor server running in the background; however, I am encountering the following errors:


2023-02-07 19:15:13,208 [INFO] Exception ConnectError while fetching page: ('[Errno -3] Temporary failure in name resolution',)
2023-02-07 19:15:13,208 [INFO] Retrying with a new session.
2023-02-07 19:15:13,208 [INFO] Refreshing Tor ID...
2023-02-07 19:15:13,213 [DEBUG] GETCONF __owningcontrollerprocess (runtime: 0.0007)
2023-02-07 19:15:13,250 [INFO] Error while receiving a control message (SocketClosed): received exception "read of closed file"
2023-02-07 19:15:20,162 [INFO] Exception ConnectError while fetching page: ('[Errno -3] Temporary failure in name resolution',)
2023-02-07 19:15:20,162 [INFO] Retrying with a new session.
2023-02-07 19:15:20,163 [INFO] Refreshing Tor ID...
2023-02-07 19:15:20,167 [DEBUG] GETCONF __owningcontrollerprocess (runtime: 0.0006)

This issue did not occur previously and I cannot use scholarly as a result.

Environment

arunkannawadi commented 1 year ago

Tor is not officially supported, primarily because I don't use it. I think @ipeirotis may have used it in the past, but I have no idea what the issue is or how to fix it.

ipeirotis commented 1 year ago

Tor worked at some point as a reasonable proxy, until it stopped. That is why we marked it as a deprecated feature: we could not really provide support for it. I am afraid this will be hard to debug, and I would recommend closing this.

arunkannawadi commented 1 year ago

@hp0404 please try FreeProxies, or ScraperAPI, which seems to have had better results of late. ScraperAPI is reliable, but its free plan is very limited (you can fetch about 40 pages per month).
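For reference, here is a minimal sketch of switching from Tor to one of those two options. It uses scholarly's `ProxyGenerator` with its `ScraperAPI` and `FreeProxies` methods, and assumes both return a truthy value on success. The helper name `configure_proxy`, the environment variable `SCRAPER_API_KEY`, and the search query are placeholders for illustration, not part of the library.

```python
import os


def configure_proxy(pg, api_key=None):
    """Set up a proxy on a ProxyGenerator-like object (hypothetical helper).

    Tries ScraperAPI first when an API key is given, then falls back to
    FreeProxies. Returns the name of the backend that reported success,
    or None if neither did.
    """
    if api_key and pg.ScraperAPI(api_key):
        return "ScraperAPI"
    if pg.FreeProxies():
        return "FreeProxies"
    return None


def main():
    # Imported here so the helper above can be tried without scholarly installed.
    from scholarly import scholarly, ProxyGenerator

    pg = ProxyGenerator()
    # SCRAPER_API_KEY is an assumed variable name; leave it unset to use FreeProxies.
    backend = configure_proxy(pg, os.environ.get("SCRAPER_API_KEY"))
    if backend is None:
        raise SystemExit("No working proxy could be set up.")

    # Route all subsequent scholarly requests through the configured proxy.
    scholarly.use_proxy(pg)

    # Placeholder query; replace with the author you are looking for.
    author = next(scholarly.search_author("Steven A Cholewiak"))
    print(backend, author["name"])


# Call main() to run this against Google Scholar (needs network access).
```

Free proxies tend to be slow and short-lived, so `FreeProxies` may take a while or fail outright. Trying ScraperAPI first when a key is available keeps the more reliable option in front, at the cost of using up the limited free quota.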

hp0404 commented 1 year ago

I see, thanks!