scholarly-python-package / scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
https://scholarly.readthedocs.io/
The Unlicense

Docker #396

Closed hp0404 closed 2 years ago

hp0404 commented 2 years ago

What feature would you like to request? Has anyone tried running scholarly from Docker? I've spent a few hours on it, but I keep getting blocked (Tor proxy rotation doesn't seem to work).

Is your feature request related to a problem? Please describe. I'd like to run scholarly from a Windows machine, so I thought I'd build a Docker container with Tor running inside it and mount the downloaded publications to my local Windows folder.

arunkannawadi commented 2 years ago

Tor is no longer fully supported in scholarly. Have you tried with ScraperAPI or Luminati?

hp0404 commented 2 years ago

Hmm, those are paid services, right? Tor worked fine when I used it outside of Docker (scholarly 1.0.2, I think).

arunkannawadi commented 2 years ago

There have been several changes since 1.0.2 that are not guaranteed to work with Tor. It would help to know which query is blocking you. If you're using scholarly.fill on author objects, use the sections keyword to fill only the sections you need. In particular, avoid filling coauthors.
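
For illustration, here is a minimal sketch of filling an author with a restricted set of sections. The helper name `fill_author_lightly` and the default section list are hypothetical choices for this example; `client` stands in for the `scholarly` object, whose `fill` method accepts a `sections` keyword. The section names used here ("basics", "indices", "publications") follow the author sections described in the scholarly documentation, so check them against the version you have installed.

```python
from typing import Any, Iterable

# Hypothetical default: the author sections to request. "coauthors" is
# deliberately left out because filling it costs extra requests.
WANTED_SECTIONS = ("basics", "indices", "publications")


def fill_author_lightly(
    client: Any,
    author: dict,
    sections: Iterable[str] = WANTED_SECTIONS,
) -> dict:
    """Fill only the requested sections of an author, never coauthors.

    `client` is expected to behave like the `scholarly` object, i.e. to
    expose `fill(obj, sections=[...])`.
    """
    # Drop "coauthors" even if the caller asked for it by mistake.
    safe_sections = [name for name in sections if name != "coauthors"]
    return client.fill(author, sections=safe_sections)
```

With the real library this would be called roughly as `fill_author_lightly(scholarly, next(scholarly.search_author("some name")))` after `from scholarly import scholarly`. Passing the client in as an argument is only a convenience so the helper can be exercised without network access.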

Also, ScraperAPI offers 500 requests a month, and needs no payment info to sign up. If that volume is sufficient, you could use that.
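
A rough sketch of wiring ScraperAPI into scholarly, assuming the `ProxyGenerator.ScraperAPI(key)` and `scholarly.use_proxy(pg)` calls from the scholarly documentation. The helper name `use_scraperapi` is hypothetical, and the client and proxy generator are passed in as arguments so the function can be tried without a real key.

```python
from typing import Any


def use_scraperapi(client: Any, proxy_generator: Any, api_key: str) -> bool:
    """Route requests through ScraperAPI and report whether setup worked.

    `client` is expected to behave like the `scholarly` object and
    `proxy_generator` like a `scholarly.ProxyGenerator` instance.
    """
    if not api_key:
        raise ValueError("ScraperAPI key is empty")
    # ProxyGenerator.ScraperAPI returns a truthy value when the proxy works.
    ok = bool(proxy_generator.ScraperAPI(api_key))
    if ok:
        # Only register the proxy once it has been set up successfully.
        client.use_proxy(proxy_generator)
    return ok
```

With the real library, the call would look something like `use_scraperapi(scholarly, ProxyGenerator(), os.environ["SCRAPER_API_KEY"])`, where `SCRAPER_API_KEY` is a made-up environment variable name. Inside a container, the key could be supplied with `docker run -e SCRAPER_API_KEY=...` rather than baked into the image.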

arunkannawadi commented 2 years ago

Closing the issue due to lack of activity. If this problem persists, please feel free to reopen this issue

  1. after confirming that the issue is only with later versions of the library and not with 1.0.2
  2. with the exact query that is failing