scholarly-python-package / scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
https://scholarly.readthedocs.io/
The Unlicense

Proxy-related queries like search_pubs fail #434

Closed shuwang21 closed 1 year ago

shuwang21 commented 2 years ago

Describe the bug
Really nice work! It seems that some of the proxy-related queries fail. I tried pg.Tor_Internal(tor_cmd="xxx") and pg.FreeProxies(). Both give me the following error:

  File "analysis.py", line 14, in <module>
    search_query = scholarly.search_pubs('Acoustic Eavesdropping through Wireless Vibrometry')
  File "/Users/xxx/Library/Python/3.8/lib/python/site-packages/scholarly/_scholarly.py", line 156, in search_pubs
    return self.__nav.search_publications(url)
  File "/Users/xxx/Library/Python/3.8/lib/python/site-packages/scholarly/_navigator.py", line 283, in search_publications
    return _SearchScholarIterator(self, url)
  File "/Users/xxx/Library/Python/3.8/lib/python/site-packages/scholarly/publication_parser.py", line 53, in __init__
    self._load_url(url)
  File "/Users/xxx/Library/Python/3.8/lib/python/site-packages/scholarly/publication_parser.py", line 59, in _load_url
    self._soup = self._nav._get_soup(url)
  File "/Users/blue/Library/Python/3.8/lib/python/site-packages/scholarly/_navigator.py", line 226, in _get_soup
    html = self._get_page('https://scholar.google.com{0}'.format(url))
  File "/Users/xxx/Library/Python/3.8/lib/python/site-packages/scholarly/_navigator.py", line 177, in _get_page
    raise MaxTriesExceededException("Cannot Fetch from Google Scholar.")
scholarly._proxy_generator.MaxTriesExceededException: Cannot Fetch from Google Scholar.

I also tried pg.ScraperAPI(api_key), which gives the following error:

Traceback (most recent call last):
  File "/Users/xxx/Library/Python/3.8/lib/python/site-packages/requests/models.py", line 971, in json
    return complexjson.loads(self.text, **kwargs)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/xxx/Library/Python/3.8/lib/python/site-packages/scholarly/_proxy_generator.py", line 560, in ScraperAPI
    r = requests.get("http://api.scraperapi.com/account", params={'api_key': API_KEY}).json()
  File "/Users/xxx/Library/Python/3.8/lib/python/site-packages/requests/models.py", line 975, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Thanks a lot for your help.

To Reproduce
For ScraperAPI: pg.ScraperAPI(api_key)
For FreeProxies and Tor: search_query = scholarly.search_pubs('Perception of physical stability and center of mass of 3D objects')

Expected behavior
The query should return the correct results.

Screenshots
None


arunkannawadi commented 2 years ago

I'm unable to reproduce this (I have a different Python version, though). Does the JSONDecodeError happen when you run search_query = scholarly.search_pubs('Acoustic Eavesdropping through Wireless Vibrometry')? Which version of json do you have? You can find that out by running import json; print(json.__version__) in Python.
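For instance, a quick check from the Python interpreter (the version string in the comment is just an example):

import json
print(json.__version__)  # the stdlib json module reports its internal version, e.g. '2.0.9'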

bragostin commented 2 years ago

Same issue here,

from scholarly import scholarly, ProxyGenerator

# Set up a ProxyGenerator object to use free proxies
# This needs to be done only once per session
pg = ProxyGenerator()
pg.FreeProxies()
scholarly.use_proxy(pg)

# Now search Google Scholar from behind a proxy
search_query = scholarly.search_pubs('Perception of physical stability and center of mass of 3D objects')
scholarly.pprint(next(search_query))

returns

    raise MaxTriesExceededException("Cannot Fetch from Google Scholar.")
scholarly._proxy_generator.MaxTriesExceededException: Cannot Fetch from Google Scholar.

even though

pg.FreeProxies() 

returns True

arunkannawadi commented 2 years ago

I am still unable to reproduce the JSONDecodeError. The query passed for me with ScraperAPI but not with FreeProxies.

The MaxTriesExceededException is not uncommon when using FreeProxies. Once you update to the just-released v1.7.2, and maybe once the existing set of proxies in FreeProxies gets unblocked, you should be able to use them again.

shuwang21 commented 2 years ago

The json version is 2.0.9.
For ScraperAPI, the issue happens with pg.ScraperAPI(api_key).

kjk11 commented 1 year ago

I am facing the same issue. Has there been any progress on this, or any tricks to get around it?

oscarclivio commented 1 year ago

Same problem

WangRongsheng commented 1 year ago

Same problem

arunkannawadi commented 1 year ago

Does "Same problem" mean getting "JSONDecodeError", or "MaxTriesExceededException" error?

arunkannawadi commented 1 year ago

I got the JSONDecodeError when I (intentionally) gave an API_KEY that was invalid. When I gave my correct API_KEY, I did not get that error. So I assume that you supplied an invalid API key.
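If you want to verify your key outside scholarly, here is a minimal sketch that hits the same account endpoint scholarly queries internally (visible in the ScraperAPI traceback above); "your-scraperapi-key" is a placeholder:

import requests

API_KEY = "your-scraperapi-key"  # placeholder; substitute your real key

# pg.ScraperAPI() calls this same endpoint; a non-JSON response here is
# what surfaces as the JSONDecodeError in the traceback above.
r = requests.get("http://api.scraperapi.com/account", params={"api_key": API_KEY})
try:
    print(r.json())  # account details if the key is valid
except ValueError:  # JSONDecodeError subclasses ValueError
    print("Non-JSON response; the API key is likely invalid:", r.status_code)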

As for the MaxTriesExceededException error, it depends on what proxies are available when you query, and on how heavily they have already been used (by you and everyone else in the world). It is unfortunately not possible to reliably fetch publications using search_pubs with FreeProxies. However, you can catch that exception and retry from your program.
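A minimal retry sketch along those lines (the attempt count and query are illustrative; the exception's import path matches the tracebacks above):

from scholarly import scholarly, ProxyGenerator
from scholarly._proxy_generator import MaxTriesExceededException

for attempt in range(3):  # illustrative retry budget
    try:
        pg = ProxyGenerator()
        pg.FreeProxies()  # refresh the proxy list on each attempt
        scholarly.use_proxy(pg)
        search_query = scholarly.search_pubs('Perception of physical stability and center of mass of 3D objects')
        scholarly.pprint(next(search_query))
        break
    except MaxTriesExceededException:
        print(f"Attempt {attempt + 1} failed; retrying with new proxies...")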