scholarly-python-package / scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
https://scholarly.readthedocs.io/
The Unlicense

MaxTriesExceededException: Cannot Fetch from Google Scholar, even though print(success) is True #509

Open XianZhi1022 opened 11 months ago

XianZhi1022 commented 11 months ago

Describe the bug
`scholarly.search_pubs` raises `MaxTriesExceededException: Cannot Fetch from Google Scholar.` even though `pg.SingleProxy(...)` returned `True`, so the proxy appears to be set up correctly.

To Reproduce

from scholarly import scholarly, ProxyGenerator

test_t = "Scholarly editions in print and on the screen: A theoretical comparison"

pg = ProxyGenerator()
success = pg.SingleProxy(http="http://127.0.0.1:7458", https="http://127.0.0.1:7458")
print(success)  # prints True
scholarly.use_proxy(pg, pg)

search_query = scholarly.search_pubs(test_t)
article = next(search_query)
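Note that `SingleProxy` returning `True` only means the proxy was registered, not that Google Scholar is actually reachable through it. A quick sanity check, independent of scholarly, is to send a request through the same proxy directly. A minimal stdlib sketch (the helper names `build_opener` and `check_proxy` are my own, and the port 7458 is taken from the snippet above):

```python
import urllib.request
import urllib.error

PROXY = "http://127.0.0.1:7458"  # local proxy port from the report above

def build_opener(proxy: str) -> urllib.request.OpenerDirector:
    # Route both http and https traffic through the same local proxy,
    # mirroring what SingleProxy(http=..., https=...) is asked to do.
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)

def check_proxy(proxy: str, timeout: float = 10.0) -> bool:
    """Return True only if Google Scholar answers 200 through the proxy."""
    try:
        with build_opener(proxy).open("https://scholar.google.com/",
                                      timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

If `check_proxy(PROXY)` is `False` (connection refused, timeout, or a 403/429 block page), the failure is in the proxy or in Google blocking it, not in scholarly itself.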

Below is the full traceback:

MaxTriesExceededException                 Traceback (most recent call last)
Input In [2], in <cell line: 1>()
----> 1 search_query = scholarly.search_pubs(test_t)
      2 article = next(search_query)
      4 print(article.citedby)  # citation count

File G:\Anaconda\lib\site-packages\scholarly\_scholarly.py:160, in _Scholarly.search_pubs(self, query, patents, citations, year_low, year_high, sort_by, include_last_year, start_index)
     97 """Searches by query and returns a generator of Publication objects
     98 
     99 :param query: terms to be searched
   (...)
    155 
    156 """
    157 url = self._construct_url(_PUBSEARCH.format(requests.utils.quote(query)), patents=patents,
    158                           citations=citations, year_low=year_low, year_high=year_high,
    159                           sort_by=sort_by, include_last_year=include_last_year, start_index=start_index)
--> 160 return self.__nav.search_publications(url)

File G:\Anaconda\lib\site-packages\scholarly\_navigator.py:296, in Navigator.search_publications(self, url)
    288 def search_publications(self, url: str) -> _SearchScholarIterator:
    289     """Returns a Publication Generator given a url
    290 
    291     :param url: the url where publications can be found.
   (...)
    294     :rtype: {_SearchScholarIterator}
    295     """
--> 296     return _SearchScholarIterator(self, url)

File G:\Anaconda\lib\site-packages\scholarly\publication_parser.py:53, in _SearchScholarIterator.__init__(self, nav, url)
     51 self._pubtype = PublicationSource.PUBLICATION_SEARCH_SNIPPET if "/scholar?" in url else PublicationSource.JOURNAL_CITATION_LIST
     52 self._nav = nav
---> 53 self._load_url(url)
     54 self.total_results = self._get_total_results()
     55 self.pub_parser = PublicationParser(self._nav)

File G:\Anaconda\lib\site-packages\scholarly\publication_parser.py:59, in _SearchScholarIterator._load_url(self, url)
     57 def _load_url(self, url: str):
     58     # this is temporary until setup json file
---> 59     self._soup = self._nav._get_soup(url)
     60     self._pos = 0
     61     self._rows = self._soup.find_all('div', class_='gs_r gs_or gs_scl') + self._soup.find_all('div', class_='gsc_mpat_ttl')

File G:\Anaconda\lib\site-packages\scholarly\_navigator.py:239, in Navigator._get_soup(self, url)
    237 def _get_soup(self, url: str) -> BeautifulSoup:
    238     """Return the BeautifulSoup for a page on scholar.google.com"""
--> 239     html = self._get_page('https://scholar.google.com{0}'.format(url))
    240     html = html.replace(u'\xa0', u' ')
    241     res = BeautifulSoup(html, 'html.parser')

File G:\Anaconda\lib\site-packages\scholarly\_navigator.py:190, in Navigator._get_page(self, pagerequest, premium)
    188     return self._get_page(pagerequest, True)
    189 else:
--> 190     raise MaxTriesExceededException("Cannot Fetch from Google Scholar.")

MaxTriesExceededException: Cannot Fetch from Google Scholar

Expected behavior
`search_pubs` should return a generator of results instead of raising `MaxTriesExceededException`.
