scholarly-python-package / scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
https://scholarly.readthedocs.io/
The Unlicense

Certificate expired? #448

Closed tnaber closed 1 year ago

tnaber commented 1 year ago

Describe the bug: Just running the example

from scholarly import scholarly

author = next(scholarly.search_author('Steven A Cholewiak'))
scholarly.pprint(author)

raises this error: urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'fake-useragent.herokuapp.com'. (_ssl.c:1129)>

To Reproduce: Run the setup example without a proxy (is a proxy needed?). The bug also occurs when using:

from scholarly import scholarly, ProxyGenerator

pg = ProxyGenerator()
success = pg.FreeProxies()
scholarly.use_proxy(pg)

Expected behavior: Return the author information.

arunkannawadi commented 1 year ago

Did you try running this several times, and is this persistent?

tnaber commented 1 year ago

Hi, thanks for the quick reply :) Yes, I ran it multiple times. If you think it might be OS-specific, I will try it on Linux and Windows. Are you able to make calls right now?

Here is the full trace:


  File "/Users/titus/Documents/Projects/notion2bibtex/venv/lib/python3.9/site-packages/fake_useragent/utils.py", line 154, in load
    for item in get_browsers(verify_ssl=verify_ssl):
  File "/Users/titus/Documents/Projects/notion2bibtex/venv/lib/python3.9/site-packages/fake_useragent/utils.py", line 99, in get_browsers
    html = html.split('<table class="w3-table-all notranslate">')[1]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 1346, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/http/client.py", line 1257, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/http/client.py", line 1303, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/http/client.py", line 1252, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/http/client.py", line 1012, in _send_output
    self.send(msg)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/http/client.py", line 952, in send
    self.connect()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/http/client.py", line 1426, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'fake-useragent.herokuapp.com'. (_ssl.c:1129)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/titus/Documents/Projects/notion2bibtex/venv/lib/python3.9/site-packages/fake_useragent/utils.py", line 64, in get
    with contextlib.closing(urlopen(
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 517, in open
    response = self._open(req, data)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 534, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 1389, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 1349, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'fake-useragent.herokuapp.com'. (_ssl.c:1129)>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/titus/Documents/Projects/notion2bibtex/main.py", line 3, in <module>
    from scholarly import scholarly, ProxyGenerator
  File "/Users/titus/Documents/Projects/notion2bibtex/venv/lib/python3.9/site-packages/scholarly/__init__.py", line 4, in <module>
    scholarly = _Scholarly()
  File "/Users/titus/Documents/Projects/notion2bibtex/venv/lib/python3.9/site-packages/scholarly/_scholarly.py", line 30, in __init__
    self.__nav = Navigator()
  File "/Users/titus/Documents/Projects/notion2bibtex/venv/lib/python3.9/site-packages/scholarly/_navigator.py", line 26, in __call__
    cls._instances[cls] = super(Singleton, cls).__call__(*args,
  File "/Users/titus/Documents/Projects/notion2bibtex/venv/lib/python3.9/site-packages/scholarly/_navigator.py", line 42, in __init__
    self.pm1 = ProxyGenerator()
  File "/Users/titus/Documents/Projects/notion2bibtex/venv/lib/python3.9/site-packages/scholarly/_proxy_generator.py", line 54, in __init__
    self._new_session()
  File "/Users/titus/Documents/Projects/notion2bibtex/venv/lib/python3.9/site-packages/scholarly/_proxy_generator.py", line 443, in _new_session
    'User-Agent': UserAgent().random,
  File "/Users/titus/Documents/Projects/notion2bibtex/venv/lib/python3.9/site-packages/fake_useragent/fake.py", line 69, in __init__
    self.load()
  File "/Users/titus/Documents/Projects/notion2bibtex/venv/lib/python3.9/site-packages/fake_useragent/fake.py", line 75, in load
    self.data = load_cached(
  File "/Users/titus/Documents/Projects/notion2bibtex/venv/lib/python3.9/site-packages/fake_useragent/utils.py", line 250, in load_cached
    update(path, use_cache_server=use_cache_server, verify_ssl=verify_ssl)
  File "/Users/titus/Documents/Projects/notion2bibtex/venv/lib/python3.9/site-packages/fake_useragent/utils.py", line 245, in update
    write(path, load(use_cache_server=use_cache_server, verify_ssl=verify_ssl))
  File "/Users/titus/Documents/Projects/notion2bibtex/venv/lib/python3.9/site-packages/fake_useragent/utils.py", line 187, in load
    ret = json.loads(get(
  File "/Users/titus/Documents/Projects/notion2bibtex/venv/lib/python3.9/site-packages/fake_useragent/utils.py", line 84, in get
    raise FakeUserAgentError('Maximum amount of retries reached')
fake_useragent.errors.FakeUserAgentError: Maximum amount of retries reached

arunkannawadi commented 1 year ago

The example you posted works for me both with FreeProxies and without any proxy. I wonder whether it is a network issue of some sort. Could you try running this before you make any queries:

from scholarly import scholarly, ProxyGenerator

# Route requests through free proxies before making any queries
pg = ProxyGenerator()
pg.FreeProxies()
scholarly.use_proxy(pg, pg)

tnaber commented 1 year ago

It works now, with and without the proxy generator, and I have no idea why. I was on my university network before and am now on my home network, so that might be the cause. Thanks for the help!
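
A quick way to separate a network-level cause from a library problem is to attempt the TLS handshake with the host named in the error directly, outside scholarly. The following is a minimal sketch using only the Python standard library. `check_certificate` is a hypothetical helper name, not part of scholarly or fake_useragent, and the host to pass in is simply the one quoted in the error message.

```python
import socket
import ssl


def check_certificate(host, port=443, timeout=5.0):
    """Try a verified TLS handshake with host and describe the outcome.

    Hypothetical diagnostic helper (an assumption for illustration),
    not an API of scholarly or fake_useragent.
    """
    # Default context verifies both the certificate chain and the hostname.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok: certificate is valid for " + host
    except ssl.SSLCertVerificationError as exc:
        # Same error class that appears in the traceback in this thread.
        return "certificate problem: " + str(exc.verify_message)
    except OSError as exc:
        # DNS failure, refused connection, timeout, other TLS errors.
        return "connection problem: " + str(exc)
```

Calling `check_certificate("fake-useragent.herokuapp.com")` on each network would show whether the handshake itself fails there. What it returns depends on the network and on the current state of that host.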

Strain-solutions commented 1 year ago

Hi, I am getting exactly the same error as posted by @tnaber, with or without free proxies. I am on my home network without a VPN or anything similar. I don't know if this helps, but the error occurs before any of the Python script is run. For example, if I try to run the following:

from scholarly import scholarly, ProxyGenerator

print('Starting script')
pg = ProxyGenerator()
pg.FreeProxies()
scholarly.use_proxy(pg, pg)
author = next(scholarly.search_author('Steven A Cholewiak'))
scholarly.pprint(author)

The first print statement is never executed, so I assume this has something to do with the initial spin-up of scholarly?
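
The traceback posted earlier in this thread starts at `from scholarly import scholarly, ProxyGenerator`, which would be consistent with the failure happening during the import rather than in any later call. A minimal sketch for checking this is to perform the import inside a `try` block and list the chained exceptions. `exception_chain` and `try_import` are hypothetical helper names introduced here for illustration, not part of scholarly.

```python
import importlib


def exception_chain(exc):
    """Return exc followed by the exceptions it was raised from or during.

    The order is outermost first, which is the reverse of how Python
    prints a chained traceback.
    """
    chain = []
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))  # guard against cyclic chains
        chain.append(exc)
        # __cause__ is set by "raise ... from ..."; __context__ is set
        # implicitly when raising inside an except block.
        exc = exc.__cause__ or exc.__context__
    return chain


def try_import(module_name):
    """Import a module and return a list describing any exception chain.

    An empty list means the import succeeded.
    """
    try:
        importlib.import_module(module_name)
    except Exception as exc:
        return [f"{type(e).__name__}: {e}" for e in exception_chain(exc)]
    return []
```

Running `print('Starting script')` first and then `try_import("scholarly")` should make the print appear regardless, and a non-empty result would indicate that the import itself is what raises.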