scholarly-python-package / scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
https://scholarly.readthedocs.io/
The Unlicense

ERROR: 'geckodriver' executable needs to be in PATH. #331

Closed danuccio closed 2 years ago

danuccio commented 2 years ago

After updating to the latest version of scholarly (1.4.0) I have been having issues filling the author container for some but not all authors.

As an example, when I search for the first ID in the code below, I can fill the author and publication fields without any problem. However, when I use the second ID, I get a lengthy error message that seems to be related to selenium.

I tried running this in Python 3.6 and 3.9. (And yes, I am aware that the IDs are commented out in my sample code.)

Here is a shortened code sample.

from scholarly import scholarly
#id = scholarly.search_author_id('Smr99uEAAAAJ') #Successful with this id
#id = scholarly.search_author_id('GtLLuxoAAAAJ') #Unsuccessful with this id
print(id)
author = scholarly.fill(id)
print(author)
for publication in author['publications']:
    nPub = scholarly.fill(publication)
    print(nPub)

Here is the output I get and error message I receive for the unsuccessful ID after upgrading to version 1.4.0.

{'container_type': 'Author', 'filled': ['basics'], 'scholar_id': 'GtLLuxoAAAAJ', 'source': <AuthorSource.AUTHOR_PROFILE_PAGE: 'AUTHOR_PROFILE_PAGE'>, 'name': 'A. Murat Eren', 'url_picture': 'https://scholar.googleusercontent.com/citations?view_op=view_photo&user=GtLLuxoAAAAJ&citpid=9', 'affiliation': 'Assistant Professor, University of Chicago', 'interests': ['Microbial Ecology', 'Microbial Evolution', "Microbial 'Omics"], 'email_domain': '@uchicago.edu', 'citedby': 7198}
Traceback (most recent call last):
  File "/home/governour/.local/lib/python3.6/site-packages/selenium/webdriver/common/service.py", line 76, in start
    stdin=PIPE)
  File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'geckodriver': 'geckodriver'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    author = scholarly.fill(id)
  File "/home/governour/.local/lib/python3.6/site-packages/scholarly/_scholarly.py", line 198, in fill
    object = author_parser.fill(object, sections, sortby, publication_limit)
  File "/home/governour/.local/lib/python3.6/site-packages/scholarly/author_parser.py", line 362, in fill
    raise(e)
  File "/home/governour/.local/lib/python3.6/site-packages/scholarly/author_parser.py", line 354, in fill
    (getattr(self, f'_fill_{i}')(soup, author) if i != 'publications' else getattr(self, f'_fill_{i}')(soup, author, publication_limit, sortby_str))
  File "/home/governour/.local/lib/python3.6/site-packages/scholarly/author_parser.py", line 154, in _fill_coauthors
    wd = self.nav.pm._get_webdriver()
  File "/home/governour/.local/lib/python3.6/site-packages/scholarly/_proxy_generator.py", line 296, in _get_webdriver
    self._webdriver = webdriver.Firefox()
  File "/home/governour/.local/lib/python3.6/site-packages/selenium/webdriver/firefox/webdriver.py", line 164, in __init__
    self.service.start()
  File "/home/governour/.local/lib/python3.6/site-packages/selenium/webdriver/common/service.py", line 83, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'geckodriver' executable needs to be in PATH.

silvavn commented 2 years ago

The error says that scholarly is trying to use Selenium; more specifically, it is trying to launch geckodriver, the driver executable that Selenium uses to control Firefox. Have you set up geckodriver and made sure that its location is included in your PATH environment variable?
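
A quick way to check whether the driver is visible to Python is to look it up on PATH with the standard library. This is only a minimal sketch: the helper name find_geckodriver is made up for illustration and is not part of scholarly or Selenium.

```python
import shutil


def find_geckodriver(name="geckodriver"):
    """Return the full path of the executable if it is on PATH, else None.

    Hypothetical helper for illustration; it only wraps shutil.which.
    """
    return shutil.which(name)


if __name__ == "__main__":
    path = find_geckodriver()
    if path is None:
        # This is the situation that leads to the WebDriverException above.
        print("geckodriver was not found on PATH")
    else:
        print(f"geckodriver found at {path}")
```

If this reports that the driver is missing, Selenium will most likely fail in the same way as in the traceback above.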

arunkannawadi commented 2 years ago

The behavior you are seeing changed in v1.4: for GtLLuxoAAAAJ, scholarly now tries to fetch all the co-authors instead of only the few shown on the profile page, and that step goes through Selenium. The proper fix is to add geckodriver to your PATH, as @silvavn mentioned (we should do this in #185). A quick and dirty workaround is to skip filling the co-authors if you don't really need them. You can modify your script to

author = scholarly.fill(id, sections=['basics', 'publications'])

and it should work fine.
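
Put together with the original script, the workaround might look roughly like the sketch below. The function name fill_without_coauthors and the client parameter are assumptions made for illustration. The client is meant to be the object you get from `from scholarly import scholarly`, and is passed in so the snippet does not need the package installed just to be read or tested.

```python
def fill_without_coauthors(client, author_id):
    """Fill an author's basics and publications, skipping co-authors.

    `client` is assumed to be the `scholarly` object; this wrapper is a
    hypothetical helper, not part of the library.
    """
    author = client.search_author_id(author_id)
    # Leaving 'coauthors' out of `sections` avoids the _fill_coauthors step,
    # which is the one that tries to start Firefox through geckodriver.
    author = client.fill(author, sections=["basics", "publications"])
    # Fill every publication entry, as the original loop intended.
    publications = [client.fill(pub) for pub in author["publications"]]
    return author, publications


# Example usage (requires scholarly and network access):
# from scholarly import scholarly
# author, publications = fill_without_coauthors(scholarly, "GtLLuxoAAAAJ")
```

Filling every publication issues one request per entry, so for authors with long publication lists this can take a while.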

danuccio commented 2 years ago

Thank you! The quick and dirty way fixed it.

arunkannawadi commented 2 years ago

Great! I'm keeping this issue open until 1.4.1 comes through; that release should make your existing code work without any modification.

arunkannawadi commented 2 years ago

Closing this now, since the fix in v1.4.1 should allow the original script to work without any modification. You will, however, get a warning about geckodriver not being in the PATH.