joeyism / linkedin_scraper

A library that scrapes Linkedin for user data
GNU General Public License v3.0
1.86k stars 527 forks

Crashing with the company scrape #81

Open aabalde opened 3 years ago

aabalde commented 3 years ago

Hey! First of all, thank you very much for the work. This scraper is really nice! I downloaded it and have been testing it with the README examples, but it seems to crash on the company scrape. The error returned is as follows:

Traceback (most recent call last):
  File "/home/aabalde/Tools/pycharm-community-2020.2.2/plugins/python-ce/helpers/pydev/pydevd.py", line 1448, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/aabalde/Tools/pycharm-community-2020.2.2/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/aabalde/Documentos/linkedin/proof-of-concept.py", line 15, in <module>
    company.scrape()
  File "/home/aabalde/.local/lib/python3.8/site-packages/linkedin_scraper/company.py", line 86, in scrape
    self.scrape_not_logged_in(get_employees = get_employees, close_on_complete = close_on_complete)
  File "/home/aabalde/.local/lib/python3.8/site-packages/linkedin_scraper/company.py", line 263, in scrape_not_logged_in
    self.name = driver.find_element_by_class_name("name").text.strip()
  File "/home/aabalde/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 564, in find_element_by_class_name
    return self.find_element(by=By.CLASS_NAME, value=name)
  File "/home/aabalde/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 976, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "/home/aabalde/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/aabalde/.local/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".name"}
  (Session info: chrome=88.0.4324.150)

The person scrape works fine; it's just the companies that fail. As I said, my code is similar to the examples:

from linkedin_scraper import Person, actions
from selenium import webdriver
from linkedin_scraper import Company
import os

if __name__ == '__main__':
    driver = webdriver.Chrome(executable_path=os.environ['CHROMEDRIVER'])
    email = "my.email@gmail.com"
    password = "1234"
    actions.login(driver, email, password)  # if email and password aren't given, it'll prompt in the terminal
    #person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver=driver)
    company = Company("https://ca.linkedin.com/company/google", scrape=False)
    company.scrape()  # <--- crashing here

I don't know if it's my fault and I'm missing something, but it looks like LinkedIn changed the page structure or something along those lines.

joeyism commented 3 years ago

hmm you're right, I'll look into it

kaifresh commented 3 years ago

Hey guys. I've also run into the same error. I'm guessing LinkedIn changed their class names recently?

Traceback (most recent call last):
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/pydevd.py", line 1434, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/ktan//platforms/linkedin/py-linkedin_scraper/scrape.py", line 16, in <module>
    company = Company("https://ca.linkedin.com/company/google")
  File "/Users/ktan/Library/Python/3.9/lib/python/site-packages/linkedin_scraper/company.py", line 74, in __init__
    self.scrape(get_employees=get_employees, close_on_complete=close_on_complete)
  File "/Users/ktan/Library/Python/3.9/lib/python/site-packages/linkedin_scraper/company.py", line 86, in scrape
    self.scrape_not_logged_in(get_employees = get_employees, close_on_complete = close_on_complete)
  File "/Users/ktan/Library/Python/3.9/lib/python/site-packages/linkedin_scraper/company.py", line 263, in scrape_not_logged_in
    self.name = driver.find_element_by_class_name("name").text.strip()
  File "/Users/ktan/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 564, in find_element_by_class_name
    return self.find_element(by=By.CLASS_NAME, value=name)
  File "/Users/ktan/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 976, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "/Users/ktan/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/Users/ktan/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".name"}
  (Session info: chrome=88.0.4324.192)
joeyism commented 3 years ago

Oh I see, it's because you guys aren't logged in. I should've noticed it earlier in the traceback:

  File "/Users/ktan/Library/Python/3.9/lib/python/site-packages/linkedin_scraper/company.py", line 86, in scrape
    self.scrape_not_logged_in(get_employees = get_employees, close_on_complete = close_on_complete)

Try this

import os
from linkedin_scraper import Person, Company, actions
from selenium import webdriver
driver = webdriver.Chrome("./chromedriver")
user = input("Username: ")
password = input("Password: ")
actions.login(driver, user, password)
company = Company("https://ca.linkedin.com/company/google", driver=driver, close_on_complete=False)
print(company)

When it prompted you for the username and password, did you input them properly?

KavithaRamkrishnan commented 3 years ago

Hi Joe,

I tried the code above, but I'm getting the error below:

NoSuchElementException: no such element: Unable to locate element: {"method":"css selector","selector":".org-people-profiles-module__profile-list"}
  (Session info: chrome=89.0.4389.90)

joeyism commented 3 years ago

@kavithamanivel Can you paste the entire code and error, including which line it's failing on? Also, which version are you using?

iFireMonkey commented 3 years ago

Since Kavi hasn't replied regarding this error and I ran into the same one, I thought I'd provide some more info on it:

Traceback (most recent call last):
  File "scrape_person.py", line 7, in <module>
    company = Company("https://www.linkedin.com/company/google/", driver=driver, close_on_complete=False)
  File "C:\Users\boxed\AppData\Local\Programs\Python\Python38\lib\site-packages\linkedin_scraper\company.py", line 76, in __init__
    self.scrape(get_employees=get_employees, close_on_complete=close_on_complete)
  File "C:\Users\boxed\AppData\Local\Programs\Python\Python38\lib\site-packages\linkedin_scraper\company.py", line 86, in scrape
    self.scrape_logged_in(get_employees = get_employees, close_on_complete = close_on_complete)
  File "C:\Users\boxed\AppData\Local\Programs\Python\Python38\lib\site-packages\linkedin_scraper\company.py", line 251, in scrape_logged_in
    self.employees = self.get_employees()
  File "C:\Users\boxed\AppData\Local\Programs\Python\Python38\lib\site-packages\linkedin_scraper\company.py", line 122, in get_employees
    results_list = driver.find_element_by_class_name(list_css)
  File "C:\Users\boxed\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 564, in find_element_by_class_name
    return self.find_element(by=By.CLASS_NAME, value=name)
  File "C:\Users\boxed\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 976, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "C:\Users\boxed\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\boxed\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".org-people-profiles-module__profile-list"}
  (Session info: chrome=89.0.4389.90)
joeyism commented 3 years ago

@iFireMonkey Can you provide the code you used and the version you're running, please?

KavithaRamkrishnan commented 3 years ago

Hi Joey,

import os
from linkedin_scraper import Person, Company, actions
from selenium import webdriver

driver = webdriver.Chrome('path/chromedriver')
user = input("Username: ")
password = input("Password: ")
actions.login(driver, user, password)
company = Company("https://www.linkedin.com/company/life360", driver=driver, close_on_complete=False)
print(company)

I can connect to LinkedIn and even reach the company page. After that I get this error:

no such element: Unable to locate element: {"method":"css selector","selector":".org-people-profiles-module__profile-list"} (Session info: chrome=89.0.4389.90)

iFireMonkey commented 3 years ago

Just in case it helps, here is my code as well!

from linkedin_scraper import Person, Company, actions
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(options=options)
actions.login(driver)
company = Company("https://www.linkedin.com/company/google/", driver=driver, close_on_complete=False)
print(company)

The experimental options are there to stop the console from spamming about a missing Bluetooth driver; even without them, I have the same issue.

joeyism commented 3 years ago

Ok, I think I fixed it in 2.7.7

stainedglassheart commented 3 years ago

Hi Joey, I just downloaded this package and the version states 2.8.0. However, I am still getting the following error when using the company scrape:

driver = webdriver.Chrome()
email = ""
password = ""
actions.login(driver, email, password)
company = Company("https://www.linkedin.com/company/pinnacle-west-capital-corporation/", get_employees=True)

NoSuchElementException: no such element: Unable to locate element: {"method":"css selector","selector":".name"} (Session info: chrome=89.0.4389.114)

joeyism commented 3 years ago

@stainedglassheart this is my code

import os
from linkedin_scraper import Person, Company, actions
from selenium import webdriver
driver = webdriver.Chrome("./chromedriver")
actions.login(driver, os.getenv("LINKEDIN_USER"), os.getenv("LINKEDIN_PASSWORD"))
company = Company("https://www.linkedin.com/company/pinnacle-west-capital-corporation/", get_employees=True, driver=driver)
print(company)

and the results


Pinnacle West Capital Corporation

For more than 120 years, Pinnacle West and our affiliates have provided energy and energy-related products to people and businesses throughout Arizona. Based in Phoenix, Pinnacle West has consolidated assets of about $11 billion.

Our largest affiliate, Arizona Public Service (APS), generates, sells and delivers electricity and energy-related products and services. APS serves more than a million customers in 11 of Arizona’s 15 counties, and is the operator and co-owner of the Palo Verde Nuclear Generating Station®  – a primary source of electricity for the Southwest.

Our other affiliates include SunCor Development Company, a developer of residential, commercial and industrial real estate; APS Energy Services, a retail energy service provider; and El Dorado Investment Company, a venture capital and investment firm.

Specialties: Power Utility

Website: http://Pinnaclewest.com
Industry: Utilities
Type: Public Company
Headquarters: Phoenix, AZ
Company Size: 5,001-10,000 employees
Founded: 1894

Showcase Pages
[]

Affiliated Companies
[]
cloda01 commented 3 years ago

Hello,

Today, with Chrome version 91.0.4472.77 and ChromeDriver 91.0.4472.19, I get the error

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".name"} (Session info: chrome=91.0.4472.77)

I tried many company pages, including https://ca.linkedin.com/company/google.

Is anyone else seeing the same error?

thanks

KavithaRamkrishnan commented 3 years ago

Yes, I'm also facing the same issue and haven't been able to fix it.

cloda01 commented 3 years ago

Today it works!

Rineraj commented 2 years ago

Hi, I get the error

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".name"} (Session info: chrome=94.0.4606.61)

when I try to scrape a bulk number of companies, around 1000. Does anyone else face the same issue?
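For what it's worth, on bulk runs this error is often timing or throttling related rather than a markup change, so a retry with a pause can help. A minimal sketch (scrape_with_retry is a hypothetical helper, not part of linkedin_scraper):

```python
import time

def scrape_with_retry(scrape_fn, attempts=3, delay=5.0):
    """Call scrape_fn(), retrying with a pause if it raises.

    scrape_fn: a zero-argument callable, e.g. lambda: company.scrape().
    Re-raises the last exception if every attempt fails.
    """
    last_exc = None
    for _ in range(attempts):
        try:
            return scrape_fn()
        except Exception as exc:  # e.g. selenium's NoSuchElementException
            last_exc = exc
            time.sleep(delay)  # give the page (or the rate limiter) a break
    raise last_exc
```

With ~1000 companies you would probably also want a delay between companies, since LinkedIn aggressively throttles automated traffic.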

shubhangichaturvedi02 commented 2 years ago

I'm getting the same error. Has anyone found a solution?

saerii commented 2 years ago

EDIT: see below.


Same error running the code below.

Error Message

Traceback (most recent call last):
  File "c:\Users\[user]\Programming Projects\webscraper\main.py", line 22, in <module>
    company.scrape()
  File "C:\Users\[user]\anaconda3\envs\scraping\lib\site-packages\linkedin_scraper\company.py", line 89, in scrape
    self.scrape_not_logged_in(get_employees = get_employees, close_on_complete = close_on_complete)
  File "C:\Users\[user]\anaconda3\envs\scraping\lib\site-packages\linkedin_scraper\company.py", line 279, in scrape_not_logged_in
    self.name = driver.find_element_by_class_name("name").text.strip()
  File "C:\Users\[user]\anaconda3\envs\scraping\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 760, in find_element_by_class_name  
    return self.find_element(by=By.CLASS_NAME, value=name)
  File "C:\Users\[user]\anaconda3\envs\scraping\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 1244, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "C:\Users\[user]\anaconda3\envs\scraping\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 424, in execute
    self.error_handler.check_response(response)
  File "C:\Users\[user]\anaconda3\envs\scraping\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".name"}
  (Session info: chrome=98.0.4758.102)

Code

import os
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from linkedin_scraper import Company, actions

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

actions.login(driver, os.getenv("LINKEDIN_USER"), os.getenv("LINKEDIN_PASSWORD"))

company = Company("https://www.linkedin.com/company/boom-technology-inc./", get_employees=True, driver=driver, close_on_complete=False, scrape=False)
company.scrape()
print(company)

EDIT: I was able to bypass the error message above by adding the following line: driver.implicitly_wait(3).
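For context, implicitly_wait(3) tells Selenium to keep polling the DOM for up to 3 seconds before giving up on a find_element call, instead of failing immediately on a page that hasn't finished rendering. Roughly the same idea in plain Python (wait_for is a hypothetical helper, just to illustrate the polling loop):

```python
import time

def wait_for(predicate, timeout=3.0, poll=0.25):
    """Poll predicate() until it returns a truthy value or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1fs" % timeout)
        time.sleep(poll)
```

Selenium's own WebDriverWait with expected_conditions gives the same behavior with finer control than a blanket implicit wait.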

Updated Code

import os
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from linkedin_scraper import Company, actions

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

actions.login(driver, os.getenv("LINKEDIN_USER"), os.getenv("LINKEDIN_PASSWORD"))

company = Company("https://www.linkedin.com/company/boom-technology-inc./", driver=driver, get_employees=True, close_on_complete=False, scrape=False)
driver.implicitly_wait(3)
company.scrape(close_on_complete=False)
print(company)

However, now I'm seeing some other warnings and ultimately get nothing from the scrape.

New Warnings & Output

C:\Users\[user]\anaconda3\envs\scraping\lib\site-packages\selenium\webdriver\remote\webelement.py:426: UserWarning: find_elements_by_* commands are deprecated. Please use find_elements() instead
  warnings.warn("find_elements_by_* commands are deprecated. Please use find_elements() instead")
[21848:6468:0220/104829.525:ERROR:chrome_browser_main_extra_parts_metrics.cc(227)] START: ReportBluetoothAvailability(). If you don't see the END: message, this is crbug.com/1216328.
[21848:6468:0220/104829.528:ERROR:chrome_browser_main_extra_parts_metrics.cc(230)] END: ReportBluetoothAvailability()
[21848:6468:0220/104829.528:ERROR:chrome_browser_main_extra_parts_metrics.cc(235)] START: GetDefaultBrowser(). If you don't see the END: message, 
this is crbug.com/1216328.
[21848:9160:0220/104829.529:ERROR:device_event_log_impl.cc(214)] [10:48:29.528] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from 
node connection: A device attached to the system is not functioning. (0x1F)
[21848:9160:0220/104829.530:ERROR:device_event_log_impl.cc(214)] [10:48:29.530] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from 
node connection: A device attached to the system is not functioning. (0x1F)
[21848:6468:0220/104829.554:ERROR:chrome_browser_main_extra_parts_metrics.cc(239)] END: GetDefaultBrowser()
C:\Users\[user]\anaconda3\envs\scraping\lib\site-packages\selenium\webdriver\remote\webelement.py:359: UserWarning: find_elements_by_* commands are deprecated. Please use find_elements() instead
  warnings.warn("find_elements_by_* commands are deprecated. Please use find_elements() instead")
{"name": "Boom Supersonic", "about_us": null, "specialties": null, "website": null, "industry": null, "company_type": "Boom Supersonic", "headquarters": null, "company_size": null, "founded": null, "affiliated_companies": [], "employees": [null]}