joeyism / linkedin_scraper

A library that scrapes Linkedin for user data
GNU General Public License v3.0
2.01k stars 560 forks

person xpath not working #104

Closed cusco closed 3 years ago

cusco commented 3 years ago

I guess this will always be a game of cat & mouse.

LinkedIn changes its HTML, this lib updates to match, over and over.

It would be nice, though, if this lib were kept up to date.

Perhaps I could look into the correct XPath and submit a PR. This is the first time I'm trying this, and I'm not sure whether scraping LinkedIn by hand is worth it or whether it's better to use a tool such as PhantomBuster.

>>> person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver=driver)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cusco/VirtualEnvs/scrapper/lib/python3.7/site-packages/linkedin_scraper/person.py", line 61, in __init__
    self.scrape(close_on_complete)
  File "/home/cusco/VirtualEnvs/scrapper/lib/python3.7/site-packages/linkedin_scraper/person.py", line 86, in scrape
    self.scrape_logged_in(close_on_complete=close_on_complete)
  File "/home/cusco/VirtualEnvs/scrapper/lib/python3.7/site-packages/linkedin_scraper/person.py", line 114, in scrape_logged_in
    self.name = root.find_elements_by_xpath("//section/div/div/div/*/li")[0].text.strip()
IndexError: list index out of range
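
The traceback above shows `[0]` being applied to an empty result list: the XPath matched nothing, so the lookup raised `IndexError`. A minimal sketch of a guarded lookup follows. It is not part of linkedin_scraper; `first_text` and `FakeElement` are hypothetical names, and `FakeElement` only stands in for a Selenium element so the snippet runs without a browser.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FakeElement:
    """Hypothetical stand-in for a Selenium element; only .text is modelled."""
    text: str


def first_text(elements: Sequence, default: Optional[str] = None) -> Optional[str]:
    """Return the stripped text of the first element, or `default` if nothing matched."""
    if not elements:
        # An empty list means the XPath no longer matches the page layout.
        return default
    return elements[0].text.strip()


# An empty match list yields the default instead of raising IndexError.
no_match = first_text([])
# A non-empty match list yields the first element's text, stripped.
one_match = first_text([FakeElement("  Example Name  ")])
print(no_match, one_match)
```

With a helper like this, a changed page layout would leave the attribute as `None` rather than aborting the whole scrape, which makes the failure easier to spot and report.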

genechuang commented 3 years ago

I think my issue has the same cause: LinkedIn changing the XPath index. I'm trying to scrape the Location of a Person and I'm getting "10 connections" instead. It looks like linkedin_url and job_title are correct, but the name and location attributes have shifted:

actions.login(driver, email, password)
person = Person("https://www.linkedin.com/in/bruce-lee-199550188/", driver=driver, scrape=True)

print("url: ", person.linkedin_url)
print("Name: ", person.name)
print("about: ", person.about)
print("job_title: ", person.job_title)
print("Location: ", person.location)

% python li_scraper.py
url: https://www.linkedin.com/in/bruce-lee-199550188/
Name: Be like Liu Kang. Be like water my friend. #8by8 #stopasianhate Bruce commented
about: []
job_title: Billy Lo
Location: 10 connections

alexandre-alphonsos commented 3 years ago

Change:
self.name = root.find_elements_by_xpath("//section/div/div/div/*/li")[0].text.strip()
To:
self.name = root.find_elements_by_xpath("//section/div/div/div/*/h1")[0].text.strip()
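
To illustrate why swapping the last step from `li` to `h1` could matter, here is a small sketch that runs both paths against a made-up page fragment using the standard library's `xml.etree.ElementTree`. The markup in `PAGE` is an assumption for illustration only, not LinkedIn's real HTML, and `texts` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the name sits in an <h1>, while the only <li>
# at the same depth holds the connection count.
PAGE = """
<html><body>
  <section><div><div><div>
    <div><h1>Example Name</h1></div>
    <ul><li>10 connections</li></ul>
  </div></div></div></section>
</body></html>
""".strip()


def texts(root, path):
    """Return the stripped text of every element matching `path`."""
    return [(el.text or "").strip() for el in root.findall(path)]


root = ET.fromstring(PAGE)

# The old selector ends in li, so it lands on the list item.
old_matches = texts(root, ".//section/div/div/div/*/li")
# The proposed selector ends in h1, so it lands on the heading.
new_matches = texts(root, ".//section/div/div/div/*/h1")

print("old:", old_matches)
print("new:", new_matches)
```

In this made-up fragment the `li` path returns the connection count, much like the shifted values reported earlier in the thread, while the `h1` path returns the name. Whether the same holds on the live site depends on LinkedIn's current markup.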

joeyism commented 3 years ago

Ok, this has been fixed now in 2.9.0. Sorry for the delay, my jobs got really busy.