joeyism / linkedin_scraper

A library that scrapes Linkedin for user data
GNU General Public License v3.0

list index out of range in case of Person #94

Open tak-oda opened 3 years ago

tak-oda commented 3 years ago

Hi joeyism,

I hit the following error when I execute Person(url, driver).

  1. Message

    File "C:\Users\takes\anaconda3\lib\site-packages\linkedin_scraper\person.py", line 113, in scrape_logged_in
      self.name = root.find_elements_by_xpath("//section/div/div/div/*/li")[0].text.strip()
    IndexError: list index out of range

  2. Code

    driver = webdriver.Chrome()
    actions.login(driver, email, password)
    url = "https://www.linkedin.com/in/particular-user-id"
    person = Person(url, driver=driver)

  3. Version

    linkedin-scraper 2.8.0
    chromedriver-binary 89.0.4389.23.0

While Person was running, my Chrome browser logged in to LinkedIn successfully. The error above appeared after the target profile opened.

The same code ran successfully one week ago (2021/04/11). I would appreciate any help solving this.

Regards,

Takeshi
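
For context on the traceback above: Selenium's find_elements_* methods return an empty list when nothing matches, so indexing the result with [0] raises IndexError. Below is a minimal sketch of a guard. The helper name first_text_or_none and the FakeElement stand-in are hypothetical (neither is part of linkedin_scraper); the stand-in only lets the example run without a browser.

```python
from typing import Optional, Sequence


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement; only .text is modeled."""

    def __init__(self, text: str) -> None:
        self.text = text


def first_text_or_none(elements: Sequence) -> Optional[str]:
    """Return the stripped text of the first element, or None if the list is empty."""
    if not elements:
        return None
    return elements[0].text.strip()


# An empty result list no longer raises IndexError.
print(first_text_or_none([FakeElement("  Joey Sham  ")]))
print(first_text_or_none([]))
```

With a guard like this, a profile page whose markup no longer matches would give a name of None rather than a crash, which makes the mismatch easier to report.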

joeyism commented 3 years ago

@tak-oda this is my code

import os
from linkedin_scraper import Person, Company, actions
from selenium import webdriver
driver = webdriver.Chrome("./chromedriver")
actions.login(driver, os.getenv("LINKEDIN_USER"), os.getenv("LINKEDIN_PASSWORD"))
person = Person("https://www.linkedin.com/in/joey-sham-aa2a50122/", driver=driver, close_on_complete=False)
print(person)

and these are the results

Joey Sham

About
['Experienced Data Scientist with a demonstrated history of working in the financial services industry. Skilled in Development, Statistical Data Analysis, Data Science, Statistical Modeling, and Machine Learning. Strong engineering professional with a Master of Science (M.Sc.) focused in Applied Mathematical Data Analysis from McMaster University.']

Experience
[Senior Machine Learning Engineer at SnapTravel from Nov 2018 to Present for 2 yrs 6 mos based at Toronto, Canada Area, Lead Educator - Data Analytics, Data Science, Machine Learning at BrainStation from Apr 2017 to Present for 4 yrs 1 mo based at Toronto, Canada Area, Data Science Consultant at None from None to None for None based at None, Machine Learning Engineer, Data Engineer at Hubba from Feb 2018 to Nov 2018 for 10 mos based at Toronto, Canada Area, Data Scientist at Interac Association and Acxsys Corporation from Feb 2017 to Feb 2018 for 1 yr 1 mo based at Toronto, Canada Area]

Education
[Master of Science (M.Sc.) at McMaster University from 2012 to 2015, Bachelor of Engineering (B.Eng.) at Faculty of Engineering - McMaster University from 2007 to 2012]

Interest
[]

Accomplishments
[]

Contacts
[Stavroula Liovas (Digital Marketing Strategist), Yash Parekh (Java | JS | React | React Native | Node | Express | AWS | Software Engineering), Craig Bonner (Sr. Engineer Analyst at BT Global Services), Andrews Simon (Building Teams at Observe.AI), Prashant Chaudhary (Software Engineer at Google.), Shahab Mosallaie (Data Scientist), Thuan Tran (ML Recruiting at Facebook), Madhavi Reddy (Research Analyst), Ankit Sablok (Software Development Engineer II @ Amazon Inc.), Willem Goudsbloem (IT Architect), Tushar Atmakuru (Incoming Operations Agent @ RBC), Harpreet Sahota (Data Scientist | Statistician | Podcaster | The Idea Broker | Escaping Competition Through Authenticity), Nazanin Aslani, PhD (Looking for Operations research scientist, Data scientist, Research scientist, Data Analytic positions), Khizar Sultan (Data Scientist / Data Analyst / Artificial Intelligence / Machine & Deep Learning / Open Source Contributor / Python / R), Saeed Astaneh (Global Data Science Director, Visa Consulting & Analytics at Visa), Laura Denman (Chief Executive Officer at The Coder Connection), Vasundhara Mehta (Data Scientist | Physicist), Jean-Baptiste Lemoine (Senior Data Scientist at Aive), James Park (Director, Recruitment at icon), Steve Rickard (Director - Sales at Planet4IT), Ankit Yadav (MENG ECE at University of Windsor), Sarah Smith (Recruiter at Rekrütr), Eytan Ohayon (Real Estate Financier), Pavel Chekmaryov (CEO at Occur), Arashpreet (Amy) Chhina (Senior Technical Talent Acquisition Partner at Snapcommerce), Aaron Wang (Intern at Dark Star Quantum Lab), Odile Vander Zaag (she/her) (Talent Consultant at Mirae Talent & Executive Search), Aloukik Aditya (Machine Learning Engineer/Data Scientist with a focus on Deep Learning, NLP, and Computer vision. 
Seeking new opportunities), Jyotirmayee Rao (Senior Technical Recruiter (IT) at Collabera Canada Inc.), Michael Downs (Head of Business Operations at Arize AI - Real Time Observability for AI - mike@arize.com), Johann Manukulasuriya (Aerospace Engineer turned Data Analyst, seeking a data/business analyst position and ready to make a positive impact), Anton Goncharuk (Senior Data Analyst at Microsoft), Julian Murillo (Data Analyst at Quartz Network), Mustafa A. Ezzy (Connecting Employers & Job Seekers By Implementing Unique, Talent Sourcing Exchange Software), Gagan Sachdeva (Sr. Account Executive - MongoDB Canada), Shaiann Blumen Hadar (SDR at Bright Data (Formerly Luminati) | Data Collection Automation Platform), Christian Harris (Founder at hireVouch and workVouch), Sonali Batra (Senior Information Technology Specialist at Self Employed), Andrew Hilson (Headhunter, Coffee Aficionado, Pipeline Developer), Sonia J (Sr.Technical Recruiter (IT))]

urwithajit9 commented 3 years ago

Hi @tak-oda, I am getting the same issue: login succeeds and the profile page opens in the browser, but the scrape fails with a list index out of range error. I think LinkedIn changed the page, so the XPath no longer matches.

Regards.
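
If the page markup did change, one way to make the lookup less brittle is to try several candidate XPaths in order and take the first that matches. This is only a sketch: first_match_text, FakeRoot and FakeElement are hypothetical names, and the two candidate XPaths are simply the ones quoted in this thread, neither confirmed to match the current LinkedIn page. It uses the Selenium 3 style find_elements_by_xpath call that appears in the traceback.

```python
from typing import Dict, Iterable, List, Optional


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement; only .text is modeled."""

    def __init__(self, text: str) -> None:
        self.text = text


class FakeRoot:
    """Hypothetical stand-in for a driver or element: maps XPath -> matches."""

    def __init__(self, matches: Dict[str, List[FakeElement]]) -> None:
        self._matches = matches

    def find_elements_by_xpath(self, xpath: str) -> List[FakeElement]:
        # Like Selenium, return an empty list when nothing matches.
        return self._matches.get(xpath, [])


def first_match_text(root, xpaths: Iterable[str]) -> Optional[str]:
    """Try each XPath in order; return the first hit's stripped text, else None."""
    for xpath in xpaths:
        elements = root.find_elements_by_xpath(xpath)
        if elements:
            return elements[0].text.strip()
    return None


# Candidate XPaths taken from this thread; both are unverified assumptions.
CANDIDATES = [
    "//section/div/div/div/*/li",
    "//section/div/div/div/*/li-icon",
]

root = FakeRoot({CANDIDATES[1]: [FakeElement(" Joey Sham ")]})
print(first_match_text(root, CANDIDATES))
```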

joeyism commented 3 years ago

@urwithajit9 it's certainly possible. Can you paste your code so I can reproduce your error? Without it, the problem is difficult to fix.

aradzekler commented 3 years ago

Try replacing the following line in person.py:

    self.name = root.find_elements_by_xpath("//section/div/div/div/*/li")[0].text.strip()

with this:

    self.name = root.find_elements_by_xpath("//section/div/div/div/*/li-icon")[0].text.strip()

tak-oda commented 3 years ago

Hi @aradzekler ,

Thank you so much for your comment. I can't try your solution yet because I hit a different error before execution reaches this method. It looks like the same error as #100, raised during the login call. Let me put this thread on hold until I resolve that error.

doubleUTF commented 3 years ago

Quoting @aradzekler:

    try replace the following line in person.py: self.name = root.find_elements_by_xpath("//section/div/div/div/*/li")[0].text.strip() with this: self.name = root.find_elements_by_xpath("//section/div/div/div/*/li-icon")[0].text.strip()

I get the same error. I tried your change, but it still fails:

    line 114, in scrape_logged_in
      self.name = root.find_elements_by_xpath("//section/div/div/div/*/li-icon")[0].text.strip()
    IndexError: list index out of range

It seems the XPath needs updating because it matches nothing.

Code:

    from linkedin_scraper import Person, actions
    from selenium import webdriver

    driver = webdriver.Chrome()
    email = "#####"
    password = "######"
    actions.login(driver, email, password)  # if email and password aren't given, it'll prompt in the terminal
    person = Person("https://www.linkedin.com/in/joey-sham-aa2a50122/", driver=driver)
    print(person)
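
To confirm that an XPath really matches nothing, it can help to count matches for each candidate before indexing into the result. The sketch below assumes a hypothetical helper count_matches and stand-in classes so it runs without a browser. With a real driver you would pass the driver itself as root (assuming a Selenium version that still provides find_elements_by_xpath).

```python
from typing import Dict, Iterable, List


class FakeRoot:
    """Hypothetical stand-in for a Selenium driver: maps XPath -> matched items."""

    def __init__(self, matches: Dict[str, List[object]]) -> None:
        self._matches = matches

    def find_elements_by_xpath(self, xpath: str) -> List[object]:
        # Like Selenium, return an empty list when nothing matches.
        return self._matches.get(xpath, [])


def count_matches(root, xpaths: Iterable[str]) -> Dict[str, int]:
    """Return how many elements each XPath matches, without indexing any result."""
    return {xpath: len(root.find_elements_by_xpath(xpath)) for xpath in xpaths}


# The two XPaths discussed in this thread; the fake page matches neither.
xpaths = [
    "//section/div/div/div/*/li",
    "//section/div/div/div/*/li-icon",
]
for xpath, count in count_matches(FakeRoot({}), xpaths).items():
    print(f"{count:3d} match(es) for {xpath}")
```

A count of zero for every candidate would support the theory that the page structure changed, rather than a login or timing problem.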

joeyism commented 3 years ago

Can you try it with 2.9.0? A lot of things are fixed in that version
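
Before retrying, it may be worth confirming which version is actually installed. A small sketch using the standard library's importlib.metadata (Python 3.8+); the helper names are hypothetical, the distribution name linkedin-scraper is taken from the version listing in the original report, and the version parser only handles plain dotted numbers like 2.9.0.

```python
from importlib import metadata
from typing import Optional, Tuple


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def as_tuple(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted version such as '2.9.0' (not a full PEP 440 parser)."""
    return tuple(int(part) for part in version.split("."))


def is_at_least(version: str, minimum: str) -> bool:
    """Compare two plain dotted versions numerically, component by component."""
    return as_tuple(version) >= as_tuple(minimum)


# Prints the installed version, or None if the package is not installed here.
print(installed_version("linkedin-scraper"))
# The version from the original report is older than the suggested one.
print(is_at_least("2.8.0", "2.9.0"))
```

If the installed version is older, upgrading with pip install --upgrade linkedin-scraper should pull in a newer release.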