joeyism / linkedin_scraper

A library that scrapes Linkedin for user data
GNU General Public License v3.0
1.97k stars 552 forks

Scraping from a list of links #147

Open HABER7789 opened 1 year ago

HABER7789 commented 1 year ago

Hi, I am really impressed by the scraper you have built and really glad to be able to use it. I am facing an issue scraping a list of people from an Excel file that basically just contains profile links.

The scraper handles the first link fine, and I can see in the Chrome window that it does navigate to the next profile, but then it throws an exception and cannot scrape any further, leaving me with data for only the first person.

I would really appreciate your help with this; I am attaching my code here.

from linkedin_scraper import Person, actions
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import pandas as pd
import openpyxl  # needed by pandas to read .xlsx files

# Headless Chrome; the options have to be passed to the driver for them to take effect
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path=r'C:\chromedriver.exe', options=chrome_options)
driver.set_window_size(1920, 1080)

email = "Email"
password = "password"
actions.login(driver, email, password)  # if email and password aren't given, it'll prompt in the terminal

dataframe1 = pd.read_excel('People.xlsx') 
links = list(dataframe1['PeopleLinks'])

ExtractedList = []

for i in links:    
    person = Person(i, driver=driver, scrape=False)
    person.scrape(close_on_complete=False)
    ExtractedList.append(person)

for j in ExtractedList:
    print(j)
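
In the meantime, one way to keep the loop going when a single profile fails is to wrap each scrape in a try/except. This is only a sketch built on the code above; the failed_links list is my own bookkeeping and not part of linkedin_scraper:

from selenium.common.exceptions import NoSuchElementException, TimeoutException, WebDriverException

ExtractedList = []
failed_links = []  # my own bookkeeping, not part of linkedin_scraper

for link in links:
    try:
        person = Person(link, driver=driver, scrape=False)
        person.scrape(close_on_complete=False)  # keep the browser open for the next profile
        ExtractedList.append(person)
    except (NoSuchElementException, TimeoutException, WebDriverException) as e:
        # one failing profile no longer aborts the whole run
        print(f"Failed to scrape {link}: {e}")
        failed_links.append(link)

for person in ExtractedList:
    print(person)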
joeyism commented 1 year ago

What's the error that you get?

HABER7789 commented 1 year ago

What's the error that you get?

Hey there! [screenshot of the error attached]

HABER7789 commented 1 year ago

What's the error that you get?

This is the error I am getting (see the screenshot above). The thing is, if it can scrape one person, it should be able to do the same for the others, right? Please do correct me if I am wrong anywhere. Thanks!

rizwankaz commented 1 year ago

Hey! I'm also getting the same error; is it just that the CSS selector has changed?

lusifer021 commented 1 year ago

#158

This PR solves this issue and can parse multiple person links.

HABER7789 commented 1 year ago

#158 This PR solves this issue and can parse multiple person links.

Thanks a ton, it works! Really appreciate your help here. Cheers man! @lusifer021

lusifer021 commented 1 year ago

#158 This PR solves this issue and can parse multiple person links.

Thanks a ton, it works! Really appreciate your help here. Cheers man! @lusifer021

Welcome @HABER7789

jakalfayan commented 1 year ago

@joeyism I'm doing this as well and wanted to ask how to exclude scraping the people in the company scrape. My current code is below, but I wanted to ask since I've got a long list of companies and I don't need the employees piece. Let me know.

import pandas as pd
from linkedin_scraper import Person, Company, actions
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

ser = Service(r"c:\se\chromedriver.exe")
op = webdriver.ChromeOptions()
driver = webdriver.Chrome(service=ser, options=op)

email = "XXXX@gmail.com"
password = "XXXXXXXXXXX"
actions.login(driver, email, password)  # if email and password aren't given, it'll prompt in the terminal

dataframe1 = pd.read_csv("company_Linkedin_upload.csv")
links = list(dataframe1['linkedin url'])

ExtractedList = []

for i in links:
    company = Company(i, driver=driver, scrape=False, get_employees=False)
    company.scrape(close_on_complete=False)
    ExtractedList.append(company)
    print(company)

for j in ExtractedList:
    print(j)
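
One thing worth checking: if I remember the API correctly, Company.scrape() also takes a get_employees flag, and because you construct the Company with scrape=False and then call scrape() yourself, the get_employees=False you pass to the constructor may not carry over to that call. A minimal sketch of the loop with the flag passed in both places (please verify the scrape() signature against your installed version, this is an assumption on my part):

for i in links:
    company = Company(i, driver=driver, scrape=False, get_employees=False)
    # passing get_employees=False here too, since the constructor flag is only
    # used when the constructor itself triggers the scrape (assumption -- check your version)
    company.scrape(get_employees=False, close_on_complete=False)
    ExtractedList.append(company)
    print(company)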