joeyism / linkedin_scraper

A library that scrapes LinkedIn for user data
GNU General Public License v3.0

Large number of nulls in scraped data, especially for companies #111

Open Alex-Bujorianu opened 2 years ago

Alex-Bujorianu commented 2 years ago

Hello,

I posted a comment about this, but I think it would be better to open a new issue. When using version 2.9.0 on Manjaro with Chromium, I get a bunch of null values when trying to scrape companies.

The following code

from linkedin_scraper import Company, Person, actions
from selenium import webdriver

mydriver = webdriver.Chrome()

email = "NOT SHOWN"  # redacted
password = "NOT SHOWN"  # redacted
actions.login(mydriver, email, password)  # if email and password aren't given, it'll prompt in the terminal

# Build the Company object without scraping, then scrape explicitly and keep the browser open
company = Company("https://www.linkedin.com/company/hype-collective-ltd/", driver=mydriver, scrape=False)
company.scrape(close_on_complete=False)

produces the following output:

{"name": "Hype Collective", "about_us": null, "specialties": null, "website": null, "industry": null, "company_type": "Hype Collective", "headquarters": null, "company_size": null, "founded": null, "affiliated_companies": [], "employees": [null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null]}

EDIT: I changed the code and it works now. I assign the grid in company.py line 208 based on the class name instead of the section number. Making a pull request.
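
For anyone wondering why selecting by class name helps, here is a minimal sketch of the idea. It is not the actual change in company.py: it uses the standard library's xml.etree.ElementTree on made-up markup instead of Selenium on the real page, and the class names (top-card, promo-banner, about-grid) and helper functions are hypothetical. It only shows that a positional lookup picks up the wrong section once the layout shifts, while a class-based lookup still finds the intended one.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; LinkedIn's real DOM and class names differ.
PAGE_OLD_LAYOUT = """
<main>
  <section class="top-card"><h1>Hype Collective</h1></section>
  <section class="about-grid"><p>About us text</p></section>
</main>
"""

# The same content after an extra section is inserted above the grid.
PAGE_NEW_LAYOUT = """
<main>
  <section class="top-card"><h1>Hype Collective</h1></section>
  <section class="promo-banner"><p>Sponsored</p></section>
  <section class="about-grid"><p>About us text</p></section>
</main>
"""


def grid_by_position(page, index=2):
    """Pick the Nth <section> (1-based); fragile when the layout changes."""
    root = ET.fromstring(page)
    return root.find(f"./section[{index}]")


def grid_by_class(page, class_name="about-grid"):
    """Pick the <section> whose class attribute matches exactly."""
    root = ET.fromstring(page)
    return root.find(f"./section[@class='{class_name}']")


def about_text(section):
    """Return the paragraph text, or None if the section or its <p> is missing."""
    if section is None:
        return None
    paragraph = section.find("p")
    return paragraph.text if paragraph is not None else None


if __name__ == "__main__":
    for label, page in (("old", PAGE_OLD_LAYOUT), ("new", PAGE_NEW_LAYOUT)):
        print(label, "by position:", about_text(grid_by_position(page)))
        print(label, "by class:   ", about_text(grid_by_class(page)))
```

With Selenium the equivalent would presumably be a lookup by class name rather than by section index, but the exact class to target depends on LinkedIn's current markup, which this sketch does not try to guess.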

Joshuashou commented 2 years ago

Hey Alex, do you see the same issue when scraping a Person? For me, a Person scrape often returns a list of null values, and I'm wondering whether it's the same problem as this one. Thanks!

jayhack commented 1 year ago

I have the same issue: Person scrapes return a bunch of nulls.