joeyism / linkedin_scraper

A library that scrapes Linkedin for user data
GNU General Public License v3.0
1.98k stars 555 forks

Scraping Company employees list only returns Null #140

Open yoweeking opened 1 year ago

yoweeking commented 1 year ago

I'm trying to scrape companies, and a lot of the time I get null entries in the employees list. Here is the code I am running:

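from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from linkedin_scraper import Company

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome("../chromedriver.exe", options=chrome_options)  # pass the options so --headless takes effect
company = Company("https://www.linkedin.com/company/genesis-trading/", get_employees=True, driver=driver)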

and the response looks like this:

{
  "name": "Genesis",
  "about_us": "Genesis facilitates billions in digital currency trades, loans and transactions on a monthly basis.  Our team combines decades of experience at top Wall Street investment banks with a deep understanding of cryptocurrency markets.  Our platform provides a single point of access for digital asset trading, derivatives, borrowing, lending, custody and prime brokerage services.",
  "specialties": null,
  "website": "http://www.genesistrading.com",
  "industry": "Financial Services",
  "company_type": "Genesis",
  "headquarters": "New York, NY",
  "company_size": "51-200 employees",
  "founded": "2013",
  "affiliated_companies": [

  ],
  "employees": [
    null,
    null
  ],
  "headcount": null
}

It seems to work for some companies but not for many others. Do you know what the root cause might be?
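When running over many company pages, it can help to flag the affected ones automatically. Below is a minimal sketch that counts null entries in a result shaped like the response above. The helper name `employee_stats` and the trimmed `sample` payload are illustrative only and are not part of linkedin_scraper.

```python
import json


def employee_stats(result):
    """Count total, valid, and missing (null) entries in a scraped employees list."""
    employees = result.get("employees") or []
    valid = [e for e in employees if e is not None]
    return {
        "total": len(employees),
        "valid": len(valid),
        "missing": len(employees) - len(valid),
    }


# Trimmed copy of the response shown above (hypothetical sample for illustration)
sample = json.loads('{"name": "Genesis", "employees": [null, null], "headcount": null}')
print(employee_stats(sample))  # {'total': 2, 'valid': 0, 'missing': 2}
```

A result where `missing` equals `total` would suggest the scraper matched some element but could not parse any employee from it.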

DA-Mena commented 1 year ago

The CSS selector for the employee container is sometimes too generic and grabs the wrong element. Going down the list of Fortune 500 companies, the current code couldn't handle even the first one on the list, Walmart. I had to rewrite the selector to target the appropriate element, and also load as many DOM elements as I could from the infinite-scrolling list to get a good sample size.
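The infinite-scroll part can be sketched roughly as follows. This is not the code from the fix described above, just a common pattern: scroll to the bottom, wait, and stop once the page height no longer grows. The function name `scroll_to_load` and its `max_rounds` and `pause` parameters are made up for illustration; `driver` is assumed to be a Selenium WebDriver (anything with `execute_script` works).

```python
import time


def scroll_to_load(driver, max_rounds=20, pause=1.0):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls that actually loaded new content.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    loaded = 0
    for _ in range(max_rounds):
        # Jump to the bottom so the page requests the next batch of items
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to append new elements
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was appended, assume the list is exhausted
        last_height = new_height
        loaded += 1
    return loaded
```

Calling `scroll_to_load(driver)` before reading the employee elements should give a larger sample. `max_rounds` caps the run time on very large companies, where loading the full list is unlikely to be practical.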

yoweeking commented 1 year ago

Thanks @DA-Mena - do you have an example of how you managed to get it to work please?

tommysteryy commented 1 year ago

Hey @yoweeking and @DA-Mena, thanks for sharing this! Did either of you find a way around this problem?

DA-Mena commented 1 year ago

Sorry, I've been busy for a while. I'll try to post what I had. It is as-is and was working to some extent for me, depending on how large the employee list is. I'll post my edit here when I get back from my trip.

DA-Mena commented 1 year ago

I'll try to "parameterize" what I did and open a PR when I think it's good enough.

tommysteryy commented 1 year ago

Thanks @DA-Mena! Hope you're enjoying your trip - any luck with this?

DA-Mena commented 1 year ago

@tommysteryy please see if this works for you https://github.com/DA-Mena/linkedin_scraper/tree/fortuneListRunThroughFixes

It looks like ver 2.11.1 attempted a fix, and all I did was add some minor edits where it was failing for me on big company pages. It isn't going to get you all employees, as that would be a lengthy process and depends on whether LinkedIn's choice of frontend framework can handle all that paging while scraping. I tested a run on this new version and got 800+ employees from Walmart's 40k+ list (versus just 1 employee without my minor edits).

DA-Mena commented 1 year ago

For a narrower scope of employees, you could try entering a keyword in the search box:

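        from selenium.webdriver.common.by import By
        from selenium.webdriver.common.keys import Keys

        # "people-search-keywords" is the id of the search box
        search_box = driver.find_element(By.ID, "people-search-keywords")
        search_box.send_keys("project")  # keyword to narrow the employee list
        search_box.send_keys(Keys.RETURN)  # submit the search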