joeyism / linkedin_scraper

A library that scrapes Linkedin for user data
GNU General Public License v3.0
2.08k stars, 575 forks

Not scraping the HQ Location #90

Closed: Lightmare2 closed this issue 3 years ago

Lightmare2 commented 3 years ago

Hi,

I'm using your code for my master's thesis (thanks, by the way) to scrape certain information about media companies. I'm interested in the type, the location (HQ), and the number of employees. I made a preliminary script based on your code to test whether it would work.

I managed to scrape all the info I need (name, type, and size) except for the headquarters. company.headquarters returns None for all four companies that I tried.

If I'm correct, the HQ listed on LinkedIn for Al Jazeera is Doha (marked in red), right? It is shown next to the company type.

[Image: HQ Example]

Is there something wrong in my code that prevents it from scraping that particular field? It would be very useful to be able to scrape it for my master's thesis! I'm not sure what I'm doing wrong.

Thank you in advance!

My code:

import os
from linkedin_scraper import Person, Company, actions
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome(executable_path=r"D:\Python\chromedriver_win32\chromedriver.exe")
# Optional: suppress Chrome's console logging
#options = webdriver.ChromeOptions()
#options.add_experimental_option('excludeSwitches', ['enable-logging'])
#driver = webdriver.Chrome(options=options)

email = "Some email"  # or os.getenv("LINKEDIN_USER")
password = "Some password"  # or os.getenv("LINKEDIN_PASSWORD")
actions.login(driver, email, password)  # if email and password aren't given, it'll prompt in the terminal

urls = [
    "https://www.linkedin.com/company/indiatimes",
    "https://www.linkedin.com/company/criticalhitnet/about/",
    "https://www.linkedin.com/company/enca/",
    "https://www.linkedin.com/company/aljazeera/",
]
companies = []

# Scrape each company page, reusing the same logged-in driver
for url in urls:
    company = Company(url, driver=driver, close_on_complete=False, get_employees=False)
    companies.append(company)

for company in companies:
    print(company.headquarters)  # this is the value that comes back as None
    print(company.name)
    print(company.company_type)
    print(company.company_size)
    #print(company.about_us)

NLCas8 commented 3 years ago

Experiencing the same here. Let me know if you found a fix!

Edit: Found the issue

In Company.py, add the following around line 212:

elif txt == 'Headquarters':
    self.headquarters = values[i+x_off].text.strip()

I'm not sure why these lines were missing.
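
For anyone wondering why a missing branch shows up as None rather than an error, here is a minimal sketch of the pattern. It is not the library's actual code: CompanyInfo, parse_about, the handle_hq flag, the label strings "Type" and "Company size", and the sample values are all made up for illustration, and plain strings stand in for the Selenium elements. The idea is that each label is matched in an if/elif chain, and a label with no branch is skipped silently, so its attribute keeps its default.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CompanyInfo:
    # Hypothetical stand-in for the scraper's Company object.
    company_type: Optional[str] = None
    company_size: Optional[str] = None
    headquarters: Optional[str] = None


def parse_about(labels: List[str], values: List[str], handle_hq: bool) -> CompanyInfo:
    """Copy each labelled value onto the matching attribute.

    A label without a branch is skipped silently, so its attribute stays None.
    """
    info = CompanyInfo()
    for label, value in zip(labels, values):
        txt = label.strip()
        if txt == "Type":
            info.company_type = value.strip()
        elif txt == "Company size":
            info.company_size = value.strip()
        elif handle_hq and txt == "Headquarters":
            # This branch plays the role of the two added lines.
            info.headquarters = value.strip()
    return info


# Sample data: only "Doha" comes from this thread, the rest are placeholders.
labels = ["Company size", "Headquarters", "Type"]
values = ["example size", " Doha ", "example type"]

before = parse_about(labels, values, handle_hq=False)
after = parse_about(labels, values, handle_hq=True)
print(before.headquarters)
print(after.headquarters)
```

With handle_hq=False the headquarters field is left at its default, which matches the symptom in the original post; with the branch enabled the value is picked up like the other fields.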

Lightmare2 commented 3 years ago

Thank you Cas, it also works for me now :)