joeyism / linkedin_scraper

A library that scrapes Linkedin for user data
GNU General Public License v3.0
1.97k stars 551 forks

[New feature] Collecting info about companies and employed people #2

Closed ASz-IT closed 6 years ago

ASz-IT commented 6 years ago

Hi, I would like to ask you to improve your library to collect information about companies. The list of companies should be read from an external CSV file. As output we should get info about each company; the required sections are marked in these screenshots: [images]

If possible, it would also be great to get a list of employed people (names and surnames only). The output should be saved in JSON format, one file per companies list. Is that possible? What do you think, @joeyism?
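At the time, the library had no CSV input or JSON output, so the requested workflow can only be sketched. Below is a minimal, standard-library-only sketch of the plumbing; `scrape_company` is a hypothetical stand-in for whatever the scraper actually returns, not part of the library's API:

```python
import csv
import json


def scrape_company(url):
    """Hypothetical placeholder for the actual scraper call.

    In practice this would drive linkedin_scraper's Company class;
    here it just returns a dict so the CSV/JSON plumbing can be shown.
    """
    return {"url": url, "name": None, "employees": []}


def companies_to_json(csv_path, json_path):
    # Read one company URL per row from the input CSV...
    with open(csv_path, newline="") as f:
        urls = [row[0] for row in csv.reader(f) if row]
    # ...scrape each one, and dump the whole list as a single JSON file,
    # matching the "one output file per companies list" request.
    results = [scrape_company(url) for url in urls]
    with open(json_path, "w") as f:
        json.dump(results, f, indent=2)
    return results
```

The real scraping step would replace `scrape_company`; the CSV and JSON handling would stay the same.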

joeyism commented 6 years ago

@ASz-IT This feature is published in 2.0.0

I haven't added output or input to json/csv yet, but it scrapes perfectly.

I also changed the name. Since this is not just a user scraper but a full LinkedIn scraper, I felt it was better to rename it from linkedin_user_scraper to linkedin_scraper. Version 2.0.0 is published under both linkedin_scraper and linkedin_user_scraper; however, new bug fixes and features will only go into linkedin_scraper.

ASz-IT commented 6 years ago

That's great news @joeyism! Thanks so much for your hard work :)

Unfortunately I still have an issue with logging in to LinkedIn (I'm from Poland; maybe logging in is required here to see anything). I added a couple of lines of code and successfully logged in to the site, but after that I still have issues...

When I try to use your library for Person or Company I always get errors like this: [screenshot]

joeyism commented 6 years ago

Try doing this, in the exact order:

1. Run ipython
2. In ipython, run the following code (you can modify it if you need to specify your driver):

from linkedin_scraper import Company
company = Company("https://ca.linkedin.com/company/google", scrape=False)

3. Log in to LinkedIn
4. Log out of LinkedIn
5. In the same ipython session, run:

company.scrape(close_on_complete=False)

Does that work, or does it throw an error?

touringkg commented 6 years ago

@joeyism Hi Joey, I've got the same error. I tried your recommendation above (log in/out, then run company.scrape(close_on_complete=False)), but ipython / jupyter launches a separate Chrome window, so LinkedIn asks to sign in again, which results in the same error:

Traceback (most recent call last):
  File "C:\Users\...\Anaconda3\envs\scrapers\lib\site-packages\IPython\core\interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-ce4450a2f698>", line 1, in <module>
    company.scrape(close_on_complete=False)
  File "C:\Users\...\Anaconda3\envs\scrapers\lib\site-packages\linkedin_scraper\company.py", line 80, in scrape
    self.name = driver.find_element_by_class_name("name").text
  File "C:\Users\...\AppData\Roaming\Python\Python36\site-packages\selenium\webdriver\remote\webdriver.py", line 555, in find_element_by_class_name
    return self.find_element(by=By.CLASS_NAME, value=name)
  File "C:\Users\...\AppData\Roaming\Python\Python36\site-packages\selenium\webdriver\remote\webdriver.py", line 955, in find_element
    'value': value})['value']
  File "C:\Users\...\AppData\Roaming\Python\Python36\site-packages\selenium\webdriver\remote\webdriver.py", line 312, in execute
    self.error_handler.check_response(response)
  File "C:\Users\...\AppData\Roaming\Python\Python36\site-packages\selenium\webdriver\remote\errorhandler.py", line 237, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"class name","selector":"name"}
  (Session info: chrome=63.0.3239.132)
  (Driver info: chromedriver=2.35.528161 (5b82f2d2aae0ca24b877009200ced9065a772e73),platform=Windows NT 10.0.16299 x86_64)
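The NoSuchElementException above is Selenium reporting that the element was not on the page at the moment it was queried, either because the page was still loading or because the browser was stuck on a sign-in wall (in which case it will never appear). Selenium's WebDriverWait handles the first case by polling instead of failing on the first lookup; the idea can be sketched as a small, library-free helper (`wait_for` is a made-up name for illustration):

```python
import time


def wait_for(probe, timeout=10.0, interval=0.5):
    """Poll `probe` until it returns a truthy value or `timeout` expires.

    This mimics what Selenium's WebDriverWait does: instead of failing
    on the first lookup, keep retrying while the page finishes loading.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1fs" % timeout)
        time.sleep(interval)
```

With Selenium, `probe` could be something like `lambda: driver.find_elements_by_class_name("name")`, since `find_elements` (plural) returns an empty, falsy list while the element is absent rather than raising. Polling will not help with the login wall, though; that needs a valid session.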

joeyism commented 6 years ago

Hi @touringkg

It seems like LinkedIn has changed its policy, so even having a cookie isn't enough. They probably did this to prevent scraping. Let me look into a solution

joeyism commented 6 years ago

@touringkg You can scrape while still logged in. Try it without the logout step

kpking7 commented 6 years ago

Hi @joeyism

I'm fairly new to programming/Python (6 months in, taking classes at MIT while getting an MBA), but I got the scraper working. How difficult would it be to add the capability to pull all employee names? Essentially I'm looking to monitor the change in employees over time by keeping track of specific names. I'm trying to build it myself, but obviously you are much more adept, particularly when it comes to parsing the website data.
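Assuming the scraper can eventually produce employee names as plain strings, the tracking part of this request reduces to diffing two snapshots with set operations. This is a sketch of that comparison, not anything the library provides:

```python
def employee_changes(previous, current):
    """Compare two snapshots of employee names.

    Returns (joined, left): names appearing only in the newer snapshot,
    and names appearing only in the older one. Assumes names are plain
    strings; real tracking would need something sturdier than names,
    since names collide and people change them.
    """
    prev_set, curr_set = set(previous), set(current)
    joined = sorted(curr_set - prev_set)
    left = sorted(prev_set - curr_set)
    return joined, left
```

For example, comparing ["Ada Lovelace", "Alan Turing"] against ["Alan Turing", "Grace Hopper"] reports Grace Hopper as joined and Ada Lovelace as left.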

joeyism commented 6 years ago

Hi @kpking7 Do you want this done while you are logged in, or while you are logged out?

kpking7 commented 6 years ago

Logged out if possible.


joeyism commented 6 years ago

@kpking7 I don't actually know if logged out is possible. Logged in will take time to implement.

kpking7 commented 6 years ago

Well that’s fine as well. I’d love to help build but am less sure where to start.


joeyism commented 6 years ago

@kpking7 Scraping employees is done automatically from version 2.2.0 on. If you update your linkedin_scraper, you should see it
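To feed the change-over-time monitoring described earlier, one approach is to persist each scrape as a timestamped JSON file and diff files later. This sketch assumes you already have the employee names as a list of strings (the exact attribute the library exposes may vary by version), and `save_snapshot` is a hypothetical helper name, not part of linkedin_scraper:

```python
import json
import os
import time


def save_snapshot(company_slug, employee_names, directory="."):
    """Write a timestamped JSON snapshot of employee names.

    One file per (company, scrape time) pair keeps the history
    append-only, so later comparisons are a matter of loading
    any two files and diffing their "employees" lists.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    path = os.path.join(directory, "%s-%s.json" % (company_slug, stamp))
    with open(path, "w") as f:
        json.dump({"company": company_slug,
                   "scraped_at": stamp,
                   "employees": sorted(employee_names)}, f, indent=2)
    return path
```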

kpking7 commented 6 years ago

Thanks so much. Will give it a try in the next couple days. Appreciate the work!