joeyism / linkedin_scraper

A library that scrapes Linkedin for user data
GNU General Public License v3.0
2.11k stars 580 forks

Scraper #1

Closed shanmugamgsn closed 6 years ago

shanmugamgsn commented 6 years ago

I tried to install Linkedin_user_scrapper and I'm facing the following error.

2017-11-25

shanmugamgsn commented 6 years ago

Ironically, I'm able to run the individual files. I'm just not able to install it with pip.

joeyism commented 6 years ago

huh, that's strange.

Do you have admin access on your console when you're pip installing? Maybe run pip install --user linked_user_scraper instead

shanmugamgsn commented 6 years ago

I tried this and got the following error.

Collecting linked_user_scraper Could not find a version that satisfies the requirement linked_user_scraper (from versions: ) No matching distribution found for linked_user_scraper

I'm a beginner, so I'm finding it difficult :(

joeyism commented 6 years ago

Which python version are you using?

You can find out with python -V
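If it's easier, the same information can be read from inside Python itself. A minimal sketch using only the standard library; `python_version_string` is just an illustrative helper name, not something this package provides:

```python
import sys


def python_version_string(info=sys.version_info):
    """Return the interpreter version as 'major.minor.micro'."""
    return "{}.{}.{}".format(info[0], info[1], info[2])


# Prints the version of the interpreter that is running this script.
print(python_version_string())
```

This is handy when `python` and `pip` might belong to different installations, since it reports the interpreter that actually runs your script.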

shanmugamgsn commented 6 years ago

It's Python 3.6.3

joeyism commented 6 years ago

Try installing it from a git module, with

pip install git+https://github.com/joeyism/linkedin_user_scraper.git

joeyism commented 6 years ago

@shanmugamgsn does it work for you now?

Ammarmajeed commented 6 years ago

Hey, I'm trying to install your Linkedin_user_scrapper with the following command, as mentioned in your repo:

pip3 install --user linkedin_user_scrapper

But I get this error:

Collecting linkedin_user_scrapper Could not find a version that satisfies the requirement linkedin_user_scrapper (from versions: ) No matching distribution found for linkedin_user_scrapper

joeyism commented 6 years ago

@Ammarmajeed I'll fix it when I get home tonight. For now, you can install via

pip install git+https://github.com/joeyism/linkedin_user_scraper.git

Ammarmajeed commented 6 years ago

Thanks

joeyism commented 6 years ago

@Ammarmajeed the bug was that I misspelled scraper in the README. Try now with

pip3 install --user linkedin_user_scraper

Ammarmajeed commented 6 years ago

It works. I have another issue now. So I made a test.py file with the following code in it:

from linkedin_user_scraper.scraper import Person
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5")

And it gives me this error:

os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

What I don't get is that, despite downloading chromedriver and adding it to the Path environment variable, I still get the error above. Can you please help me out?

joeyism commented 6 years ago

You have to set your env variable CHROMEDRIVER to be your chromedriver.

So if my chromedriver file is in ~, I'd set it by

export CHROMEDRIVER=~/chromedriver

Ammarmajeed commented 6 years ago

So basically I named the folder that contains my chromedriver.exe CHROMEDRIVER and added the folder to my path as a variable named CHROMEDRIVER. Do I need to run the export CHROMEDRIVER=~/chromedriver command in cmd, or add this line to my Python script?

joeyism commented 6 years ago

@Ammarmajeed You need to reference your .exe as your variable. So in your environment variables, create one called CHROMEDRIVER and reference the specific location of your .exe file
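A quick way to check what the variable actually points at is a small script like the one below. This is only a sketch: `resolve_chromedriver` is a hypothetical helper written for illustration and is not part of this package. It assumes nothing beyond the CHROMEDRIVER variable name mentioned above.

```python
import os


def resolve_chromedriver(env=None):
    """Return the path stored in CHROMEDRIVER, or raise ValueError with a hint.

    Illustrative helper only; it is not part of the scraper package.
    """
    env = os.environ if env is None else env
    path = env.get("CHROMEDRIVER")
    if not path:
        raise ValueError("CHROMEDRIVER is not set")
    path = os.path.expanduser(path)  # allow values such as ~/chromedriver
    if os.path.isdir(path):
        raise ValueError(
            "CHROMEDRIVER points at a folder; "
            "point it at the executable inside it: " + path
        )
    if not os.path.isfile(path):
        raise ValueError("CHROMEDRIVER does not point at an existing file: " + path)
    return path


if __name__ == "__main__":
    try:
        print("chromedriver found at", resolve_chromedriver())
    except ValueError as error:
        print(error)
```

Run it from a freshly opened terminal, since environment variable changes on Windows usually only show up in new cmd windows.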

Ammarmajeed commented 6 years ago

Yes, I did that. I basically added the directory containing chromedriver to my path as a new environment variable, as seen below: linkedinscrapper

Then I made a Python script, ran it from cmd, and got the same error yet again: linkedinscrapper

I don't get why I'm still getting this even though I referenced my chromedriver.exe in my path as a new variable...

joeyism commented 6 years ago

Oooooh I see what you did. Instead of adding it to the PATH environment variable, can you try creating a new environment variable called CHROMEDRIVER and setting it to the location of your chromedriver?

Ammarmajeed commented 6 years ago

Should I add CHROMEDRIVER as a user variable or a system variable?

joeyism commented 6 years ago

I'm not a windows user, so I'm not quite sure what the difference is. Try either one and see if it works?

Ammarmajeed commented 6 years ago

Okay will do. Thanks :) Out of curiosity though, does this work better in windows or linux?

joeyism commented 6 years ago

I've tested it on windows and it works fine. I'm just a natural linux user so I'm more familiar with the setup

Ammarmajeed commented 6 years ago

So I made a new environment variable named CHROMEDRIVER as you suggested. Now it's showing another error saying: 'CHROMEDRIVER' executable may have wrong permissions. Please see https://sites.google.com/a/chromium.org/chromedriver/home

What permissions do I have to set here and how?

joeyism commented 6 years ago

Can you screenshot your environment variables, so I can see exactly what you put down?

Ammarmajeed commented 6 years ago

Sure. Here you go: envvars

joeyism commented 6 years ago

Hmmm I can't see it from that screenshot, but are you referencing the folder or the .exe file from your env variable?

Ammarmajeed commented 6 years ago

Yes Precisely ....

joeyism commented 6 years ago

If you are referencing the folder, try referencing the .exe file instead. If you are referencing the .exe file, try just referencing the folder. Does either one work?

Ammarmajeed commented 6 years ago

How can we reference the .exe file directly? I was referencing the folder containing the file up till now. I replaced line 36 in scraper.py (driver = webdriver.Chrome(driver_path)) with:

driver = webdriver.Chrome()

And a browser tab opened up:

lol1

Which after loading showed this:

lol12

The script showed this on cmd: lol2

joeyism commented 6 years ago

Ah okay. There's 2 parts to this.

I'll fix the first part later tonight, and republish so you can run it.

The second part is something LinkedIn does sometimes, where they force you to log in. I'm not sure when they force you to log in and when they don't. If you run it a few times, it should work eventually. I'll do some testing for that portion.

Ammarmajeed commented 6 years ago

Cool. Looking forward to your fix. Also, apologies for being annoying. Hope you're not going through too much trouble because of me :)

joeyism commented 6 years ago

It's no trouble, I'm happy that you're using this tool :)

The first problem should be fixed from release 1.1.0. Just pip upgrade the package and you'll get it. This new publish allows you to use your own webdriver, so you can run

from selenium import webdriver

driver = webdriver.Chrome()
person = Person("http://.....", driver = driver)

joeyism commented 6 years ago

The second problem can be solved with a hack from release 1.2.0 on:

When you create the Person, set scrape to False such that

from linkedin_user_scraper.scraper import Person
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", scrape = False)

Chrome will still pop up and go to the person's page. Log in with your LinkedIn account, then log out. LinkedIn has this new thing where they block profiles if you've never logged in before. (Source)

After that, you can run

person.scrape()

and it'll scrape the way you want

EDIT: If you want to scrape multiple profiles, and don't want to keep logging in for each person, you can simply reuse the driver. When running scrape(), adding close_on_complete=False prevents the browser from closing, so you might want to run

person.scrape(close_on_complete=False)
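
For several profiles, that could be wrapped in a loop. The sketch below is only an illustration: `scrape_all` and `make_person` are hypothetical names, and it assumes that `Person` accepts `driver` and `scrape=False` together and that `scrape()` accepts `close_on_complete`, as described in this thread.

```python
def scrape_all(urls, make_person):
    """Scrape each URL in turn without closing the browser in between.

    make_person is any callable that builds a Person-like object for a URL
    (hypothetical wiring; see the commented example below).
    """
    people = []
    for url in urls:
        person = make_person(url)  # opens the page, does not scrape yet
        person.scrape(close_on_complete=False)  # keep the shared browser open
        people.append(person)
    return people


# Example wiring, not executed here because it needs selenium and this package:
#
#   from selenium import webdriver
#   from linkedin_user_scraper.scraper import Person
#
#   driver = webdriver.Chrome()
#   urls = ["https://www.linkedin.com/in/andre-iguodala-65b48ab5"]
#   people = scrape_all(urls, lambda url: Person(url, driver=driver, scrape=False))
```

Passing the constructor in as a callable keeps the loop independent of how the driver is created, so the same function works whether you rely on CHROMEDRIVER or build the driver yourself.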

ASz-IT commented 6 years ago

Hello @joeyism, I followed your instructions and finally got the code running, but I also have the issue of having to log in to LinkedIn each time. (I'm using the Chrome browser.) Maybe it should be run in some special mode? When I run this:

from linkedin_user_scraper.scraper import Person
person = Person("https://www.linkedin.com/in/arkadiusz-szczeciński-794177101", scrape=False)

it opens my browser, and each time I need to log in (doesn't it save cookies?): image

And when I log in and run:

person.scrape(close_on_complete=False)

I get the error below: image

PS: Do you know of a library for getting info about companies from LinkedIn?

joeyism commented 6 years ago

Hi @ASzz , you have to log out after logging in on Linkedin, before you run .scrape(). It's because if you're logged in and you scrape someone else's profile, it'll show up on their feed that you looked at their profile.

I don't know of any that scrape companies, but you can open a new thread with a feature request and I'll do it, or fork this project, add it, submit it for merging, and I'll approve it.

Ammarmajeed commented 6 years ago

@joeyism what's the pip command in Windows cmd to upgrade the scraper to the latest version?

joeyism commented 6 years ago

@Ammarmajeed

pip3 install --upgrade linkedin_user_scraper
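
To confirm which version ended up installed, something like the following may help. It is a sketch that assumes Python 3.8 or newer (for `importlib.metadata`); `installed_version` is an illustrative helper, and the distribution name is the one used in the pip command above.

```python
from importlib import metadata  # available from Python 3.8


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed version string, or None if pip did not install it
# into the environment that runs this script.
print(installed_version("linkedin_user_scraper"))
```

On older Python versions, `pip3 show linkedin_user_scraper` reports the same information.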

joeyism commented 6 years ago

@Ammarmajeed does it work for you now? If it works, I'm going to close this thread.

Ammarmajeed commented 6 years ago

Hey @joeyism . So I got the latest version of your tool and tried to run it in windows 10. I wrote the following script and got the same error as seen below the script:

from linkedin_scraper import Person
from selenium import webdriver

driver = webdriver.Chrome()
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver = driver)

untitled

joeyism commented 6 years ago

Hi @Ammarmajeed

The code throws an error at

driver = webdriver.Chrome()

which is a selenium problem.

My suggestion is to open up python as is, and try to run selenium as is, without the linkedin_scraper. You can find their python docs here.

Ammarmajeed commented 6 years ago

Hey, I figured out how to add chromedriver.exe to the path. It's done with: driver = webdriver.Chrome('~\chromedriver.exe'), where '~' is the directory (location) of chromedriver.exe. But now it gives the following error: untitled

And when I close the chromedriver program this shows on my cmd: untitled2

joeyism commented 6 years ago

Hi @Ammarmajeed, if you open up python in cmd and run

from selenium import webdriver

driver = webdriver.Chrome("~\chromedriver.exe")
driver.get("https://www.linkedin.com/in/andre-iguodala-65b48ab5")

does that throw an error of any kind?

Ammarmajeed commented 6 years ago

Yup. The same error

joeyism commented 6 years ago

Ah okay. That's still a chromedriver error, not a linkedin_scraper error. Which version of chrome are you using? This link may be a clue on how you can fix this problem

Ammarmajeed commented 6 years ago

My version of chromedriver was not right. I downloaded the latest one and then ran the script again. A browser opened up (The cmd showed the response before the red line in the picture below). I logged into linkedin and then I signed out (The cmd showed the response after the red line in the picture below). untitled

Is this also a selenium error?

joeyism commented 6 years ago

Okay, you got past the selenium error, which is good.

Try doing this, in the exact order:

  1. Run ipython or python
  2. In ipython/python, run the following code (you can modify it if you need to specify your driver)
    from linkedin_scraper import Person
    person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver = driver, scrape=False)
  3. Login to Linkedin
  4. Logout of Linkedin
  5. In the same ipython/python code, run
    person.scrape()

The reason is that LinkedIn has recently blocked people from viewing certain profiles without having previously signed in. So by setting scrape=False, it doesn't automatically scrape the profile, but Chrome will open the linkedin page anyways. You can login and logout, and the cookie will stay in the browser and it won't affect your profile views. Then when you run person.scrape(), it'll scrape and close the browser. If you want to keep the browser on so you can scrape others, run it as

person.scrape(close_on_complete=False)

so it doesn't close.
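
If you'd rather run this from a script than type it into an interactive session, the manual login step can be handled with a pause. This is just a sketch: `scrape_after_manual_login` is a hypothetical helper, and it assumes only the `scrape()` and `close_on_complete` behaviour described above.

```python
def scrape_after_manual_login(person, wait=input, keep_browser_open=False):
    """Pause for a manual LinkedIn login and logout, then scrape.

    person is expected to have been created with scrape=False, so the
    browser is already open on the profile page.
    """
    wait("Log in to LinkedIn in the opened browser, log out again, then press Enter: ")
    if keep_browser_open:
        person.scrape(close_on_complete=False)  # leave the browser open for reuse
    else:
        person.scrape()  # scrapes and closes the browser
    return person
```

With the `person` from step 2, calling `scrape_after_manual_login(person)` blocks until you press Enter and then scrapes.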

Ammarmajeed commented 6 years ago

untitled

Ammarmajeed commented 6 years ago

I understood and followed your steps but it's giving the following error in line 62 in person.py as shown in the above comment: NameError: name 'Experience' is not defined

joeyism commented 6 years ago

Hi @Ammarmajeed

It was a slight bug that occurred when publishing. Update to the newest version at 2.0.1 and it'll fix this problem

arkkanoid commented 6 years ago

Following this thread... I get an error before I even get to log in to LinkedIn:

from linkedin_scraper import Person
from selenium import webdriver
driver = webdriver.Chrome()
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver = driver, scrape=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jordi/Library/Python/3.6/lib/python/site-packages/linkedin_scraper/person.py", line 32, in __init__
    driver.get(linkedin_url)
  File "/Users/jordi/Library/Python/3.6/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 324, in get
    self.execute(Command.GET, {'url': url})
  File "/Users/jordi/Library/Python/3.6/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 312, in execute
    self.error_handler.check_response(response)
  File "/Users/jordi/Library/Python/3.6/lib/python/site-packages/selenium/webdriver/remote/errorhandler.py", line 237, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: cannot determine loading status
from unknown error: missing or invalid 'entry.level'
  (Session info: chrome=63.0.3239.132)
  (Driver info: chromedriver=2.29.461585 (0be2cd95f834e9ee7c46bcc7cf405b483f5ae83b),platform=Mac OS X 10.13.1 x86_64)

Should it wait for the login?

arkkanoid commented 6 years ago

It was my fault. I had an outdated version of chromedriver.
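
For anyone hitting the same message: the Chrome and chromedriver versions are both printed at the end of the exception, so they can be pulled out and compared against the chromedriver release notes. A small sketch using only the standard library; `versions_from_error` is an illustrative name, and the sample text is copied from the traceback above.

```python
import re


def versions_from_error(message):
    """Extract (chrome_version, chromedriver_version) from a WebDriverException message."""
    chrome = re.search(r"chrome=([\d.]+)", message)
    driver = re.search(r"chromedriver=([\d.]+)", message)
    return (
        chrome.group(1) if chrome else None,
        driver.group(1) if driver else None,
    )


message = (
    "unknown error: cannot determine loading status\n"
    "from unknown error: missing or invalid 'entry.level'\n"
    "  (Session info: chrome=63.0.3239.132)\n"
    "  (Driver info: chromedriver=2.29.461585 "
    "(0be2cd95f834e9ee7c46bcc7cf405b483f5ae83b),platform=Mac OS X 10.13.1 x86_64)"
)
print(versions_from_error(message))  # ('63.0.3239.132', '2.29.461585')
```

Which chromedriver release supports which Chrome version is not something the code can decide on its own, so the release notes remain the reference.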