Closed shanmugamgsn closed 6 years ago
Ironically, I'm able to run individual files. Not able to do pip
huh, that's strange.
Do you have admin access on your console when you're pip installing? Maybe run
pip install --user linked_user_scraper
instead
I tried this and got the following error.
Collecting linked_user_scraper
  Could not find a version that satisfies the requirement linked_user_scraper (from versions: )
No matching distribution found for linked_user_scraper
I'm a beginner, so I'm finding it difficult :(
Which python version are you using?
You can find out with python -V
It's Python 3.6.3
Try installing it from a git module, with
pip install git+https://github.com/joeyism/linkedin_user_scraper.git
@shanmugamgsn does it work for you now?
Hey, I'm trying to install your Linkedin_user_scrapper with the following command, as mentioned on your repo:
pip3 install --user linkedin_user_scrapper
But I get this error:
Collecting linkedin_user_scrapper
  Could not find a version that satisfies the requirement linkedin_user_scrapper (from versions: )
No matching distribution found for linkedin_user_scrapper
@Ammarmajeed I'll fix it when I get home tonight. For now, you can install via
pip install git+https://github.com/joeyism/linkedin_user_scraper.git
Thanks
@Ammarmajeed the bug was that I misspelled scraper in the README. Try now with
pip3 install --user linkedin_user_scraper
It works. I have another issue now. So I made a test.py file with the following code in it:
from linkedin_user_scraper.scraper import Person
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5")
And it gives me this error:
os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
What I don't get is that despite downloading the chromedriver and putting it in my Path in my environment variable, it gave me the above mentioned error. Can you please help me out?
You have to set your env variable CHROMEDRIVER to be your chromedriver.
So if my chromedriver file is in ~, I'd set it by
export CHROMEDRIVER=~/chromedriver
So basically I named the folder that contains my chromedriver.exe CHROMEDRIVER and added the folder to my path as a variable named CHROMEDRIVER. Do I need to run the export CHROMEDRIVER=~/chromedriver command in cmd, or add this line to my python script?
@Ammarmajeed You need to reference your .exe as your variable. So in your environment variables, create one called CHROMEDRIVER and reference the specific location of your .exe file
Yes I did that. I basically added the directory containing chromedriver to my path as a new environment variable as seen below:
then I made a python script and ran it through my cmd and got the following error yet again:
I don't get why I'm still getting this even though I referenced my chromedriver.exe in my path as a new variable...
Oooooh I see what you did. Instead of adding it to the PATH environment variable, can you try creating a new environment variable called CHROMEDRIVER, and setting its value to the location of your chromedriver?
Should I add CHROMEDRIVER as a user variable or a system variable?
I'm not a windows user, so I'm not quite sure what the difference is. Try either one and see if it works?
Okay will do. Thanks :) Out of curiosity though, does this work better in windows or linux?
I've tested it on windows and it works fine. I'm just a natural linux user so I'm more familiar with the setup
So I made a new environment variable by the name of CHROMEDRIVER as you suggested. Now it's showing another error saying: 'CHROMEDRIVER' executable may have wrong permissions. Please see https://sites.google.com/a/chromium.org/chromedriver/home
What permissions do I have to set here and how?
Can you screenshot your environment variables, so I can see exactly what you put down?
Sure. Here you go:
Hmmm I can't see it from that screenshot, but are you referencing the folder or the .exe file from your env variable?
Yes Precisely ....
If you are referencing the folder, try referencing the .exe file instead. If you are referencing the .exe file, try just referencing the folder. Does either one work?
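For anyone hitting the same message, here is a minimal sketch that reports whether the CHROMEDRIVER variable points at a folder or at the executable itself. A folder in place of the executable may be what triggers the "wrong permissions" message, though that is not confirmed in this thread. The helper name check_chromedriver_env is hypothetical and is not part of linkedin_user_scraper; it only inspects the variable and does not start a browser.

```python
import os


def check_chromedriver_env(var_name="CHROMEDRIVER", environ=None):
    """Return a one-word diagnosis of the path stored in the given variable.

    `environ` defaults to the real environment; passing a dict makes the
    function easy to try out without touching system settings.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(var_name)
    if not value:
        return "unset"
    # Expand a leading ~ so values like ~/chromedriver are handled.
    path = os.path.expanduser(value)
    if os.path.isdir(path):
        # The variable points at a folder, not at the driver file itself.
        return "directory"
    if not os.path.isfile(path):
        return "missing"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


print(check_chromedriver_env())
```

If this prints "directory", the variable is set to the folder and would need the file name (for example chromedriver.exe) appended.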
How can we reference the .exe file directly? I was referencing the folder containing the file up till now. I replaced line 36 in scraper.py (driver = webdriver.Chrome(driver_path)) with:
driver = webdriver.Chrome()
And a browser tab opened up:
Which after loading showed this:
The script showed this on cmd:
Ah okay. There are 2 parts to this.
I'll fix the first part later tonight, and republish so you can run it.
The second part is a thing that linkedin does sometimes, where they force you to login. I'm not sure when they force you to login, and when they don't. If you run it a few times, it should work eventually. I'll do some testing for that portion.
Cool. Looking forward to your fix. Also, apologies for being annoying. Hope you're not going through too much trouble because of me :)
It's no trouble, I'm happy that you're using this tool :)
The first problem should be fixed from release 1.1.0. Just pip upgrade the package and you'll get it. This new publish allows you to use your own webdriver, so you can run
from selenium import webdriver
driver = webdriver.Chrome()
person = Person("http://.....", driver = driver)
The second problem can be solved with a hack from release 1.2.0 on:
When you create the Person, set scrape to False such that
from linkedin_user_scraper.scraper import Person
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", scrape = False)
Your Chrome will still pop up and go to the person's page. Log in with your Linkedin account, then log out. Linkedin has this new thing where they block profiles if you've never logged in before.
After that, you can run
person.scrape()
and it'll scrape the way you want
EDIT:
If you want to scrape multiple profiles, and don't want to keep logging in for each person, you can simply reuse the driver. When running scrape(), adding close_on_complete=False prevents the browser from closing, so you might want to run
person.scrape(close_on_complete=False)
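Reusing the driver across several profiles could look roughly like the sketch below. It assumes the Person constructor accepts driver and scrape keyword arguments and that scrape() accepts close_on_complete, as shown in this thread. The helper scrape_profiles and its person_cls parameter are hypothetical names introduced here, so that the loop can be tried without a browser.

```python
def scrape_profiles(urls, driver, person_cls):
    """Scrape every URL with one shared driver and return the Person objects.

    `person_cls` is expected to behave like the library's Person class.
    """
    people = []
    for url in urls:
        # scrape=False opens the page without scraping right away.
        person = person_cls(url, driver=driver, scrape=False)
        # close_on_complete=False leaves the shared browser open for the next URL.
        person.scrape(close_on_complete=False)
        people.append(person)
    return people
```

With the real library this would be called with something like scrape_profiles(urls, webdriver.Chrome(), Person), after logging in and out once in that browser window. Whether a single login is enough for every profile is an assumption, not something confirmed here.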
Hello @joeyism,
I followed your instructions and finally got the code running, but I also have an issue with having to log in to Linkedin each time. (I'm using the Chrome browser.) Maybe it should be run in some special mode?
When I run this:
from linkedin_user_scraper.scraper import Person
person = Person("https://www.linkedin.com/in/arkadiusz-szczeciński-794177101", scrape=False)
it opens my browser, and each time I need to log in (it doesn't save cookies?):
and when I log in and run:
person.scrape(close_on_complete=False)
I get the error below.
PS Do you know some library to get info about companies from linkedin?
Hi @ASzz ,
you have to log out after logging in on Linkedin, before you run .scrape(). It's because if you're logged in and you scrape someone else's profile, it'll show up on their feed that you looked at their profile.
I don't know of any that scrapes companies, but you can make a new thread with a feature request and I'll do it, or fork this project, add it, and open a pull request, and I'll approve it.
@joeyism what's the pip command for windows cmd to upgrade the scraper to the latest version?
@Ammarmajeed
pip3 install --upgrade linkedin_user_scraper
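Both linkedin_user_scraper and linkedin_scraper show up as import names in this thread, so after upgrading it may help to check which one your Python can actually find. This is only a sketch; the helper name importable is hypothetical, and it relies solely on the standard library.

```python
import importlib.util


def importable(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Which spelling exists depends on the release that pip installed.
print(importable(["linkedin_user_scraper", "linkedin_scraper"]))
```

Whichever name maps to True is the one to use in the from ... import Person line.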
@Ammarmajeed does it work for you now? If it works, I'm going to close this thread.
Hey @joeyism . So I got the latest version of your tool and tried to run it in windows 10. I wrote the following script and got the same error as seen below the script:
from linkedin_scraper import Person
from selenium import webdriver
driver = webdriver.Chrome()
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver = driver)
Hi @Ammarmajeed
The code throws an error at
driver = webdriver.Chrome()
which is a selenium problem.
My suggestion is to open up python as is, and try to run selenium as is, without linkedin_scraper. You can find their python docs here.
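Before trying selenium on its own, one quick check is whether a chromedriver executable can be found on PATH at all, since that is what the earlier "needs to be in PATH" message complains about. The sketch below uses only the standard library. The helper name find_driver is hypothetical, and it does not guarantee that selenium resolves the driver in exactly the same way.

```python
import shutil


def find_driver(name="chromedriver", path=None):
    """Return the full path of `name` found on PATH (or on `path`), else None."""
    return shutil.which(name, path=path)


location = find_driver()
print(location if location else "chromedriver is not on PATH")
```

If nothing is found, passing the full path of the executable to webdriver.Chrome, as in the comments that follow, is another option.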
Hey, I figured out how to add the chromedriver.exe to path. It's done by:
driver = webdriver.Chrome('~\chromedriver.exe')
where '~' is the directory (location) of the chromedriver.exe.
But now it gives the following error:
And when I close the chromedriver program this shows on my cmd:
Hi @Ammarmajeed , If you open up python on cmd and run
from selenium import webdriver
driver = webdriver.Chrome("~\chromedriver.exe")
driver.get("https://www.linkedin.com/in/andre-iguodala-65b48ab5")
does that throw an error of any kind?
Yup. The same error
Ah okay. That's still a chromedriver error, not a linkedin_scraper error. Which version of chrome are you using? This link may be a clue on how you can fix this problem
My version of chromedriver was not right. I downloaded the latest one and then ran the script again. A browser opened up (The cmd showed the response before the red line in the picture below). I logged into linkedin and then I signed out (The cmd showed the response after the red line in the picture below).
Is this also a selenium error?
Okay, you got past the selenium error, which is good.
Try doing this, in the exact order:
1. Open ipython or python.
2. In ipython/python, run the following code (you can modify it if you need to specify your driver):
from selenium import webdriver
from linkedin_scraper import Person
driver = webdriver.Chrome()
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver = driver, scrape=False)
3. In the same ipython/python session, run
person.scrape()
The reason is that LinkedIn has recently blocked people from viewing certain profiles without having previously signed in. So by setting scrape=False, it doesn't automatically scrape the profile, but Chrome will open the linkedin page anyways. You can login and logout, and the cookie will stay in the browser and it won't affect your profile views. Then when you run person.scrape(), it'll scrape and close the browser. If you want to keep the browser on so you can scrape others, run it as
person.scrape(close_on_complete=False)
so it doesn't close.
I understood and followed your steps but it's giving the following error in line 62 in person.py as shown in the above comment:
NameError: name 'Experience' is not defined
Hi @Ammarmajeed
It was a slight bug that occurred when publishing. Update to the newest version at 2.0.1 and it'll fix this problem
Following this thread... I get an error before I even get to log in to Linkedin:
from linkedin_scraper import Person
from selenium import webdriver
driver = webdriver.Chrome()
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver = driver, scrape=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jordi/Library/Python/3.6/lib/python/site-packages/linkedin_scraper/person.py", line 32, in __init__
    driver.get(linkedin_url)
  File "/Users/jordi/Library/Python/3.6/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 324, in get
    self.execute(Command.GET, {'url': url})
  File "/Users/jordi/Library/Python/3.6/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 312, in execute
    self.error_handler.check_response(response)
  File "/Users/jordi/Library/Python/3.6/lib/python/site-packages/selenium/webdriver/remote/errorhandler.py", line 237, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: cannot determine loading status
from unknown error: missing or invalid 'entry.level'
  (Session info: chrome=63.0.3239.132)
  (Driver info: chromedriver=2.29.461585 (0be2cd95f834e9ee7c46bcc7cf405b483f5ae83b),platform=Mac OS X 10.13.1 x86_64)
Should it wait for me to log in?
It was my fault, I had an outdated version of chromedriver
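The Chrome and chromedriver versions involved can be read straight from the "Session info" and "Driver info" lines of an error like the one above. A small sketch that pulls them out of the message text follows. The helper name driver_versions is hypothetical, and the sample message is copied from the traceback in this thread. It only extracts the two version strings and makes no claim about which combinations are compatible.

```python
import re


def driver_versions(message):
    """Pull the Chrome and chromedriver version strings out of an error message."""
    chrome = re.search(r"\bchrome=([\d.]+)", message)
    driver = re.search(r"\bchromedriver=([\d.]+)", message)
    return {
        "chrome": chrome.group(1) if chrome else None,
        "chromedriver": driver.group(1) if driver else None,
    }


message = (
    "(Session info: chrome=63.0.3239.132)\n"
    "(Driver info: chromedriver=2.29.461585 "
    "(0be2cd95f834e9ee7c46bcc7cf405b483f5ae83b),platform=Mac OS X 10.13.1 x86_64)"
)
print(driver_versions(message))
# prints {'chrome': '63.0.3239.132', 'chromedriver': '2.29.461585'}
```

Comparing the two against the chromedriver release notes is then a manual step.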
I tried to install Linkedin_user_scrapper and I'm facing the following error.