Closed alvinjchoi closed 6 years ago
To give you an idea of what I'm doing: I have around 400 LinkedIn URLs and want to scrape each profile's name, location, current company, current position, etc.
There's nothing that comes out of the box. If you don't mind saving it as JSON, you could nest the experiences and educations, get the dicts of those, and combine them with the dict of the person.
For example:

d = person.__dict__.copy()
del d["driver"]  # the webdriver instance isn't JSON-serializable
d["experiences"] = [experience.__dict__ for experience in person.experiences]
d["educations"] = [education.__dict__ for education in person.educations]

import json
with open("filename.json", "w") as f:
    json.dump(d, f)
I don't know what the structure would look like in CSV, as the data is nested and the number of experiences and educations may vary.
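That said, if CSV is really needed, one common workaround (my sketch, not part of linkedin_scraper) is to flatten the nesting into one row per experience, repeating the person's fields on every row. The field names and sample data below are hypothetical, just to show the shape:

```python
import csv
import io

# Hypothetical scraped data in the nested shape described above
people = [
    {
        "name": "Jane Doe",
        "experiences": [
            {"institution_name": "Acme", "position_title": "Engineer"},
            {"institution_name": "Initech", "position_title": "Intern"},
        ],
    },
]

# Flatten: one CSV row per (person, experience) pair
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "institution_name", "position_title"])
writer.writeheader()
for person in people:
    for exp in person["experiences"]:
        writer.writerow({"name": person["name"], **exp})

print(buf.getvalue())
```

Educations could be flattened the same way into a second CSV, since mixing both in one table gets awkward.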
So I'm wondering if there's a way to parse through the multiple URLs in linkedin_urls.txt and keep scraping. Ignore the sleep timer; the numbers are very random, I was just testing a few cases. I'm a complete beginner in Python, so it's a bit challenging. I imported the time module thinking I might need to give the page some time to load.
But here's an error that I'm getting:
Your error came about because you haven't logged in; LinkedIn's new policy requires you to log in for a lot of profiles. You can log in first, then loop through your file without closing the browser. That way, you don't have to re-log-in every time:
from selenium import webdriver
from linkedin_scraper import Person

driver = webdriver.Chrome()
driver.get("http://www.linkedin.com")
# put a breaker here via input or something
# you must log in here

all_users = {}
with open("linkedin_url.txt", "r") as fp:
    for line in fp:
        # strip the trailing newline, or the URL passed to Person will be wrong
        person = Person(line.strip(), driver=driver, scrape=False)
        person.scrape(close_on_complete=False)
        d = person.__dict__.copy()
        del d["driver"]
        d["experiences"] = [experience.__dict__ for experience in person.experiences]
        d["educations"] = [education.__dict__ for education in person.educations]
        all_users[person.name] = d  # saves it all to one giant dict

import json
with open("filename.json", "w") as f:
    json.dump(all_users, f)
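One more thought, as a hedged sketch rather than anything the library guarantees: with 400 profiles, a single bad URL would crash the whole loop. Wrapping each scrape in try/except records failures and keeps going. Here `scrape_one` is a hypothetical stand-in for the `Person(...)` / `person.scrape(...)` steps above, so the pattern is runnable without a browser:

```python
# `scrape_one` stands in for Person(url, driver=driver, scrape=False) followed
# by person.scrape(close_on_complete=False), either of which may raise.
def scrape_one(url):
    if "bad" in url:
        raise ValueError("profile failed to load: " + url)
    return {"name": url.rsplit("/", 1)[-1]}

urls = ["in/alice", "in/bad-profile", "in/bob"]
all_users, failures = {}, []
for url in urls:
    try:
        d = scrape_one(url)
        all_users[d["name"]] = d
    except Exception as exc:
        failures.append((url, str(exc)))  # record the failure and keep going

print(sorted(all_users), [u for u, _ in failures])
```

At the end you can dump `failures` alongside the JSON and retry just those URLs later.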
As a side note, it'd be a bit easier if you pasted your code in, instead of taking a screenshot.
Hmm, I'm actually logged in, but I'm not sure why it's not working.
from linkedin_scraper import Person
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.linkedin.com")
# I'm logging into LinkedIn here, then pressing enter afterwards, and the code continues
# Currently, after opening 'fp', it opens the profile that corresponds to the first
# line of the text file, which is exactly what I want.
input("Press Enter to continue...")

all_users = {}
with open("linkedin_url.txt", "r") as fp:
    for line in fp:
        person = Person(line.strip(), driver=driver, scrape=False)
        # I added this statement below to identify exactly where my error is coming
        # from. Even with LinkedIn logged in, and with the profile that I want, when
        # I press enter here to scrape, it throws an error, same as the last time.
        input("Press Enter to scrape...")
        person.scrape(close_on_complete=False)
        d = person.__dict__.copy()
        del d["driver"]
        d["experiences"] = [experience.__dict__ for experience in person.experiences]
        d["educations"] = [education.__dict__ for education in person.educations]
        all_users[person.name] = d

import json
with open("alumni.json", "w") as f:
    json.dump(all_users, f)
Also, my apologies for the screenshot. You are the best, this is helping me so much!!
I updated it with a bugfix, and it's published in 2.1.1. Update your linkedin_scraper and try it again.
Following all the above steps, this is what results.
Here's what I'm doing...

driver = webdriver.Chrome()
person = Person("http://www.linkedin.com/in/randomperson", driver=driver, scrape=False)
f = open('scrape.txt', 'w')
f.write(person.scrape())

But I'm getting an error:

TypeError: write() argument must be str, not None

How do I convert the person object into a string and append it to a text file, or better yet, a CSV file?
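The TypeError suggests `person.scrape()` fills in the object's attributes and returns None, so there's nothing to write directly. A sketch of the fix, reusing the `__dict__` approach from earlier in this thread (`FakePerson` below is a stand-in so the snippet runs without a browser; the real Person also carries a non-serializable `driver` attribute, dropped the same way):

```python
import json

# Stand-in for the scraped Person object after scrape() has run
class FakePerson:
    def __init__(self):
        self.name = "Random Person"
        self.driver = object()   # not JSON-serializable, must be removed
        self.experiences = []
        self.educations = []

person = FakePerson()
# scrape() returns None, which is why f.write(person.scrape()) raised
# TypeError. Serialize the object's attributes instead:
d = person.__dict__.copy()
del d["driver"]
with open("scrape.txt", "w") as f:
    f.write(json.dumps(d))
```

For CSV, the same flattening caveat from earlier applies: experiences and educations are nested lists, so you'd write one row per experience rather than one row per person.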