Open tunahorse opened 1 year ago
https://www.linkedin.com/company/medtronic/people/?keywords=device%20sales
I want to grab each person's name and the link to each person's profile
Okay, using your code I run into a 999 status, meaning LinkedIn is telling the scraper to stop. With headers set, I get redirected to the login page instead. Two options:
1. Rotate IPs (complicated).
2. Use the API.
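For the IP-rotation option, here is a minimal sketch of round-robin proxy selection, assuming you already have a pool of proxies. The proxy URLs are placeholders and `proxy_meta` is a hypothetical helper name. The only Scrapy-specific piece is the `proxy` key in `Request.meta`, which Scrapy's built-in `HttpProxyMiddleware` reads.

```python
from itertools import cycle

# Placeholder proxy URLs (assumption): substitute your own pool.
PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
]

# cycle() loops over the pool forever, so each call gets the next proxy in turn.
_proxy_pool = cycle(PROXIES)


def proxy_meta():
    """Return a Request.meta dict pointing at the next proxy in the pool."""
    return {"proxy": next(_proxy_pool)}
```

In the spider this would be passed as `meta=proxy_meta()` when building each `scrapy.Request`. Rotation alone may not be enough to avoid the 999 response.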
```python
import scrapy


class PeopleScraper(scrapy.Spider):
    name = "people_scraper"
    allowed_domains = ["linkedin.com"]
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
    }

    def start_requests(self):
        # define the start URL; swap "medtronic" for another company name or ID,
        # and change the "?keywords=..." parameter to filter the people list
        start_url = 'https://www.linkedin.com/company/medtronic/people/?keywords=device%20sales'
        # handle_httpstatus_all is read from Request.meta (not from a spider
        # attribute), so non-2xx responses such as 999 reach parse()
        yield scrapy.Request(
            url=start_url,
            headers=self.headers,
            meta={"handle_httpstatus_all": True},
            callback=self.parse,
        )

    def parse(self, response):
        # extract the HTML elements that contain the name and URL for each person
        for person in response.css('.search-result__info'):
            name = person.css('.ember-view.lt-line-clamp.lt-line-clamp--single-line.org-people-profile-card__profile-title.t-black::text').get()
            url = person.css('a::attr(href)').get()
            # skip cards where the name or link is missing (.get() returns None)
            if not name or not url:
                continue
            # clean up the data
            name = name.strip()  # remove extra whitespace
            url = response.urljoin(url)  # convert relative URL to absolute URL
            # return a dictionary with the scraped data
            yield {'name': name, 'url': url}
```
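To make the block visible instead of silently getting zero items, a check like the one below could run at the top of `parse`. This is only a sketch: `looks_blocked` is a made-up helper, and the `login` / `authwall` path fragments are an assumption about where the login redirect lands, not something confirmed in this thread.

```python
from urllib.parse import urlparse

BLOCK_STATUS = 999  # the status code reported in this thread

# Assumed path fragments for the login redirect; adjust to what you actually see.
LOGIN_HINTS = ("login", "authwall")


def looks_blocked(status, final_url):
    """Return True if the response looks like a block or a login redirect."""
    if status == BLOCK_STATUS:
        return True
    # Only inspect the path, so query strings do not cause false positives.
    path = urlparse(final_url).path.lower()
    return any(hint in path for hint in LOGIN_HINTS)
```

Inside the spider it could be used as `if looks_blocked(response.status, response.url): self.logger.warning("blocked: %s", response.url); return`.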
I had a feeling LinkedIn was blocking. So is there a LinkedIn API people can use to access the site?
Please provide screenshots and an extract of the HTML you are trying to scrape.