spinlud / py-linkedin-jobs-scraper

MIT License

Direct url to scrape + paging (because location is not correctly set by LI) #41

Open pogarek opened 1 year ago

pogarek commented 1 year ago

I'm seeing incorrect scraping. When I set "Headless = False" I can see that the URL contains the requested location, but the location shown in Jobs Search gets set to one of the last searches used on the website.

So a potential workaround is to provide a geoId instead of a location, and/or to have the ability to provide a URL to scrape, which can then be used with pagination.
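To make the idea concrete, here is a rough sketch of what "a URL to scrape, with pagination" could look like, using only the Python standard library. It is not part of py-linkedin-jobs-scraper: the helper name `paged_urls` is hypothetical, and the query parameter names `geoId` and `start`, as well as the page size of 25, are assumptions about how LinkedIn's job search URLs behave, so check them against a URL copied from the browser.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def paged_urls(search_url, pages, page_size=25, geo_id=None):
    """Yield one URL per result page, derived from a copied search URL.

    Hypothetical helper: `start` and `geoId` are assumed parameter names,
    and `page_size` is an assumed number of results per page.
    """
    parts = urlparse(search_url)
    # Keep the original query parameters, dropping any old paging offset.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "start"
    ]
    if geo_id is not None:
        # Replace any existing geoId so the location is pinned explicitly.
        params = [(key, value) for key, value in params if key != "geoId"]
        params.append(("geoId", str(geo_id)))
    for page in range(pages):
        query = urlencode(params + [("start", str(page * page_size))])
        yield urlunparse(parts._replace(query=query))


# Usage: the URL and the geoId value "12345" are placeholders, not real data.
for url in paged_urls(
    "https://www.linkedin.com/jobs/search?keywords=python&location=EMEA",
    pages=3,
    geo_id="12345",
):
    print(url)
```

The scraper would then open each generated URL directly instead of building the search from a location string, which would sidestep the location being rewritten.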

spinlud commented 1 year ago

Hi, please can you share an example to reproduce?

pogarek commented 1 year ago

I search for the same keywords in 3 locations, something like:

1. (keywordA OR keywordB OR keywordC OR keywordD), countryA, last day
2. (keywordA OR keywordB OR keywordC OR keywordD), region (like EMEA), last day
3. (keywordA OR keywordB OR keywordC OR keywordD), Worldwide, last day, Remote

When I disable headless mode (to see what is displayed in the browser), I can see that queries 2 and 3 are still set to Poland, even though the URL is correct. So it looks like LinkedIn translates the location incorrectly.

So, as a workaround, being able to enter a copy of the URL with the expected job search results could be helpful.

spinlud commented 1 year ago

Please post an example to reproduce (with code). If I remember correctly, if you specify a place that isn't recognized by LinkedIn, it is replaced with the last correct location used (probably from a cookie) or with Worldwide.