NLCas8 closed this issue 3 years ago
Hi there! Can you share the code of the query?
This is one of the queries I was trying:
query = Query(
    query='Data analist',
    options=QueryOptions(
        locations=['Nederland'],
        optimize=True,
        limit=10000,
        filters=QueryFilters(
            company_jobs_url=None,
            relevance=RelevanceFilters.RECENT,
            time=TimeFilters.ANY,
            type=TypeFilters.FULL_TIME,
            experience=ExperienceLevelFilters.MID_SENIOR,
        )
    )
)
It always seems to stop after 56 results. If I run the example query, interestingly it does retrieve more results than 56.
With those settings I got 186 jobs:
Can you try again after removing all filters?
I retried with the following query:
query = Query(
    query='Data analist',
    options=QueryOptions(
        locations=['Nederland'],
        optimize=True,
        limit=10000,
    )
)
Final output:
INFO:li:scraper:('[Data analist][Nederland][55]', 'Processed')
INFO:li:scraper:('[Data analist][Nederland][56]', 'Processed')
INFO:li:scraper:('[Data analist][Nederland][56]', 'Pagination requested (9)')
INFO:li:scraper:('[Data analist][Nederland][56]', "Couldn't find more jobs for the running query")
I am not sure what is causing this.
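For anyone comparing runs, a small log parser makes it easier to see where the scraper stops. This is only a sketch: the log format is inferred from the four lines quoted above, and the names `parse_line` and `last_processed` are made up for illustration, not part of the library.

```python
import ast
import re

PREFIX = "INFO:li:scraper:"
# Tag layout assumed from the log lines in this thread: [query][location][counter]
TAG_RE = re.compile(r"^\[(?P<query>[^\]]*)\]\[(?P<location>[^\]]*)\](?:\[(?P<count>\d+)\])?$")


def parse_line(line):
    """Return (tag, message) from one scraper log line, or None if it does not match."""
    line = line.strip()
    if not line.startswith(PREFIX):
        return None
    try:
        # The payload after the prefix is printed as a Python tuple literal.
        payload = ast.literal_eval(line[len(PREFIX):])
    except (ValueError, SyntaxError):
        return None
    if not isinstance(payload, tuple) or len(payload) != 2:
        return None
    return payload


def last_processed(lines):
    """Return the highest job counter seen on a 'Processed' line, or 0 if none."""
    highest = 0
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        tag, message = parsed
        match = TAG_RE.match(tag)
        if message == "Processed" and match and match.group("count"):
            highest = max(highest, int(match.group("count")))
    return highest


log = [
    "INFO:li:scraper:('[Data analist][Nederland][55]', 'Processed')",
    "INFO:li:scraper:('[Data analist][Nederland][56]', 'Processed')",
    "INFO:li:scraper:('[Data analist][Nederland][56]', 'Pagination requested (9)')",
    "INFO:li:scraper:('[Data analist][Nederland][56]', \"Couldn't find more jobs for the running query\")",
]
print(last_processed(log))  # 56
```

Running this over the full log of two different queries shows quickly whether both stop at the same counter.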
It seems to me there are problems with scrolling and loading more jobs on the LinkedIn website when using an anonymous (logged-out) session. I tried opening this url in Chrome in both incognito and normal mode, on Mac and on Windows, and got the same result: at some point while scrolling through jobs, pagination stops working.
This is normal browser navigation; it has nothing to do with this library. To double-check that it is not an IP-related issue, I also connected through my smartphone hotspot (to get a different IP), but faced the same problem. Honestly, I don't think there is anything I can do here; it seems to be a LinkedIn issue. What you can try is using an authenticated session, as described here, and see if it helps!
Opening the url in incognito mode I indeed see the same happening, where the pagination stops working suddenly. If I open the url while logged in I can scroll down and am able to click on one of the next pages.
Actually, I was already trying with an authenticated session, so that did not make a difference unfortunately. It seems as if it behaves like it was an anonymous session somehow, even though it does say it is using the AuthenticatedStrategy:
INFO:li:scraper:('Using strategy AuthenticatedStrategy',)
INFO:li:scraper:('[Data scientist][Netherlands]', 'Setting authentication cookie')
INFO:li:scraper:('[Data scientist][Netherlands]', 'Session is valid')
I found a possible bug with pagination when using an authenticated session. Could you retry with the latest version and see if it helps?
It's fixed now, you're an absolute legend! :D
FYI @NLCas8
It seems that pagination issues in anonymous mode are caused by LinkedIn not being compliant with Chrome's CSP (Content Security Policy). Honestly I don't know when (or if) they are going to fix this, but I've found a workaround that forces a CSP bypass using the Chrome DevTools Protocol.
Long story short, pagination should work properly again even in anonymous mode 🎵
PS: you can try it yourself by enabling this Chrome extension on the LinkedIn page
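For readers wondering what a CSP bypass over the Chrome DevTools Protocol can look like, here is a minimal sketch. It is not necessarily how the library implements its workaround. It assumes a Selenium-style Chrome driver exposing `execute_cdp_cmd`, and uses the CDP method `Page.setBypassCSP`. `FakeDriver` and `enable_csp_bypass` are hypothetical names, used so the snippet runs without a browser.

```python
class FakeDriver:
    """Stand-in for a Chrome driver; it only records the CDP calls it receives."""

    def __init__(self):
        self.calls = []

    def execute_cdp_cmd(self, cmd, cmd_args):
        self.calls.append((cmd, cmd_args))
        return {}


def enable_csp_bypass(driver):
    """Ask Chrome, via the DevTools Protocol, to ignore the page's Content Security Policy."""
    driver.execute_cdp_cmd("Page.setBypassCSP", {"enabled": True})


driver = FakeDriver()
enable_csp_bypass(driver)
print(driver.calls)  # [('Page.setBypassCSP', {'enabled': True})]
```

With a real Selenium Chrome driver, the same function would presumably be called once after the driver is created and before the jobs page is loaded.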
I just gave it a shot in anonymous mode and it's working indeed, it's retrieving over 50 results now. Pretty neat!
Also, if other people run into this thread: I found that setting slow_mo to 5 gave me the best experience. Any faster and it will give you the 'Too many requests' error at some point. 5 seems to be perfect; you can keep it running as long as you wish without errors :)
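A sketch of where that setting might go. Only `slow_mo` comes from this thread. The import name `linkedin_jobs_scraper` and the `headless` and `max_workers` keyword arguments are assumptions based on the library's README, so check them against the version you have installed. `scraper_kwargs` and `build_scraper` are hypothetical helper names.

```python
import importlib.util


def scraper_kwargs(slow_mo=5):
    """Constructor arguments for a slow, single-worker run (keyword names are assumptions)."""
    return {"headless": True, "max_workers": 1, "slow_mo": slow_mo}


def build_scraper(slow_mo=5):
    """Create a LinkedinScraper if the package is installed, otherwise return None."""
    if importlib.util.find_spec("linkedin_jobs_scraper") is None:
        return None
    from linkedin_jobs_scraper import LinkedinScraper  # third-party, imported lazily
    return LinkedinScraper(**scraper_kwargs(slow_mo))


print(scraper_kwargs())  # {'headless': True, 'max_workers': 1, 'slow_mo': 5}
```

If 'Too many requests' errors still show up, raising `slow_mo` further is the first thing to try.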
Hi,
First of all, thank you for this great tool @spinlud!
When I run a query the scraper never gets past page 9 (about 50-60 individual results) somehow, see below:
INFO:li:scraper:('[Data analist][Nederland][56]', 'Pagination requested (9)')
INFO:li:scraper:('[Data analist][Nederland][56]', "Couldn't find more jobs for the running query")
If I were to run the exact query using LinkedIn itself it usually finds more than 5000 results. I have tried changing the parameters, like setting TIME=ANY, LIMIT=10000, and different combinations, but with no luck yet.
Is this a bug, a limitation of the LinkedIn API, or perhaps am I doing something wrong?
Thank you for your help!