Closed by leenash02 3 years ago
Hi there! Can you share the code you are using?
Not OP, but I can confirm this. The log shows the URL that is used to fetch the jobs. If I open that URL in an incognito window I get 33 results, but the scraper returns 54 results. Some jobs are included multiple times. In my case there is no next page: all 33 jobs are displayed, and the bottom of the page says "You've viewed all jobs for this search".
Qerying following skills c#
scraper:info Implementing LoggedOutRunStrategy. +0ms
Running scraper
scraper:info Setting chrome launch options { headless: true,
args:
[ '--enable-automation',
'--start-maximized',
'--window-size=1472,828',
'--lang=en-GB',
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-gpu',
'--disable-dev-shm-usage',
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--proxy-server=\'direct://',
'--proxy-bypass-list=*',
'--disable-accelerated-2d-canvas',
'--disable-gpu',
'--allow-running-insecure-content',
'--disable-web-security',
'--disable-client-side-phishing-detection',
'--disable-notifications',
'--mute-audio' ],
defaultViewport: null,
pipe: true,
slowMo: 50 } +1ms
scraper:info [c#][Finland] Starting new query: query="c#" location="Finland" +235ms
scraper:info [c#][Finland] Query options { locations: [ 'Finland' ],
limit: 500,
optimize: true,
filters: { relevance: 'DD', time: '1,2' } } +1ms
scraper:info [c#][Finland] Opening https://www.linkedin.com/jobs/search?keywords=c%23&location=Finland&sortBy=DD&f_TP=1%2C2&redirect=false&position=1&pageNum=0 +860ms
scraper:info [c#][Finland] Jobs fetched: 24 +3s
For testing purposes I am using the following code:
const {
  events,
  LinkedinScraper,
  relevanceFilter,
  timeFilter
} = require('linkedin-jobs-scraper');
// Counters for the summary printed at the end of the run
let numberOfJobsScraped = 0;
let numberOfScrapingErrors = 0;
(async () => {
  const scraper = new LinkedinScraper({
    headless: true,
    slowMo: 50,
  });
  // Emitted once per processed job; log the jobId so duplicates are visible
  scraper.on(events.scraper.data, (data) => {
    numberOfJobsScraped++;
    console.log('Got job', data.jobId);
  });
  // Emitted on scraping errors such as job-detail timeouts
  scraper.on(events.scraper.error, (err) => {
    console.log(err);
    numberOfScrapingErrors++;
  });
  scraper.on(events.scraper.end, () => {
    console.log('Scraping ended');
  });
  // Puppeteer browser events, registered with no-op handlers
  scraper.on(events.puppeteer.browser.targetcreated, () => {});
  scraper.on(events.puppeteer.browser.targetchanged, () => {});
  scraper.on(events.puppeteer.browser.targetdestroyed, () => {});
  scraper.on(events.puppeteer.browser.disconnected, () => {});
  console.log('Running scraper');
  // First argument: the query; second argument: options applied to the run
  await scraper.run({
    query: 'c#',
    options: {
      locations: ['Finland'],
      limit: 500,
      filters: {
        relevance: relevanceFilter.RECENT,
        time: timeFilter.WEEK
      }
    }
  }, {
    optimize: true
  });
  console.log('Closing browser');
  await scraper.close();
  console.log(`Jobs scraped: ${numberOfJobsScraped} Scraping Errors: ${numberOfScrapingErrors}`);
  console.log(`Scraping tool ended: ${new Date().toISOString()}`);
})();
Below you can see the output of that run. I am logging the jobId,
and the output shows that some jobIds are printed multiple times, e.g. 2313231277.
scraper:info Implementing LoggedOutRunStrategy. +0ms
Running scraper
scraper:info Setting chrome launch options { headless: true,
args:
[ '--enable-automation',
'--start-maximized',
'--window-size=1472,828',
'--lang=en-GB',
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-gpu',
'--disable-dev-shm-usage',
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--proxy-server=\'direct://',
'--proxy-bypass-list=*',
'--disable-accelerated-2d-canvas',
'--disable-gpu',
'--allow-running-insecure-content',
'--disable-web-security',
'--disable-client-side-phishing-detection',
'--disable-notifications',
'--mute-audio' ],
defaultViewport: null,
pipe: true,
slowMo: 50 } +3ms
scraper:info [c#][Finland] Starting new query: query="c#" location="Finland" +231ms
scraper:info [c#][Finland] Query options { locations: [ 'Finland' ],
limit: 500,
optimize: true,
filters: { relevance: 'DD', time: '1,2' } } +0ms
scraper:info [c#][Finland] Opening https://www.linkedin.com/jobs/search?keywords=c%23&location=Finland&sortBy=DD&f_TP=1%2C2&redirect=false&position=1&pageNum=0 +856ms
scraper:info [c#][Finland] Jobs fetched: 25 +3s
Got job 2339560193
scraper:info [c#][Finland][1] Processed +344ms
Got job 2313231277
scraper:info [c#][Finland][2] Processed +913ms
Got job 2352140648
scraper:info [c#][Finland][3] Processed +810ms
Got job 2332637692
scraper:info [c#][Finland][4] Processed +682ms
Got job 2337486044
scraper:info [c#][Finland][5] Processed +671ms
Got job 2331660739
scraper:info [c#][Finland][6] Processed +918ms
Got job 2350231453
scraper:info [c#][Finland][7] Processed +670ms
Got job 2298699594
scraper:info [c#][Finland][8] Processed +781ms
Got job 2322976854
scraper:info [c#][Finland][9] Processed +811ms
Got job 2312805498
scraper:info [c#][Finland][10] Processed +796ms
Got job 2349858398
scraper:info [c#][Finland][11] Processed +813ms
Got job 2324012574
scraper:info [c#][Finland][12] Processed +1s
Got job 2348331907
scraper:info [c#][Finland][13] Processed +811ms
Got job 2332200182
scraper:info [c#][Finland][14] Processed +668ms
Got job 2329224561
scraper:info [c#][Finland][15] Processed +1s
Got job 2346743599
scraper:info [c#][Finland][16] Processed +779ms
scraper:error [c#][Finland][17] Timeout on loading job details +0ms
[c#][Finland][17] Timeout on loading job details
Got job 2330867077
scraper:info [c#][Finland][17] Processed +6s
Got job 2345736108
scraper:info [c#][Finland][18] Processed +681ms
Got job 2328465421
scraper:info [c#][Finland][19] Processed +794ms
Got job 2345329129
scraper:info [c#][Finland][20] Processed +684ms
Got job 2328438853
scraper:info [c#][Finland][21] Processed +794ms
Got job 2328427743
scraper:info [c#][Finland][22] Processed +806ms
Got job 2344618161
scraper:info [c#][Finland][23] Processed +669ms
Got job 2326262469
scraper:info [c#][Finland][24] Processed +935ms
scraper:info [c#][Finland][24] Fecthing new jobs +0ms
scraper:info [c#][Finland][24] Checking for new jobs to load... +62ms
scraper:info [c#][Finland][24] Jobs fetched: 31 +1s
Got job 2339560193
scraper:info [c#][Finland][25] Processed +342ms
Got job 2313231277
scraper:info [c#][Finland][26] Processed +313ms
Got job 2352140648
scraper:info [c#][Finland][27] Processed +326ms
Got job 2332637692
scraper:info [c#][Finland][28] Processed +311ms
Got job 2337486044
scraper:info [c#][Finland][29] Processed +314ms
Got job 2331660739
scraper:info [c#][Finland][30] Processed +311ms
Got job 2350231453
scraper:info [c#][Finland][31] Processed +313ms
Got job 2298699594
scraper:info [c#][Finland][32] Processed +312ms
Got job 2322976854
scraper:info [c#][Finland][33] Processed +313ms
Got job 2312805498
scraper:info [c#][Finland][34] Processed +325ms
Got job 2349858398
scraper:info [c#][Finland][35] Processed +311ms
Got job 2324012574
scraper:info [c#][Finland][36] Processed +345ms
Got job 2348331907
scraper:info [c#][Finland][37] Processed +328ms
Got job 2332200182
scraper:info [c#][Finland][38] Processed +326ms
Got job 2329224561
scraper:info [c#][Finland][39] Processed +343ms
Got job 2346743599
scraper:info [c#][Finland][40] Processed +326ms
scraper:error [c#][Finland][41] Timeout on loading job details +18s
[c#][Finland][41] Timeout on loading job details
Got job 2330867077
scraper:info [c#][Finland][41] Processed +5s
Got job 2345736108
scraper:info [c#][Finland][42] Processed +313ms
Got job 2328465421
scraper:info [c#][Finland][43] Processed +312ms
Got job 2345329129
scraper:info [c#][Finland][44] Processed +312ms
Got job 2328438853
scraper:info [c#][Finland][45] Processed +314ms
Got job 2328427743
scraper:info [c#][Finland][46] Processed +325ms
Got job 2344618161
scraper:info [c#][Finland][47] Processed +311ms
Got job 2326262469
scraper:info [c#][Finland][48] Processed +312ms
Got job 2326254625
scraper:info [c#][Finland][49] Processed +813ms
Got job 2326247738
scraper:info [c#][Finland][50] Processed +792ms
Got job 2348698389
scraper:info [c#][Finland][51] Processed +683ms
Got job 2344399784
scraper:info [c#][Finland][52] Processed +684ms
Got job 2326237530
scraper:info [c#][Finland][53] Processed +809ms
Got job 2344284662
scraper:info [c#][Finland][54] Processed +806ms
scraper:info [c#][Finland][54] Fecthing new jobs +0ms
scraper:info [c#][Finland][54] Checking for new jobs to load... +63ms
scraper:info [c#][Finland][54] There are no more jobs available for the current query +3s
Scraping ended
Closing browser
Jobs scraped: 54 Scraping Errors: 2
Scraping tool ended: 2020-12-21T10:29:50.139Z
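To make the repeats easier to spot than by reading the log by eye, the "Got job" lines can be tallied per jobId. This is only a rough sketch: `countJobIds` and `findDuplicates` are hypothetical helpers written for this comment, not part of linkedin-jobs-scraper, and the sample is a short excerpt in the same format as the log above rather than the full output.

```javascript
// Count how often each jobId appears in console output of the form "Got job <id>".
// countJobIds is a hypothetical helper, not part of linkedin-jobs-scraper.
function countJobIds(logText) {
  const counts = new Map();
  for (const line of logText.split('\n')) {
    const match = /^Got job (\d+)$/.exec(line.trim());
    if (match) {
      counts.set(match[1], (counts.get(match[1]) || 0) + 1);
    }
  }
  return counts;
}

// Return only the ids that were reported more than once.
function findDuplicates(counts) {
  return [...counts].filter(([, n]) => n > 1).map(([id]) => id);
}

// Short excerpt in the same format as the log above (not the full run).
const sample = [
  'Got job 2339560193',
  'scraper:info [c#][Finland][1] Processed +344ms',
  'Got job 2313231277',
  'scraper:info [c#][Finland][24] Jobs fetched: 31 +1s',
  'Got job 2339560193',
  'Got job 2313231277',
  'Got job 2326254625',
].join('\n');

const counts = countJobIds(sample);
const total = [...counts.values()].reduce((a, b) => a + b, 0);
console.log(`total: ${total}, unique: ${counts.size}`);
console.log('duplicates:', findDuplicates(counts));
```

Running the same function over the complete output of a run would show how many of the reported jobs are actually distinct.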
Hi, thanks for sharing the code! I found a bug in the jobs loop. Could you retry with the latest version and see if that solves your issue?
The latest version works. I didn't get any duplicates. Thank you!
Hey spinlud, thanks for the scraper, it works perfectly! I passed an empty string as the query and managed to get 3000 results. However, it turns out that about 2700 of those are duplicates; the actual data I was able to obtain was up to 300 unique job postings. It seems the scraper could not get past some paging mechanism in LinkedIn and looped over what it could reach. The data it obtained is excellent though, and I would love to use it to get more. Any ideas? Cheers!
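As a stopgap on the client side, repeated postings can at least be dropped before they are stored, by remembering which jobIds have already been seen. The sketch below assumes each data event carries a `jobId` field, as in the test code earlier in this thread; `dedupeByJobId` is a hypothetical wrapper, not something the library provides. It does not make the scraper reach more postings, it only filters the repeats.

```javascript
// Wrap a data handler so that it runs at most once per jobId.
// dedupeByJobId is a hypothetical helper, not part of linkedin-jobs-scraper.
function dedupeByJobId(handler) {
  const seen = new Set();
  return (data) => {
    if (seen.has(data.jobId)) {
      return false; // already handled, skip the repeat
    }
    seen.add(data.jobId);
    handler(data);
    return true;
  };
}

// Stand-alone demonstration with plain objects instead of real scraper events.
const uniqueJobIds = [];
const onData = dedupeByJobId((job) => uniqueJobIds.push(job.jobId));

const incoming = [
  { jobId: '2339560193' },
  { jobId: '2313231277' },
  { jobId: '2339560193' }, // repeat
  { jobId: '2326254625' },
  { jobId: '2313231277' }, // repeat
];
for (const job of incoming) {
  onData(job);
}

console.log(uniqueJobIds);
```

With the scraper, the wrapped function would be passed where the plain handler is registered, for example `scraper.on(events.scraper.data, dedupeByJobId((data) => { /* store data */ }))`.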