x-ray sometimes continues paginating past the set limit. My current project pulls product links from an index page on our website, then paginates for more. In this case it should return 50 links (25 per page) and then invoke a callback, using the following syntax:
x(startURL, scope, targets)(cb)
  .paginate(paginate)
  .limit(limit)
  .write(writeToFile);
function cb(e, d) {
  console.log('crawl ended', { e: e, d: d && d.length });
  if (e) { rej(e); }
  else if (!d) { rej(new Error('no data from ' + url)); }
  else { res(d); }
}
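For context, the res and rej in cb suggest the crawl runs inside a Promise executor. A minimal self-contained sketch of that presumed wrapper, with the x-ray chain replaced by a hypothetical crawl stub so the example runs without the library (scrape, crawl, and the example URLs are illustrative names, not part of the original):

```javascript
// Hedged sketch: how cb's res/rej presumably fit into a Promise wrapper.
// `crawl` is a hypothetical stand-in for the real x(startURL, scope, targets)
// chain, so the example is runnable without x-ray.
function crawl(cb) {
  // Simulate two pages of 25 product links each (50 total).
  var links = [];
  for (var i = 0; i < 50; i++) links.push('http://example.com/product/' + i);
  setImmediate(function () { cb(null, links); });
}

function scrape(url) {
  return new Promise(function (res, rej) {
    crawl(function cb(e, d) {
      console.log('crawl ended', { e: e, d: d && d.length });
      if (e) { rej(e); }
      else if (!d) { rej(new Error('no data from ' + url)); }
      else { res(d); }
    });
  });
}

scrape('http://example.com/index').then(function (links) {
  console.log('got ' + links.length + ' links'); // 50
});
```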
Usually the callback fires and the scraper delivers its data gracefully, even though I can see that an additional page has been scraped after the prescribed two pages (I use a filter that logs the current URL to the console). Sometimes, however, I get the following error:
{ error:
   Error: write after end
       at writeAfterEnd (_stream_writable.js:193:12)
       at WriteStream.Writable.write (_stream_writable.js:240:5)
       at WriteStream.Writable.end (_stream_writable.js:477:10)
       at _stream_array (D:\Dropbox\Dropbox\Apps\new scrape\node_modules\x-ray\lib\stream.js:26:16)
       at next (D:\Dropbox\Dropbox\Apps\new scrape\node_modules\x-ray\index.js:112:13)
       at D:\Dropbox\Dropbox\Apps\new scrape\node_modules\x-ray\index.js:243:7
       at D:\Dropbox\Dropbox\Apps\new scrape\node_modules\x-ray\lib\walk.js:56:12
       at callback (D:\Dropbox\Dropbox\Apps\new scrape\node_modules\batch\index.js:147:12)
       at D:\Dropbox\Dropbox\Apps\new scrape\node_modules\x-ray\lib\walk.js:49:9
       at D:\Dropbox\Dropbox\Apps\new scrape\node_modules\x-ray\index.js:232:24,
  data: undefined } },
What does the 'write after end' error signify, and how can I change my scraper syntax to avoid it?
Your environment
version of node: 6.10.3
version of npm: 5.3.0
Expected behaviour
The scraper should return the data from 2 pages.
Actual behaviour
Returns data from 2 pages but logs 3 pages' worth of data to the console,
or
throws the 'write after end' error and delivers no data.