I'm sure you can combine these promises in an elegant way, but I'd use async
as usual. This is untested and lacks proper error handling (I just randomly
came across this repo):
var async = require('async');
var scraperjs = require('scraperjs');
var StaticScraper = scraperjs.StaticScraper;

// Each entry pairs a URL with the scraping function to run against its page.
var urls = [
  ['https://news.ycombinator.com/', function($) {
    return $('a').map(function() {
      return $(this).text();
    }).get();
  }],
  ['https://www.google.com/', function($) {
    return $('input').map(function() {
      return $(this).val();
    }).get();
  }]
];

// Scrape the URLs one at a time; `contents` collects the results in input order.
async.mapSeries(urls, function(url, callback) {
  StaticScraper.create(url[0]).scrape(url[1], function(content) {
    callback(null, content);
  });
}, function(err, contents) {
  console.log(contents);
});
You could also use mapLimit to do this in parallel in a deterministic manner
(I wouldn't use map with an unknown number of urls). For example, see the
sketch below.
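A minimal sketch of the mapLimit variant, reusing the same `urls` array from
above; the concurrency cap of 2 is an arbitrary choice for illustration:

// Scrape at most 2 URLs concurrently; results still come back in input order.
async.mapLimit(urls, 2, function(url, callback) {
  StaticScraper.create(url[0]).scrape(url[1], function(content) {
    callback(null, content);
  });
}, function(err, contents) {
  console.log(contents);
});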
Use a router, then just iterate through a list of URLs calling the route
method (you can use async to run them sequentially). You can then use
otherwise to store the URLs without a matching path for later use. Roughly
like the sketch below.
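A rough sketch of that approach. It assumes the Router API as shown in the
scraperjs README (router.on(...), createStatic(), otherwise(...), and
router.route(url, callback)), so check the exact signatures against the
version you have installed; the URL list and route pattern are made up for
illustration:

var async = require('async');
var scraperjs = require('scraperjs');

var router = new scraperjs.Router();
var unrouted = [];

// Remember any URL that no route matched, for later use.
router.otherwise(function(url) {
  unrouted.push(url);
});

// Route Hacker News pages to a static scraper that collects link texts.
router.on('https://news.ycombinator.com/*')
  .createStatic()
  .scrape(function($) {
    return $('a').map(function() {
      return $(this).text();
    }).get();
  })
  .then(function(links) {
    console.log(links);
  });

var urls = ['https://news.ycombinator.com/', 'https://example.com/'];

// Route the URLs one at a time, then report the ones nothing matched.
async.eachSeries(urls, function(url, next) {
  router.route(url, function() {
    next();
  });
}, function() {
  console.log('URLs without a matching path:', unrouted);
});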
Hi there,
How would I go about scraping a list of urls? I'm a bit stuck.
Thanks,
Gabriel