rchipka / node-osmosis

Web scraper for NodeJS

Concurrency not in effect? #8

Closed caccialdo closed 9 years ago

caccialdo commented 9 years ago

I wrote a script that generates a long command chain like this one:

osmosis
.get('http://...').set({...}).data(fn)
.get('http://...').set({...}).data(fn)
...
.get('http://...').set({...}).data(fn)
.done(fn);

I expected osmosis to open several concurrent connections (5 by default), but when I inspect the requests property of the Parser instance while the script runs, the number never goes above 1. Judging by how slow it is, scraping really does happen one URL at a time.

Am I missing something here or is it a genuine issue with this library?

rchipka commented 9 years ago

Chaining all the GET requests in the same Osmosis instance forces them to happen in that order one at a time. If you're giving Osmosis a bunch of URLs, then request each page with a separate Osmosis instance.

osmosis
.get('http://...').set({...}).data(fn)
osmosis
.get('http://...').set({...}).data(fn)
...
osmosis
.get('http://...').set({...}).data(fn)
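
Since the original script generates its chain programmatically, the same pattern can be produced with a loop. The following is only a minimal sketch, assuming `scraper` is the object returned by `require('osmosis')`. The helper name `scrapeEach` and the parameters `urls`, `fields`, and `onData` are hypothetical stand-ins for the values the thread elides as `'http://...'`, `{...}`, and `fn`.

```javascript
// Sketch: start one independent chain per URL instead of one long chain.
// `scraper` is assumed to be the object returned by require('osmosis');
// it is passed in as a parameter so the helper can also be run against a stub.
// `fields` and `onData` are hypothetical names for the elided {...} and fn.
function scrapeEach(scraper, urls, fields, onData) {
  return urls.map(function (url) {
    // Each scraper.get() call begins its own chain, so this request is not
    // queued behind the chain created for the previous URL.
    return scraper.get(url).set(fields).data(onData);
  });
}
```

With the real library this would presumably be called as `scrapeEach(require('osmosis'), urls, fields, onData)`. Whether the default concurrency limit applies per instance or across all instances is not stated in this thread, so a very long URL list may still need throttling.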

rchipka commented 9 years ago

Please use the latest changes for best results. A stack/queue incrementing issue has been fixed. The issue caused separate Osmosis instances to unnecessarily increase the request queue length.

caccialdo commented 9 years ago

Thanks for the swift answer. I used your recommended pattern instead, with the latest master. Worked flawlessly.