nrabinowitz / pjscrape

A web-scraping framework written in Javascript, using PhantomJS and jQuery
http://nrabinowitz.github.io/pjscrape/
MIT License
996 stars, 159 forks

Package for NPM #50

Open RafaelVidaurre opened 10 years ago

RafaelVidaurre commented 10 years ago

Wouldn't it be good to package this to npm?

nrabinowitz commented 10 years ago

I'm not sure it would make sense, since it's not Node-based (it runs in the PhantomJS environment, wholly separate from Node). So at most npm would just be installing an executable.

RafaelVidaurre commented 10 years ago

Is there any way to actually run phantom from nodejs btw?


andfaulkner commented 9 years ago

You can run Phantom from Gulp, which uses NodeJS: https://www.npmjs.com/package/gulp-phantom
I would really love to use pjscrape from Gulp as well as from PhantomJS, and packaging this to npm would make that significantly easier. (I may issue a pull request for that if I end up doing it.)

You can also run a Phantom server from NodeJS, intended to replace a Selenium server: https://www.npmjs.com/package/phantomjs-server

Also, phantomjs itself does have an npm package: https://www.npmjs.com/package/phantomjs ...but it specifically warns that it doesn't run on nodeJS per se. However, the package can be used to "write standalone Phantom scripts driven from within a node program by spawning phantom in a child process." So it wouldn't run on Node, but be launched by Node, which could in effect be used nearly the same way for certain things.
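To make the "launched by Node" idea concrete, here is a minimal sketch of spawning PhantomJS as a child process from Node and collecting its output. It assumes a `phantomjs` binary on the PATH and a local copy of `pjscrape.js`; the function names, option names, and default paths are hypothetical, not part of pjscrape or the phantomjs package.

```javascript
const { spawn } = require('child_process');

// Build the command line for a pjscrape run.
// All paths here are hypothetical defaults; adjust them to your install.
function buildPjscrapeCommand(configPath, options = {}) {
  const command = options.phantomBin || 'phantomjs'; // assumed to be on PATH
  const pjscrapePath = options.pjscrapePath || 'pjscrape.js';
  return { command, args: [pjscrapePath, configPath] };
}

// Spawn PhantomJS as a child process and collect what it prints.
// Resolves with { code, stdout, stderr } once the child has exited.
function runPjscrape(configPath, options = {}) {
  const { command, args } = buildPjscrapeCommand(configPath, options);
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ['ignore', 'pipe', 'pipe'] });
    let stdout = '';
    let stderr = '';
    child.stdout.setEncoding('utf8');
    child.stderr.setEncoding('utf8');
    child.stdout.on('data', (chunk) => { stdout += chunk; });
    child.stderr.on('data', (chunk) => { stderr += chunk; });
    child.on('error', reject); // e.g. the phantomjs binary was not found
    child.on('close', (code) => resolve({ code, stdout, stderr }));
  });
}
```

Calling `runPjscrape('my_config.js')` would then return a promise for the child's exit code and captured output, so the scraper still runs inside PhantomJS while Node only starts it and reads the result.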

Although phantomJS has no CommonJS loader, you could probably use something like WebPack to get around this, especially if an npm module were specifically written to work with phantomJS as such.

I think it could make sense to package this as an npm module, but it'd need to be modified to ensure it can interface with the Node-spawned PhantomJS process 'out of the box'.

shaunc commented 8 years ago

I would love to use this as a node package. I want to scrape some pages and pipe the results on to the rest of my code for further processing (save to a db, push websocket updates to clients, etc.). Starting phantomjs in a child process would be fine. I could see this supported "natively", but it might also work via a wrapper around this project.

To support "wrapped mode", how should the child process communicate? Does it write (only) json to stdout and all errors to stderr?
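One possible answer, sketched from the parent's side: if the child follows the convention of writing only JSON to stdout and only errors to stderr, the wrapper can decide success from the exit code and parse stdout directly. The function name and the shape of the `result` object below are assumptions for illustration; nothing here is an existing pjscrape API.

```javascript
// Interpret the result of a finished child process under the proposed
// convention: stdout carries only JSON, stderr carries only errors.
// `result` is a hypothetical { code, stdout, stderr } object.
function parseScrapeResult(result) {
  if (result.code !== 0) {
    // A non-zero exit means failure; surface whatever the child wrote to stderr.
    const detail = result.stderr.trim() || 'no error output';
    throw new Error('scraper exited with code ' + result.code + ': ' + detail);
  }
  try {
    return JSON.parse(result.stdout);
  } catch (err) {
    // Any stray log line on stdout breaks the convention and ends up here.
    throw new Error('scraper stdout was not valid JSON: ' + err.message);
  }
}
```

The strict split matters because a single log line on stdout makes the whole payload unparseable, so the child's logging would have to go to stderr or a file for this to work.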