ruipgil / scraperjs

A complete and versatile web scraper.
MIT License

Doesn't install phantomjs dependency (on DynamicScraper only) #37

kengz closed this issue 9 years ago

kengz commented 9 years ago

It's a similar issue to this. I ran npm install phantomjs and then used scraperjs; here's what happened:

var scraperjs = require('scraperjs');

// Static: runs just fine
scraperjs.StaticScraper.create('https://news.ycombinator.com/')
    .scrape(function($) {
        return $(".title a").map(function() {
            return $(this).text();
        }).get();
    }, function(res) {
        console.log(res);
    })

// Dynamic: complains "phantomjs-node: You don't have 'phantomjs' installed"
scraperjs.DynamicScraper.create('https://news.ycombinator.com/')
    .scrape(function($) {
        return $(".title a").map(function() {
            return $(this).text();
        }).get();
    }, function(res) {
        console.log(res);
    })

vdraceil commented 9 years ago

Your phantomjs installation is local. Can you install it globally and then try? npm install -g phantomjs

kengz commented 9 years ago

Hi, I just did that too. Still the same issue. What is the proper way to install phantomjs? I've just read that it's not a proper Node module.

kengz commented 9 years ago

Alright, solved it. It wasn't made clear how PhantomJS should be installed. I wrote a little troubleshooting doc for my own project:

PhantomJS is a dependency used by the scraperjs web scraper. It is not a Node module, so its binaries must be downloaded from its site and the directory containing them added to PATH in your shell. Open up your bash profile with nano

nano ~/.bash_profile

and add the path for PhantomJS to it.

### for phantomjs
export PATH=/usr/local/phantomjs/bin:$PATH
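
To confirm that the export took effect for the Node process itself, a quick check along these lines may help. This is only a sketch: findOnPath is a hypothetical helper (not part of scraperjs or phantomjs-node), and it assumes a Unix-like system where the binary is named plain phantomjs.

```javascript
// Sketch: look for an executable on PATH, roughly the way a shell would.
var fs = require('fs');
var path = require('path');

// Hypothetical helper: returns the full path of the first executable file
// called `name` found in the directories of `pathVar`, or null if none match.
function findOnPath(name, pathVar) {
  var dirs = (pathVar || '').split(path.delimiter).filter(Boolean);
  for (var i = 0; i < dirs.length; i++) {
    var candidate = path.join(dirs[i], name);
    try {
      fs.accessSync(candidate, fs.constants.X_OK);
      if (fs.statSync(candidate).isFile()) return candidate;
    } catch (err) {
      // Missing or not executable in this directory; keep looking.
    }
  }
  return null;
}

var found = findOnPath('phantomjs', process.env.PATH);
console.log(found ? 'phantomjs found at ' + found : 'phantomjs is not on PATH');
```

Run it from the same place you run the scraper (terminal or editor console). If it reports that phantomjs is not on PATH, the process was started without the exported PATH.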

If you like to code and run from the Sublime console, modify its Node build system to use shell_cmd instead of cmd, so that the build command runs through the shell. Here is the complete Node.sublime-build file:

{
    "shell_cmd": "node \"${file}\"",
    "selector": "source.js",
    "env": {
        "PATH":"/usr/local/phantomjs/bin"
    }
}

vdraceil commented 9 years ago

Yes, PhantomJS and CasperJS are not Node modules, but they are installable using npm (I installed PhantomJS on my machine with just npm). npm is a package manager and is not limited to installing just Node modules, if I'm not wrong.

Anyways, I'm glad that your issue is solved.
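
For anyone who wants to try the npm route without editing a shell profile, here is a rough sketch. It assumes the npm phantomjs package is installed locally and relies on the `path` property that package exports (the location of the binary it downloaded). prependToPath is a hypothetical helper, and whether this is enough depends on the phantom bridge finding the binary through PATH, which is an assumption here.

```javascript
var path = require('path');

// Hypothetical helper: returns a PATH string with `dir` placed first.
// If `dir` is already listed, the entries are returned in their existing order.
function prependToPath(dir, pathVar) {
  var dirs = (pathVar || '').split(path.delimiter).filter(Boolean);
  if (dirs.indexOf(dir) !== -1) return dirs.join(path.delimiter);
  return [dir].concat(dirs).join(path.delimiter);
}

try {
  // Assumption: `npm install phantomjs` has been run in this project.
  var phantomjs = require('phantomjs');
  process.env.PATH = prependToPath(path.dirname(phantomjs.path), process.env.PATH);
} catch (err) {
  console.error('Could not locate a binary via the npm phantomjs package; PATH left unchanged');
}

// Require scraperjs only after PATH has been adjusted, then use
// DynamicScraper as in the snippet at the top of this issue.
```

Child processes spawned by Node inherit process.env by default, so the adjusted PATH should be visible to anything launched afterwards.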