johntitus / node-horseman

Run PhantomJS from Node
MIT License

Phantom crashes on Ubuntu 14.04 under Horseman #73

Open mjp0 opened 9 years ago

mjp0 commented 9 years ago

This is going to be a hard one to debug, but let's try.

I have a super simple script like this, which is only supposed to record which redirects happen:

horseman
    .on('urlChanged', function(url){
        console.log(url)
        urls.push(url)
    })
    .open('http://main.exoclick.com/click.php?data=IHw5MTEwNjJ8fGh0dHAlM0ElMkYlMkZhZC5kbW0uY29tJTJGYWQlMkZwJTJGciUzRl9zaXRlJTNEMzQ4MiUyNl9hcnRpY2xlJTNENTM2MSUyNl9saW5rJTNEMTA1ODEyJTI2X2ltYWdlJTNEMTA1ODkwJTI2c3VpZCUzRCU3QmNvbnZlcnNpb25zX3RyYWNraW5nJTdEJTI2c2FkJTNEJTdCc2l0ZV9pZCU3RHx8fHx8MTQ0NjU1Mzc2NHxwb3JuZHJlYW1lci5jb218MTA3LjE1MC41My43NHx8MTExNTE3NTB8MTU2MzUzNHw1MDh8N3w0MXwzfDE2fDB8MHx8MzAweDI1MHwxfDB8MTAyNHg3Njh8NTYzOGE4OWQyOGFmMjMuMzk0OTM5ODc2MTY0ODk5NTZ8NDkyfGY0MGJhYjZhZThjZmEyMjQxMzAwNWRkYzEwMjU0NmIxfDB8Mnxwb3JuZHJlYW1lci5jb218T0t8YjVmODFmNzhjNzg2MTBkNmEzMmM4NDgwNDU2NjNmMDA%3D')
    .waitForNextPage()
    .close();

and it ends with PhantomJS crashing with the standard error "phantom stderr: PhantomJS has crashed. Please read the crash reporting guide...". As far as I can see, it finishes all the redirects and then crashes.

I've tried with PhantomJS 2.0 and the latest 2.0.1-dev version, same result.

WARNING: That exoclick.com link leads to an adult site. Unfortunately it's the only URL I have at hand where this happens, but I know there are many more.

The crazy thing is that this happens only with Horseman, and only on Ubuntu 14.04. On OS X it works fine, which led me to believe this was a PhantomJS issue, until I wrote a PhantomJS script that does the same thing and it works without crashing on both systems. CasperJS also crashes on Linux.

UPDATE: I just ran the test script again, trying various "fixes" I could imagine, and I started getting "Request() error evaluating exit() call: Error: connect ECONNREFUSED" after the "PhantomJS has crashed..." message.

UPDATE 2: It appears this is somehow related to the redirects, because if I remove waitForNextPage() the script exits normally.

johntitus commented 9 years ago

Hrm. I'm on Linux Mint 17.1, which is built on Ubuntu 14.04 and it seems to run ok. Does it always error out, or just intermittently?

Can you try running it with debug on, and then posting the output?

https://github.com/johntitus/node-horseman#debug

mjp0 commented 9 years ago

I get the error every time. Here's the debug output. I did this earlier already, but at least to my eyes there's nothing weird in it. However, I just noticed that if you look at the console.log output (I marked those lines with a !!! prefix), the urlChanged event is triggered for r18.com right before the crash, but according to the debug log the page never seems to be opened.

   horseman .setup() creating phantom instance on 12406 +0ms
  horseman phantom created. +141ms
  horseman page created +21ms
  horseman .on urlChanged set. +2ms
  horseman .userAgent() set +7ms
  horseman .open http://main.exoclick.com/click.php?data=IHw5MTEwNjJ8fGh0dHAlM0ElMkYlMkZhZC5kbW0uY29tJTJGYWQlMkZwJTJGciUzRl9zaXRlJTNEMzQ4MiUyNl9hcnRpY2xlJTNENTM2MSUyNl9saW5rJTNEMTA1ODEyJTI2X2ltYWdlJTNEMTA1ODkwJTI2c3VpZCUzRCU3QmNvbnZlcnNpb25zX3RyYWNraW5nJTdEJTI2c2FkJTNEJTdCc2l0ZV9pZCU3RHx8fHx8MTQ0NjU1Mzc2NHxwb3JuZHJlYW1lci5jb218MTA3LjE1MC41My43NHx8MTExNTE3NTB8MTU2MzUzNHw1MDh8N3w0MXwzfDE2fDB8MHx8MzAweDI1MHwxfDB8MTAyNHg3Njh8NTYzOGE4OWQyOGFmMjMuMzk0OTM5ODc2MTY0ODk5NTZ8NDkyfGY0MGJhYjZhZThjZmEyMjQxMzAwNWRkYzEwMjU0NmIxfDB8Mnxwb3JuZHJlYW1lci5jb218T0t8YjVmODFmNzhjNzg2MTBkNmEzMmM4NDgwNDU2NjNmMDA%3D +0ms
!!! http://main.exoclick.com/click.php?data=IHw5MTEwNjJ8fGh0dHAlM0ElMkYlMkZhZC5kbW0uY29tJTJGYWQlMkZwJTJGciUzRl9zaXRlJTNEMzQ4MiUyNl9hcnRpY2xlJTNENTM2MSUyNl9saW5rJTNEMTA1ODEyJTI2X2ltYWdlJTNEMTA1ODkwJTI2c3VpZCUzRCU3QmNvbnZlcnNpb25zX3RyYWNraW5nJTdEJTI2c2FkJTNEJTdCc2l0ZV9pZCU3RHx8fHx8MTQ0NjU1Mzc2NHxwb3JuZHJlYW1lci5jb218MTA3LjE1MC41My43NHx8MTExNTE3NTB8MTU2MzUzNHw1MDh8N3w0MXwzfDE2fDB8MHx8MzAweDI1MHwxfDB8MTAyNHg3Njh8NTYzOGE4OWQyOGFmMjMuMzk0OTM5ODc2MTY0ODk5NTZ8NDkyfGY0MGJhYjZhZThjZmEyMjQxMzAwNWRkYzEwMjU0NmIxfDB8Mnxwb3JuZHJlYW1lci5jb218T0t8YjVmODFmNzhjNzg2MTBkNmEzMmM4NDgwNDU2NjNmMDA%3D
  horseman phantomjs onLoadFinished triggered. +301ms
  horseman .open: http://main.exoclick.com/click.php?data=IHw5MTEwNjJ8fGh0dHAlM0ElMkYlMkZhZC5kbW0uY29tJTJGYWQlMkZwJTJGciUzRl9zaXRlJTNEMzQ4MiUyNl9hcnRpY2xlJTNENTM2MSUyNl9saW5rJTNEMTA1ODEyJTI2X2ltYWdlJTNEMTA1ODkwJTI2c3VpZCUzRCU3QmNvbnZlcnNpb25zX3RyYWNraW5nJTdEJTI2c2FkJTNEJTdCc2l0ZV9pZCU3RHx8fHx8MTQ0NjU1Mzc2NHxwb3JuZHJlYW1lci5jb218MTA3LjE1MC41My43NHx8MTExNTE3NTB8MTU2MzUzNHw1MDh8N3w0MXwzfDE2fDB8MHx8MzAweDI1MHwxfDB8MTAyNHg3Njh8NTYzOGE4OWQyOGFmMjMuMzk0OTM5ODc2MTY0ODk5NTZ8NDkyfGY0MGJhYjZhZThjZmEyMjQxMzAwNWRkYzEwMjU0NmIxfDB8Mnxwb3JuZHJlYW1lci5jb218T0t8YjVmODFmNzhjNzg2MTBkNmEzMmM4NDgwNDU2NjNmMDA%3D - status: success +0ms
  horseman .waitForNextPage() +0ms
!!! http://www.r18.com/lp/videos/007/?utm_campaign=kaig_212_exoclickppl_335&utm_content=inter&utm_source=exoclickppl&utm_medium=adnw
phantom stderr: PhantomJS has crashed. Please read the crash reporting guide at
<http://phantomjs.org/crash-reporting.html> and file a bug report at
<https://github.com/ariya/phantomjs/issues/new>.
Please attach the crash dump file:
  /tmp/7ee2d213-700a-421c-1167f6bd-6dcbcf74.dmp

  horseman Timeout during waitForNextPage() +5s
  horseman .close(). +0ms
Request() error evaluating exit() call: Error: connect ECONNREFUSED

johntitus commented 9 years ago

horseman .userAgent() set +7ms

I don't see a userAgent command in your script?

mjp0 commented 9 years ago

Ah, sorry, I was testing earlier today and left it there. Doesn't seem to be the culprit though ;)

mjp0 commented 9 years ago

Actually, this blows my mind - I removed the userAgent and it works.

johntitus commented 9 years ago

Huh. What was the user agent string you gave it, if you don't mind providing it? I'd like to try to recreate it.

mjp0 commented 9 years ago

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:40.0) Gecko/20100101 Firefox/40.0 - straight from my own browser.

mjp0 commented 9 years ago

Fascinating bug. It has something to do with the rv:40.0 part. If you remove that, everything works.

UPDATE: Nope, it also crashes with Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2546.0 Safari/537.36, which doesn't have an rv part.

johntitus commented 9 years ago

Boom - crashes for me too with that user agent. How weird. It actually works for me if I use rv:38.0 or lower. Set it to rv:39.0 and it crashes.
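The rv:38.0 / rv:39.0 boundary described above could be located with a binary search over version numbers. The following is only a sketch: firefoxUserAgent, firstCrashingVersion and reportedCrash are hypothetical names, and reportedCrash is a stand-in that simply encodes the behaviour reported in this thread. In a real investigation each probe would be a separate Horseman run with the generated user agent, checking whether PhantomJS crashed.

```javascript
// Hypothetical helper: build a Firefox-style user agent string for a given
// major version, following the pattern of the string posted in this thread.
function firefoxUserAgent(version) {
  return 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:' + version +
    '.0) Gecko/20100101 Firefox/' + version + '.0';
}

// Binary search for the lowest version in [low, high] for which
// crashes(version) returns true. Assumes every version below the boundary
// works and every version at or above it crashes. Returns null if no
// version in the range crashes.
function firstCrashingVersion(low, high, crashes) {
  var result = null;
  while (low <= high) {
    var mid = Math.floor((low + high) / 2);
    if (crashes(mid)) {
      result = mid;      // mid crashes; look for an earlier crashing version
      high = mid - 1;
    } else {
      low = mid + 1;     // mid works; the boundary must be higher
    }
  }
  return result;
}

// Stand-in predicate (an assumption, not a real test run): mirrors the
// report that rv:38.0 and lower work while rv:39.0 crashes.
function reportedCrash(version) {
  return version >= 39;
}

console.log(firstCrashingVersion(30, 40, reportedCrash)); // 39
console.log(firefoxUserAgent(40));
```

The search assumes the failure is monotonic in the version number. The later Chrome user agent in this thread suggests the trigger may not be the rv value alone, so a result from this kind of search is a hint rather than a diagnosis.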

Are you able to open the page in a real Firefox, version 39 or later? Does anything weird happen? I'm wondering if they're doing some kind of user agent sniffing and responding differently for later versions.

mjp0 commented 9 years ago

Yes, it works normally with the real Firefox. At least I don't see anything odd, and the developer tools don't show anything out of the ordinary.

I just tried with that PhantomJS test script I mentioned and added that user agent - it still works.

johntitus commented 9 years ago

Could you post your PhantomJS script? I want to dig into this at some point.

awlayton commented 9 years ago

Whatever is going on here, I am unable to reproduce it with Ubuntu 15.04.

mjp0 commented 9 years ago

@johntitus, here you go. The 10-second timeout is just there to make sure all redirects are followed; I didn't bother to implement a more sophisticated mechanism ;)

var page = require('webpage').create()
page.settings.userAgent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:40.0) Gecko/20100101 Firefox/40.0'

var url = 'http://main.exoclick.com/click.php?data=IHw5MTEwNjJ8fGh0dHAlM0ElMkYlMkZhZC5kbW0uY29tJTJGYWQlMkZwJTJGciUzRl9zaXRlJTNEMzQ4MiUyNl9hcnRpY2xlJTNENTM2MSUyNl9saW5rJTNEMTA1ODEyJTI2X2ltYWdlJTNEMTA1ODkwJTI2c3VpZCUzRCU3QmNvbnZlcnNpb25zX3RyYWNraW5nJTdEJTI2c2FkJTNEJTdCc2l0ZV9pZCU3RHx8fHx8MTQ0NjU1Mzc2NHxwb3JuZHJlYW1lci5jb218MTA3LjE1MC41My43NHx8MTExNTE3NTB8MTU2MzUzNHw1MDh8N3w0MXwzfDE2fDB8MHx8MzAweDI1MHwxfDB8MTAyNHg3Njh8NTYzOGE4OWQyOGFmMjMuMzk0OTM5ODc2MTY0ODk5NTZ8NDkyfGY0MGJhYjZhZThjZmEyMjQxMzAwNWRkYzEwMjU0NmIxfDB8Mnxwb3JuZHJlYW1lci5jb218T0t8YjVmODFmNzhjNzg2MTBkNmEzMmM4NDgwNDU2NjNmMDA%3D';

function printArgs() {
    var i, ilen;
    for (i = 0, ilen = arguments.length; i < ilen; ++i) {
        console.log("    arguments[" + i + "] = " + JSON.stringify(arguments[i]));
    }
    console.log("");
}
page.onUrlChanged = function() {
    console.log("page.onUrlChanged");
    printArgs.apply(this, arguments);
};
page.onNavigationRequested = function() {
    console.log("page.onNavigationRequested");
    printArgs.apply(this, arguments);
};
page.open(url, function (status) {
    // Page is loaded! Wait 10 seconds so any remaining redirects can fire, then exit.
    setTimeout(function () {
        phantom.exit();
    }, 10000);
});
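
One possible alternative to the fixed 10-second timeout is to exit once no URL change has been seen for a short quiet period. This is a minimal sketch, not part of the original script: createIdleWatcher is a hypothetical helper, the injectable timers argument exists only to make it testable, and the 3000 ms quiet period in the usage comment is an arbitrary assumption.

```javascript
// Hypothetical helper: calls onIdle once no activity has been reported via
// touch() for quietMs milliseconds. The optional timers argument lets a
// caller substitute its own setTimeout/clearTimeout; by default the global
// timer functions are used.
function createIdleWatcher(quietMs, onIdle, timers) {
  var set = timers ? timers.setTimeout
                   : function (fn, ms) { return setTimeout(fn, ms); };
  var clear = timers ? timers.clearTimeout
                     : function (id) { clearTimeout(id); };
  var handle = null;

  function touch() {
    // Restart the quiet-period countdown on every reported activity.
    if (handle !== null) {
      clear(handle);
    }
    handle = set(function () {
      handle = null;
      onIdle();
    }, quietMs);
  }

  touch(); // start counting immediately, in case no activity ever arrives
  return { touch: touch };
}

// Possible usage inside the PhantomJS script above (assumed wiring):
//
//   var watcher = createIdleWatcher(3000, function () { phantom.exit(); });
//   page.onUrlChanged = function (targetUrl) {
//       console.log(targetUrl);
//       watcher.touch();
//   };
//   page.open(url);
```

With this approach the script exits shortly after the redirect chain settles instead of always waiting the full 10 seconds, at the cost of exiting too early if a redirect takes longer than the quiet period to fire.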