phantombuster / nickjs

Web scraping library made by the Phantombuster team. Modern, simple & works on all websites. (Deprecated)
https://nickjs.org
ISC License

Error: timeout: load event did not fire #46

Closed: netdelight closed this issue 5 years ago

netdelight commented 5 years ago

I get this error: "Error: timeout: load event did not fire after xxx ms" when I try to open websites such as azlyrics.com or genius.com, even when I extend the timeout option to 60000 ms. I can quickly connect to https://news.ycombinator.com/ and other websites, though. Are some websites blocking headless browser connections? If so, is there a workaround?

paps commented 5 years ago

Interesting. Can you test with headless set to false in the NickJS constructor options?

netdelight commented 5 years ago

With headless set to false, I get this error: "Fatal: Chrome subprocess exited with code 1". I'm using NickJS with Google Chrome 69.0.3497.92 beta and Node.js v8.12.0 in a Vagrant box (ubuntu/trusty64 distribution). Here is my code:

const Nick = require("nickjs")
const nick = new Nick({
  timeout: 60000,
  loadImages : false,
  // tried headless
  headless : false,
  // fake user agent
  userAgent: "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36"
})

;(async () => {

    const tab = await nick.newTab()
    const url = "https://genius.com"

    try {
      const [httpCode, httpStatus] = await tab.open(url)
      if ((httpCode >= 300) || (httpCode < 200)) {
        console.log("The site responded with", httpCode, httpStatus)
      } else {
        console.log("Successfully opened", url, ":", httpCode, httpStatus)
        const path = await tab.screenshot("image.jpg")
        console.log("Screenshot saved at", path)
      }
    } catch(err) {
      console.log("Could not open page:", err)
    }

})()
.then(() => {
    console.log("Job done!")
    nick.exit()
})
.catch((err) => {
    console.log(`Something went wrong: ${err}`)
    nick.exit(1)
})

netdelight commented 5 years ago

Okay, I solved this. A huge JS file (>1.5 MB) was slowing down the page load. I blacklisted it, and the page now finishes loading.
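
For anyone landing here with the same timeout, here is a minimal sketch of what blacklisting a slow script might look like. It assumes the `blacklist` constructor option of NickJS accepts an array of regexes. The file name `heavy-bundle.js` is hypothetical, since the thread does not say which script was blocked, and `isBlacklisted` and `openWithBlacklist` are illustrative helpers, not part of NickJS.

```javascript
// Hypothetical pattern: the thread does not name the script that was blocked.
const HEAVY_SCRIPT_PATTERN = /\/heavy-bundle\.js(\?|$)/

// Same options as the snippet above, plus a blacklist entry.
// Assumption: `blacklist` takes an array of regexes matched against request URLs.
const options = {
  timeout: 60000,
  loadImages: false,
  blacklist: [HEAVY_SCRIPT_PATTERN],
}

// Local helper (not a NickJS function): checks which URLs the regexes
// would catch, so a pattern can be verified before launching Chrome.
function isBlacklisted(url, patterns) {
  return patterns.some((pattern) => pattern.test(url))
}

// Opens a page with the blacklist applied. NickJS is required lazily so the
// helper above can be used without the library or Chrome installed.
async function openWithBlacklist(url) {
  const Nick = require("nickjs")
  const nick = new Nick(options)
  const tab = await nick.newTab()
  const [httpCode, httpStatus] = await tab.open(url)
  return { nick, tab, httpCode, httpStatus }
}
```

To find a candidate file, the network panel of a regular (non-headless) browser should show which resource is large or slow. The caller is still responsible for calling `nick.exit()` once it is done with the tab.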

paps commented 5 years ago

Great news :) Glad you figured it out.