phantombuster / nickjs

Web scraping library made by the Phantombuster team. Modern, simple & works on all websites. (Deprecated)
https://nickjs.org
ISC License

Open new tabs loop through array #1

Closed. clementvp closed this issue 7 years ago

clementvp commented 7 years ago

Hi, and first of all thanks for NickJS and Phantombuster. This is a "help me please" issue. I'm not very familiar with async/await functions, so I'm practicing with this.

My question is: how can I grab a bunch of URLs (as in my example code), then loop through the array to open a new tab for each URL, take a screenshot, close the tab, and finally close nick? I'm also a Phantombuster user (we are experimenting with it at my company) and I'm really stuck on this. Can anyone help me?

import 'babel-polyfill'
import Nick from 'nickjs'
const nick = new Nick()
nick.newTab().then(async function (tab) {
  await tab.open('https://news.ycombinator.com/')
  await tab.waitUntilVisible('#hnmain')
  await tab.inject('https://code.jquery.com/jquery-3.1.1.slim.min.js')
  const urls = tab.evaluate((arg, callback) => {
    const data = []
    $('.athing').each((index, element) => {
      data.push($(element).find('.storylink').attr('href'))
    })
    callback(null, data)
  })
  return urls
}).then((urls) => {
  for (var i = 0; i < urls.length; i++) {
    // here i want open a new tab for each url i have in my urls array
    // And i want to perform a screenshot or a evaluate function
  }
//and after that i want to quit nick
  nick.exit()
})
  .catch((err) => {
    console.log('Oops, an error occurred: ' + err)
    nick.exit(1)
  })

Best Regards

SaShimy commented 7 years ago

Hi @clementvp, to help you understand I put an example at the end of my response, but async/await is easy to grasp: a function declared with the "async" keyword returns a promise, and inside an "async" function you can use the "await" keyword, which pauses that function until the awaited promise is resolved.

Putting await before a call that returns a promise forces execution to wait until the promise resolves (or throws an error) before continuing. So if you await a promise-returning function inside a for loop, the loop behaves like a normal synchronous for loop (you also need tab in scope to use it; see the code below).

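You were also missing the await keyword before tab.evaluate(). Without it, urls is a pending promise rather than the array, so any code in the same function that uses urls would run before tab.evaluate() has finished.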
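To see the difference in isolation, here is a small sketch that does not use NickJS at all. The name fakeEvaluate is a hypothetical stand-in for any promise-returning call, such as tab.evaluate() used without a callback.

```javascript
// Hypothetical stand-in for a promise-returning call such as tab.evaluate().
function fakeEvaluate() {
  return new Promise((resolve) => {
    setTimeout(() => resolve(['https://example.com/a', 'https://example.com/b']), 10)
  })
}

async function withoutAwait() {
  const urls = fakeEvaluate() // urls is a pending Promise here, not an array
  return { isPromise: urls instanceof Promise, length: urls.length }
}

async function withAwait() {
  const urls = await fakeEvaluate() // this function pauses until the promise resolves
  return { isPromise: urls instanceof Promise, length: urls.length }
}

async function demo() {
  console.log(await withoutAwait()) // { isPromise: true, length: undefined }
  console.log(await withAwait()) // { isPromise: false, length: 2 }
}

demo()
```

Because urls.length is undefined in the first case, a for loop written as "i < urls.length" would simply never run, without any error message.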

Also take care to check that each URL you want to open is valid so your script doesn't crash (some URLs in the list might not be openable).

require('babel-polyfill')
const Nick = require('nickjs')
const nick = new Nick()

nick.newTab().then(async function (tab) {
    await tab.open('https://news.ycombinator.com/')
    await tab.waitUntilVisible('#hnmain')
    await tab.inject('https://code.jquery.com/jquery-3.1.1.slim.min.js')
    const urls = await tab.evaluate((arg, callback) => {
        const data = []
        $('.athing').each((index, element) => {
            data.push($(element).find('.storylink').attr('href'))
        })
        callback(null, data)
    })
    for (var i = 0; i < urls.length; i++) {
        await tab.open(urls[i])
        // Do other things with await here, e.g. "await tab.waitUntilVisible(selector)" or "await tab.screenshot('screen.jpg')"
        // With the await keyword, asynchronous functions behave like synchronous ones,
        // so the for loop only ends once every call inside it has finished
    }
    // Don't worry: with await you can't get here before the loop has finished
    nick.exit()
})
.catch((err) => {
    console.log('Oops, an error occurred: ' + err)
    nick.exit(1)
})
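On the point about checking URLs before opening them, one possible approach is sketched below. It assumes a Node.js version where the WHATWG URL class is available as a global; the helper name toOpenableUrl and the sample hrefs are made up for illustration. Scraped href values can be relative, empty, missing, or use a non-HTTP scheme, so the helper resolves each one against the page it came from and keeps only http(s) links.

```javascript
// Resolve an href against the page it came from and keep only http(s) links.
// Returns the absolute URL as a string, or null if the href cannot be used.
function toOpenableUrl(href, baseUrl) {
  if (typeof href !== 'string' || href.trim() === '') return null
  try {
    const url = new URL(href, baseUrl) // throws on malformed input
    return (url.protocol === 'http:' || url.protocol === 'https:') ? url.href : null
  } catch (err) {
    return null
  }
}

const base = 'https://news.ycombinator.com/'
// Illustrative values only: an absolute link, a relative link, and three unusable entries.
const scraped = ['https://example.com/post', 'item?id=123', undefined, 'javascript:void(0)', '']
const openable = scraped
  .map((href) => toOpenableUrl(href, base))
  .filter((url) => url !== null)

console.log(openable)
// [ 'https://example.com/post', 'https://news.ycombinator.com/item?id=123' ]
```

Inside the loop you could then call the helper on urls[i] and only run "await tab.open(...)" when the result is not null. A page can still fail to load for other reasons, so wrapping the body of the loop in try/catch is worth considering too.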

PS: I see you are using the "import" keyword to load libraries. Node.js does not support it natively for now, so we will be switching back to require in nickjs/phantombuster in the future; I used require here so you can be prepared for the change.

PS2: I see you are using a for loop with a counter. A for ... of loop might be a cleaner way to do this, as in this example:

for(const url of urls) {
  await tab.open(url)
  // Do as before but instead of using a counter you can now use url
}

clementvp commented 7 years ago

Thank you for the quick answer. I was opening a new tab inside my for loop instead of reusing the initial tab, and juggling async/await and promises was very confusing for me. Your answer is neat and clear, and it works well in my case. Thanks for all the tips (especially the "check URLs before opening" one, I would not have thought of it) and for your time, you are great :)

tomasjanu commented 6 years ago

It still seems to me that all the tabs (opened by the loop) wait for each other.

Can we run them in parallel?

paps commented 6 years ago

Yes, each tab can be run in parallel. Can you show your code?
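A minimal sketch of what the parallel version could look like, not taken from the thread. It assumes nick is a NickJS instance created with new Nick(), and that the tab methods used here (open, screenshot, close) behave as described in the NickJS documentation. The function name screenshotAll and the file naming scheme are hypothetical. The idea is to start one async job per URL without awaiting inside a loop, then wait for all of them with Promise.all.

```javascript
// Sketch: process every URL in its own tab, with all tabs running concurrently.
// `nick` is passed in, so this function has no hard dependency on the library.
async function screenshotAll(nick, urls) {
  const jobs = urls.map(async (url, index) => {
    const tab = await nick.newTab()
    try {
      await tab.open(url)
      const file = `screen-${index}.jpg` // hypothetical naming scheme
      await tab.screenshot(file)
      return { url, file, error: null }
    } catch (err) {
      // One failing page should not abort the other tabs.
      return { url, file: null, error: String(err) }
    } finally {
      await tab.close()
    }
  })
  // Promise.all waits for every job; the jobs themselves overlap in time.
  return Promise.all(jobs)
}
```

It could be called as "const results = await screenshotAll(nick, urls)" in place of the for loop, followed by nick.exit(). With a long list this opens many tabs at once, so processing the URLs in smaller batches may be safer.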