segment-boneyard / nightmare

A high-level browser automation library.
https://open.segment.com

Scraping multiple links #1447

Open ghost opened 6 years ago

ghost commented 6 years ago

Let's assume that I scrape a few links from some website by evaluating it, and save them in an array. How can I make my program go through all of these links and use the same instance of nightmare to scrape some content from them? So far I've tried to do it using arr.forEach(...), but I guess a construction like that is not appropriate:

nightmare
  // goto() etc.
  .evaluate(() => {
    return [...document.querySelectorAll('a.somelink')]
      .map(el => el.href)
  })
  .then(result => {
    result.forEach((val, i, arr) => {
      nightmare
        .goto(val)
        .evaluate(() => {
          return document.querySelector('a.link').href
        })
        // rest of code
    })
  })

I guess that it's not good to use the same instance in .then(). How can I make it work?

horcrux2301 commented 6 years ago

You can probably solve this issue using async-await.

nightmare
  // goto() etc.
  .evaluate(() => {
    return [...document.querySelectorAll('a.somelink')]
      .map(el => el.href)
  })
  .then(result => {
    result.forEach((val, i) => {
      nightmare
        .goto(val)
        .evaluate(() => {
          return Array.from(document.querySelectorAll('a.link'))
            .map(element => ({ link: element.href }));
        })
        .end()
        .then((data) => {
          let x = scrapeTheseLinks(data);
          x.then((data) => {
            // use the data returned from all these links.
          });
        })
    })
  })

async function scrapeTheseLinks(data) {
  // forEach callbacks can't use await; a for...of loop keeps the calls
  // sequential and lets us await each one
  let dataFromLinks = [];
  for (const el of data) {
    const x = await scraperForLinks(el);
    dataFromLinks.push(x);
  }
  return dataFromLinks;
}

function scraperForLinks(link) {
  // must return the promise so the caller can await it
  return nightmare
    .goto(link)
    .evaluate(() => {})
    // write the rest of the code for each link
}