phantombuster / nickjs

Web scraping library made by the Phantombuster team. Modern, simple & works on all websites. (Deprecated)
https://nickjs.org
ISC License

Help request: Struggling to do what feels like fairly basic push 'show more' button and scroll #62

Closed kpennell closed 5 years ago

kpennell commented 5 years ago

I'm all sorts of stuck and have been for 5-6 hours. I'm trying to open a tab, click a button at the bottom (that loads more items), and then scroll down to load more items via infinite scroll. I'm just stuck as hell. I can't tell whether the problem is the async/await, the selectors, not following the examples, or just not getting Nick (script vs. tab context). Not sure if anyone is checking this, but here goes.

I tried to follow some of your examples (1, 2, 3).

const Nick = require("nickjs");
const nick = new Nick();

process.env.CHROME_PATH = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome";

// const scrollToSpinners = (argv, cb) => {

//  Array.from(document.querySelectorAll.find(el => el.textContent == 'View All');
//  cb(null, true)
// }

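const clickFooterButton = (argv, cb) => {
  // Runs in the page context: find the div whose text is exactly "View All".
  const showMore = Array.from(document.querySelectorAll('div'))
    .find(el => el.textContent.trim() === 'View All');

  // Actually click it (the earlier version only logged the element).
  if (showMore) {
    showMore.click();
  }

  // Report back to the Node script whether the button was found.
  cb(null, Boolean(showMore))
}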

const SPINNER_SELECTOR = "footer";

const scrollDown = async (tab) => {
  console.log("Scrolling down...", "loading")
  await tab.scroll(0, 1000)
  await tab.scroll(0, 2000)
  await tab.scroll(0, 3000)
  await tab.scroll(0, 4000)
  await tab.scrollToBottom()
  await tab.wait(3000)
  await tab.scrollToBottom()
  await tab.wait(3000)
  await tab.scrollToBottom()
  await tab.wait(1000)
}

(async () => {
  const tab = await nick.newTab();
  await tab.wait(1000);
  await tab.open("https://www.sitesitesite.com/en?great=ChIJAVkwefwewefW0YgIQQ");

  await tab.untilVisible("body"); // Make sure we have loaded the page

  await tab.inject("http://code.jquery.com/jquery-3.2.1.min.js"); // We're going to use jQuery to scrape

  // $("footer")[0].scrollIntoView();

  // await tab.evaluate(scrollToSpinners, { spinner: SPINNER_SELECTOR })

  await tab.evaluate(clickFooterButton, { spinner: SPINNER_SELECTOR })

  await scrollDown(tab)

  const YesLinks = await tab.evaluate((arg, callback) => {

    const data = [];

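    // Caution: div:contains() also matches every ancestor div of the button, and this
    // click fires right before extraction, so whatever it loads is not in `data` yet.
    $("div:contains('View All')").parent().click();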

    $('div[class*="selector"]').each((index, element) => {
      data.push({
        param: $(element)
          .find("div div:nth-child(2)")
          .text(),
        param2: $(element)
          .find("div div:nth-child(4) h2")
          .text(),
        param3: $(element)
          .find("div div:nth-child(4) div div")
          .text()
      });
    });
    callback(null, data);
  });

  //console.log(raLinks)
  let stringifiedLinks = JSON.stringify(YesLinks, null, 2);

  console.log(stringifiedLinks);

})()
  .then(() => {
    console.log("Job done!");
    nick.exit();
  })
  .catch(err => {
    console.log(`Something went wrong: ${err}`);
    nick.exit(1);
  });

Is there some really straightforward way to click a button at the bottom, scroll down a couple times, then extract the items? I get so lost in all the flippin callbacks, args, await/async, bleh. There's just so much complexity at some point.
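
One possible way to structure the scrolling part, sketched under assumptions: it relies only on the `tab.evaluate`, `tab.scrollToBottom` and `tab.wait` calls already used above. The helper names `countItems` and `scrollUntilStable`, and the `maxRounds` and `pauseMs` defaults, are hypothetical. Instead of a fixed series of scrolls, it keeps scrolling until the number of matching items stops growing.

```javascript
// Page context: count the elements matching arg.selector.
const countItems = (arg, cb) => {
  cb(null, document.querySelectorAll(arg.selector).length);
};

// Node side: scroll to the bottom until the item count stops growing,
// or until maxRounds is reached. Returns the last observed count.
const scrollUntilStable = async (tab, selector, maxRounds = 10, pauseMs = 2000) => {
  let previous = -1;
  for (let round = 0; round < maxRounds; round++) {
    const current = await tab.evaluate(countItems, { selector });
    if (current === previous) {
      return current; // nothing new appeared since the last scroll
    }
    previous = current;
    await tab.scrollToBottom();
    await tab.wait(pauseMs); // give the page time to load the next batch
  }
  return previous;
};
```

It would be called after the button click, for example `await scrollUntilStable(tab, 'div[class*="selector"]')`, with the pause tuned to how slowly the site loads.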

kpennell commented 5 years ago

Meh, I had all sorts of problems, but it turns out I was selecting wrong, using tab.evaluate wrong, etc.
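
For anyone hitting the same wall, a minimal sketch of the selection part, not necessarily what fixed it here. The function passed to `tab.evaluate` runs inside the page, so it only sees `document` and the argument object. `clickByText` and its `selector` and `text` fields are hypothetical names. Matching on `textContent` also matches every wrapper around the button, so the sketch keeps only the innermost match.

```javascript
// Runs in the page context via tab.evaluate(clickByText, { selector, text }).
// Hypothetical helper: clicks the innermost element whose trimmed text equals arg.text.
const clickByText = (arg, cb) => {
  const matches = Array.from(document.querySelectorAll(arg.selector))
    .filter(el => el.textContent.trim() === arg.text);

  // Drop wrappers: keep an element only if it contains no other match.
  const innermost = matches.find(
    el => !matches.some(other => other !== el && el.contains(other))
  );

  if (innermost) {
    innermost.click();
  }

  // Tell the Node side whether anything was clicked.
  cb(null, Boolean(innermost));
};
```

On the Node side this would look like `const clicked = await tab.evaluate(clickByText, { selector: "div", text: "View All" })`, which makes a missed selector visible as `false` instead of failing silently.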

TASnomad commented 5 years ago

Hello @kpennell,

Does the targeted website perform any XHR calls when you click on the "load more" button? If so, you could wait until the XHR call finishes instead of waiting for CSS selectors.
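
A rough sketch of that idea, under stated assumptions: NickJS is not assumed to expose a dedicated wait-for-XHR call here, so the page's `XMLHttpRequest.prototype.send` is patched to keep a counter of in-flight requests, and the script polls that counter. The names `installXhrCounter`, `readPendingXhr`, `waitForXhrIdle` and the `window.__pendingXhr` property are made up for illustration, and requests made with `fetch` would not be counted.

```javascript
// Page context: patch XMLHttpRequest so in-flight requests are counted.
const installXhrCounter = (arg, cb) => {
  if (typeof window.__pendingXhr !== "number") {
    window.__pendingXhr = 0;
    const originalSend = window.XMLHttpRequest.prototype.send;
    window.XMLHttpRequest.prototype.send = function (...args) {
      window.__pendingXhr += 1;
      // "loadend" fires after success, error, abort, or timeout.
      this.addEventListener("loadend", () => { window.__pendingXhr -= 1; }, { once: true });
      return originalSend.apply(this, args);
    };
  }
  cb(null, true);
};

// Page context: report how many requests are still in flight.
const readPendingXhr = (arg, cb) => {
  cb(null, window.__pendingXhr);
};

// Node side: poll until no XHR is pending, or give up after timeoutMs.
const waitForXhrIdle = async (tab, timeoutMs = 10000, pollMs = 250) => {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const pending = await tab.evaluate(readPendingXhr);
    if (pending === 0) return true;
    await tab.wait(pollMs);
  }
  return false; // timed out with requests still pending
};
```

The counter would be installed once after `tab.open` with `await tab.evaluate(installXhrCounter)`, and `await waitForXhrIdle(tab)` would follow the click. A short `tab.wait` between the click and the first poll may be needed, since the request might not have started yet when polling begins.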

kpennell commented 5 years ago

Thanks for the response and suggestion. I might have gotten it figured out. @TASnomad