segment-boneyard / nightmare

A high-level browser automation library.
https://open.segment.com

Understanding Promise chain building with Nightmare.js #1621

Closed: Levelleor closed this issue 3 years ago

Levelleor commented 3 years ago

I am not sure I understand what exactly Nightmare.js returns. I see a Nightmare instance as a browser, meaning I can ask it to do things one by one: visit a page, scrape the info, then check another page, get more information, and save it all somewhere. Also, since it's promise-based, I assume it is possible to use async/await everywhere to keep things consistent. So I wrote the code below:

await nightmare
  .goto(paginate.url)
  .wait('.leftcol')
  .evaluate(() => {
    //creating the object to process in the then() call
    return {
      searchrows: [...document.querySelectorAll('.search_result_row')].map(e=>{return {id: e.dataset.dsAppid, name: e.querySelector('.title').innerHTML}}),
      nextButton: hasNextButton(document.querySelectorAll('.pagebtn'))
    }
  })
  //processing the aforementioned object
  .then(e => {
    // is never called (or maybe it is called, but the program ends first, possibly because .end() is reached sooner)
    console.log(e);
    e.searchrows.forEach(async game => {
      let gameid = game.id; // the objects built in evaluate() above use the key "id", not "gid"
      let gamename = game.name;
      let gameurl = updateURL(gameid);
      let gametags = [];
      await nightmare
        .goto(gameurl)
        .wait('.popular_tags')
        .click('.add_button')
        .wait('.app_tag_modal_left')
        .evaluate(() => [...document.querySelectorAll('.app_tag_control>.app_tag')].map(e=>{return {id: e.dataset.tagid, name: e.innerHTML.trim()}}))
        .then((gameTags) => {
          gameTags.forEach(e => {
            TAG_LIST.add({id: e.id, name: e.name});
            gametags.push(e.id);
          });
        });
      GAME_LIST.add({id: gameid, name: gamename, tags: gametags});
      console.log('done fetching game #' + gameid);
    })

    paginate.update(e.searchrows.length, e.nextButton);
    paginate.state();
  })
  .catch(error => {
    console.error('Search failed:', error)
  })

So the issue is:

Everything after the first .then(e => { is never called. Or maybe it is called, but without waiting for the inner calls, which breaks the flow and the logic, since I specifically added await in there to make sure it waits for the results. I have a total of 3 parts that do different things, such as getting the categories and then the actual articles from the same website whenever the loop needs more info. I have to make sure it all runs strictly in sequence, otherwise it's going to be impossible to navigate.

Why isn't it waiting for the first instance to finish? Instead, it completely skips the whole then call and runs another loop, based on the console logs.

I thought I'd simply add await to each nightmare instance, yet here I am at 4 am trying to figure it out :D

Levelleor commented 3 years ago

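OK, I didn't know that. Apparently .forEach does not wait for async callbacks: it starts each one and moves on, so the await inside the callback never holds up the surrounding code. I'm rewriting everything to use for...of loops.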
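
For anyone landing here later, a minimal sketch of the difference, without Nightmare itself. `fetchGame` is a hypothetical stand-in for one Nightmare chain (goto, wait, evaluate), and `collectWithForEach` / `collectWithForOf` are made-up names for illustration only:

```javascript
// Hypothetical stand-in for one Nightmare chain: resolves with a fake
// result after a short delay. Not part of the Nightmare API.
function fetchGame(id) {
  return new Promise(resolve =>
    setTimeout(() => resolve({ id, tags: ['tag-' + id] }), 10)
  );
}

// forEach calls the async callback and discards the promise it returns,
// so this function returns before any fetch has finished.
async function collectWithForEach(ids) {
  const games = [];
  ids.forEach(async id => {
    games.push(await fetchGame(id));
  });
  return games.slice(); // snapshot of what has been collected so far
}

// for...of runs inside the async function itself, so each await really
// pauses the loop: fetches run one at a time, in order.
async function collectWithForOf(ids) {
  const games = [];
  for (const id of ids) {
    games.push(await fetchGame(id));
  }
  return games;
}

async function main() {
  const early = await collectWithForEach([1, 2, 3]);
  console.log('forEach collected:', early.length); // forEach collected: 0

  const all = await collectWithForOf([1, 2, 3]);
  console.log('for...of collected:', all.map(g => g.id).join(',')); // for...of collected: 1,2,3
}

main();
```

In the code from the question, this would mean replacing `e.searchrows.forEach(async game => { ... })` with `for (const game of e.searchrows) { ... }` inside an async `.then` callback (or after an `await` on the outer chain), so that each inner chain finishes before the next one starts and before `paginate.update` runs.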