rchipka / node-osmosis

Web scraper for NodeJS

how can return values be guaranteed? #265

Open tripower opened 2 years ago

tripower commented 2 years ago

Is it possible to guarantee a return value? If a single "find" does not find anything, the whole data object is empty. So I tried this code, but it only returns the first value:

const osmosis = require('osmosis');
const fs = require('fs'); // needed for fs.writeFile below
let savedData = [];

var objSite = osmosis.get('https://www.wallstreet-online.de/etf/a0rpwh-ishares-core-msci-world-ucits-etf');
objSite.find("//*[@class='marketchange'][1]/div[1]/div[1]/div[1]/span")
    .set({'kurs': 'text()'})
    .data(data => {
        console.log("data1:" + data);
        savedData.push(data);
        objSite.find("//*[@class='marketchange'][1]/div[1]/div[3]/span")
                .set({'perc': 'text()'})
                .data(data => {
                      console.log("data2:" + data);
                      savedData.push(data);
                })
                .done(function() {
                    var strFileData = JSON.stringify( savedData, null, 4);
                    console.log("strFileData:" + strFileData);
                    fs.writeFile('data.json', strFileData, function(err) {
                     if(err) console.error(err);
                     else console.log('Data Saved to data.json file');
                   })
                });
    });

=> output strFileData:[ { "kurs": "71,39" } ]

Original example, which works as long as every XPath query finds something:

    objSite.find("//*[@class='marketchange'][1]/div[1]/div[1]/div[1]/span")
    .set({'kurs': 'text()'})
    .find("//*[@class='marketchange'][1]/div[1]/div[3]/span")
    .set({'perc': 'text()'})
    .data(data => {
      console.log("data2:" + data);
      savedData.push(data);
    })
    .done(function() {
        var strFileData = JSON.stringify( savedData, null, 4);
        console.log("strFileData:" + strFileData);
        fs.writeFile('data.json', strFileData, function(err) {
         if(err) console.error(err);
         else console.log('Data Saved to data.json file');
       })
    });

=> output strFileData:[ { "kurs": "71,39", "perc": "+0,08" } ]

jueschus commented 2 years ago

The first problem is that you use the same objSite object for two separate chains, so you overwrite the previously defined data/done functions (as I understand it). You should combine them into one chain:

objSite
    .find("selector1")
    .set()
    .find("selector2")
    .set()
    .data((dataForBothSelectors) => {})
    .done()

BUT: with this solution, if e.g. selector1 is not found in the DOM, the chain breaks and done is called (this works as implemented, see lib/commands/find.js:42).

The only solution I know of would be to create multiple separate objSite objects (multiple site visits); then you can ignore it if one of them does not resolve any value.

Or is there functionality in the library to ignore find errors?
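
A minimal sketch of that "separate chains" idea, in case it helps. The names collectFields and scrapeText are hypothetical helpers, not part of osmosis, and the sketch assumes that done is still called when a find matches nothing (as described above). Each field gets its own chain wrapped in a Promise, and Promise.allSettled merges the results so that one empty or failing chain cannot wipe out the others:

```javascript
// Hypothetical helper (not part of osmosis): run several independent
// scrape tasks and merge their results, using `fallback` for any task
// that fails or yields nothing.
async function collectFields(tasks, fallback = null) {
  const names = Object.keys(tasks);
  // allSettled never rejects, so one failing task cannot discard the rest.
  // The async wrapper also turns synchronous throws into rejections.
  const settled = await Promise.allSettled(
    names.map(async (name) => tasks[name]())
  );
  const result = {};
  settled.forEach((outcome, i) => {
    const found = outcome.status === 'fulfilled' && outcome.value !== undefined;
    result[names[i]] = found ? outcome.value : fallback;
  });
  return result;
}

// Hypothetical wrapper around ONE osmosis chain. `osmosis` is passed in as
// an argument; the Promise resolves with undefined if the selector matched
// nothing (assumes done() is still called in that case).
function scrapeText(osmosis, url, selector) {
  return new Promise((resolve) => {
    let value;
    osmosis
      .get(url)
      .find(selector)
      .set({ value: 'text()' })
      .data((data) => { value = data.value; })
      .done(() => resolve(value));
  });
}
```

Usage would look roughly like collectFields({ kurs: () => scrapeText(osmosis, url, selector1), perc: () => scrapeText(osmosis, url, selector2) }). The trade-off is one page request per field, which is the "multiple site visits" cost mentioned above.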

tripower commented 2 years ago

Thanks.

Yes, that would be a cool feature: a flag so that a failing XPath returns an empty value, instead of the whole data being empty when one query fails.

tripower commented 2 years ago

My local hack to get the desired result: hack for osmosis

Ruinevo commented 7 months ago

You can use "do" for this.

Only the "do" block will fail if the selector is not found.

Example:


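      const osmosis = require('osmosis');

      // The original snippet was cut from a Promise wrapper; it is restored
      // here (the name `scrape` is assumed) so `resolve`/`reject` are defined.
      const scrape = (url) => new Promise((resolve, reject) => {
        osmosis
          .get(url)
          .delay(1000)
          .set({
            text: ['.topic-body .topic-body__content-text'],
            videoId: '.topic-body .box-external-video .box-external-video__eagle@data-vid',
            title: '.topic-body__title'
          })
          .do(
            osmosis
              // This selector may be missing; only this "do" block fails then.
              .follow('.topic-body .topic-body__title-image-zoom@href')
              .delay(1000)
              .set({src: '.comments-page__body .comments-page__title-image@src'})
          )
          .data(resolve)
          .error(reject)
          .debug(console.log)
          .done()
      })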