matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License

Conditional redirect? #280

Open FezVrasta opened 6 years ago

FezVrasta commented 6 years ago

Subject of the issue

I'm scraping a page that requires the user to click a "thanks" button before the info I'm looking for becomes accessible. If the page has already been "thanked", an "unthank" button is shown instead. Obviously, I don't want the scraper to click the "unthank" button.

Right now I'm using code like this:

const x = Xray({
  filters: {
    // This filter rewrites the "thank/unthank" URL so that it is always a "thank" call,
    // which avoids unthanking anything that was previously thanked.
    grateful: url => url && url.replace('withdrawthank', 'thank'),
  },
}).driver(driverWithAuth);

x(
  websiteUrl,
  '.subject > div > span',
  [
    {
      title: 'a',
      url: 'a@href',
      magnets: x(
        'a@href',
        x('.thank_you_button:first-child a@href | grateful', [
          'a@href',
        ])
      ),
    },
  ]
)

The problem with this setup is that I always have to navigate to an additional page, even when the page is already "thanked".

Ideally, I'd like to be able to tell x-ray not to navigate anywhere and to stay on the same page when the provided URL is undefined or false.

That way my filter could return url.includes('withdrawthank') ? false : url (a plain includes('thank') check would also match 'withdrawthank'), so the scraper would hit the thank call only when needed.

Is it possible? If not, could this be added?
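
For reference, here is a minimal sketch of the conditional filter described above, written as a plain function so it can be checked on its own. The name thankOnlyIfNeeded and the example URLs are made up for illustration. Whether x-ray skips navigation when a filter returns false is exactly what this issue asks, so the sketch covers only the filter logic, not the crawling behavior.

```javascript
// Hypothetical filter (name is an assumption, not part of x-ray).
// Returns the URL only when a "thank" call is still needed, otherwise false.
// Note: 'withdrawthank' contains 'thank', so the withdraw form is tested explicitly.
function thankOnlyIfNeeded(url) {
  // Missing or empty href: nothing to follow.
  if (typeof url !== 'string' || url === '') return false;
  // Already thanked: the button points at the "withdrawthank" call, so skip it.
  return url.includes('withdrawthank') ? false : url;
}

// Made-up example inputs, for illustration only.
const results = [
  thankOnlyIfNeeded('https://example.com/post?action=thank&id=1'),
  thankOnlyIfNeeded('https://example.com/post?action=withdrawthank&id=1'),
  thankOnlyIfNeeded(undefined),
];
```

It could be registered the same way as grateful in the snippet above, through the filters option passed to Xray.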

lathropd commented 5 years ago

Can you share here (or DM me on Twitter) the URL of the site in question?

xochilpili commented 4 years ago

Any solution for this? I am in the same situation: I'd like to decide in the filter what to do depending on whether there is a result or not, i.e.:

// my filter.js
// Return the scraped value when present, otherwise a placeholder.
exports.isThere = (value) => {
    return value === undefined || value === '' ? 'No value' : value;
};

x('sample_url','body',[{
  title: '.title | isThere',
  description: '.title+div | isThere'
}]);

Thanks!