Closed patrickarlt closed 8 years ago
You probably have already solved this, but here's my solution for anybody who might stumble here.
Switch the driver to something you have control over. Here's a quick and dirty example, I hope it's clear.
var R = require('ramda'),
    Promise = require('bluebird'),
    xray = require('x-ray'),
    _request = require('request');

// request instance with a cookie jar and a browser-like User-Agent
var request = _request.defaults({
    jar: true,
    headers: {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0'}
});

// promise wrapper around request that resolves with the response body
var prequest = (opts) => new Promise((resolve, reject) =>
    request(opts, (err, res, body) => R.isNil(err) ? resolve(body) : reject(err)));

// x-ray request "driver" wow so pompous much pretentiousness
var request_driver = (config) => {
    var options = config || {};
    return (ctx) => prequest(R.merge(options, {uri: ctx.url}));
};

// promise wrapper around an x-ray call that uses the custom driver
var pray = (url, selector, def) => {
    return new Promise((resolve, reject) => {
        var x = xray().driver(request_driver());
        x(url, selector, def)((err, obj) => R.isNil(err) ? resolve(obj) : reject(err));
    });
};

pray('http://imgur.com/search?q=doge', 'div.cards', {image: ['img@src']}).then(console.log);
You can use request.defaults or pass the options to request_driver() inside the .driver() call, like this:
var x = xray().driver(request_driver({
    jar: true,
    headers: {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0'}
}));
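For the cookie case in the original question, the same idea could be sketched roughly like this. It's only a sketch: cookieDriver, fakeRequest, the cookie value and the example.com URL are all made up for illustration, and it assumes x-ray accepts a driver that takes ctx and returns a promise for the page body, which is what request_driver relies on. The request function is passed in so the snippet runs without any packages installed.

```javascript
'use strict';

// Hypothetical helper (not part of x-ray or request): builds a driver that
// sends a fixed Cookie header with every page fetch.
// `requestFn` is assumed to have the (options, callback(err, res, body))
// shape of the `request` module; it is injected so the sketch runs without it.
function cookieDriver(cookie, requestFn, baseOptions) {
  const base = baseOptions || {};
  return function driver(ctx) {
    // Copy any caller-supplied headers, then set Cookie so it always wins.
    const headers = Object.assign({}, base.headers, { Cookie: cookie });
    const options = Object.assign({}, base, { uri: ctx.url, headers: headers });
    return new Promise((resolve, reject) => {
      requestFn(options, (err, res, body) => (err ? reject(err) : resolve(body)));
    });
  };
}

// Stand-in for `request`, used only to show what would be sent.
function fakeRequest(options, callback) {
  callback(null, { statusCode: 200 }, 'sent cookie: ' + options.headers.Cookie);
}

// Placeholder cookie and URL, purely for illustration.
const driver = cookieDriver('splash_auth=secret', fakeRequest);
driver({ url: 'http://dev.example.com/' }).then(console.log);
```

With the real modules installed, I'd expect this to be wired up as xray().driver(cookieDriver('name=value', require('request'))), but I haven't tried that against a password-protected site.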
Of course the phantom driver by the author might work. I haven't tried it because it depends on stuff that I don't understand :/
Working on that under #51
This might be a duplicate of https://github.com/lapwinglabs/x-ray/issues/91.
I'm using x-ray to build a link checker for a large production site. It is working great, but I can't use it to test our development site because we keep it behind a password-protected splash screen.
If I could set a cookie when using x-ray, I could make this work. Digging around the code a little, I see you're setting headers at https://github.com/lapwinglabs/x-ray-crawler/blob/03b89901e9857925d80a0e5b80fdbe297510789b/lib/http-driver.js#L26 but I can't figure out where they are coming from.