matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License
5.87k stars, 349 forks

adding proxy to x-ray node js #299

Closed: bassemAmous closed this issue 6 years ago

bassemAmous commented 6 years ago

I am developing a Node.js application that scrapes a website using x-ray, and I think the site has blocked my IP address. How can I configure a proxy or hide my IP address? Here is my code:

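var Xray = require('x-ray');
var x = Xray();

x('https://www.myurl.com', {
    // collect one entry per review block on the page
    title: x('#cm_cr-review_list .a-section.review', [{
        blogs: '.a-text-bold'
    }])
})
    // follow the "next page" link and keep scraping
    .paginate('li.a-l a@href')
    // write all collected results to a JSON file
    .write('result.json');
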
0xgeert commented 6 years ago
  1. x-ray allows custom drivers, e.g. see the previous link to a request driver.
  2. Adapt the request driver to point to a proxy.
  3. You can pick proxies yourself (randomly, round-robin, etc.), which you need to build into your driver, or use a managed rotating proxy. The advantage of the latter is that you only need to point the proxy argument of your request driver at one static URL and you're done. There are paid ones; there's also this, which sets up a SOCKS5 proxy using HAProxy on your own server using Tor.

Hth
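
The three steps above can be sketched roughly as follows. This is a minimal, untested-against-a-real-site sketch, not the gist author's code: `roundRobin`, `makeProxyDriver` and `fetchViaProxy` are hypothetical names introduced here, and the driver shape `(ctx, done)` with `done(err, body)` is assumed from how the request driver for x-ray behaves. The HTTP call itself is injected, so the rotation logic does not depend on any particular HTTP library.

```javascript
// Round-robin picker: each call returns the next proxy URL in the list,
// wrapping around to the first one after the last.
function roundRobin(proxies) {
  if (!Array.isArray(proxies) || proxies.length === 0) {
    throw new Error('at least one proxy URL is required');
  }
  let index = 0;
  return function nextProxy() {
    const proxy = proxies[index];
    index = (index + 1) % proxies.length;
    return proxy;
  };
}

// Builds a driver function (ctx, done) that fetches ctx.url through the
// next proxy and passes the response body to done(err, body).
// ASSUMPTION: this callback shape mirrors the request driver for x-ray.
// `fetchViaProxy(url, proxy, callback)` is a hypothetical helper you supply.
function makeProxyDriver(proxies, fetchViaProxy) {
  const nextProxy = roundRobin(proxies);
  return function driver(ctx, done) {
    fetchViaProxy(ctx.url, nextProxy(), function (err, body) {
      if (err) return done(err);
      done(null, body);
    });
  };
}

// Possible wiring (assumes the `request` and `x-ray` packages are installed;
// the proxy URLs below are placeholders):
//
//   var request = require('request');
//   var x = require('x-ray')();
//   x.driver(makeProxyDriver(
//     ['http://proxy-one.example:8080', 'http://proxy-two.example:8080'],
//     function (url, proxy, cb) {
//       request({ url: url, proxy: proxy }, function (err, res, body) {
//         cb(err, body);
//       });
//     }
//   ));
```

With a managed rotating proxy, the list would contain just the one static URL, and the round-robin step becomes a no-op.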

bassemAmous commented 6 years ago

@gebrits Thank you for your answer, but I didn't really understand the second part. Could you please send me an example?

0xgeert commented 6 years ago

Something like this (untested; the second option is in the comments):

https://gist.github.com/gebrits/57689768eceaec43ae0ddd17949d7503

bassemAmous commented 6 years ago

Thank you, I just resolved the problem.