yarsky-tgz / scraper-http-client-request

Powerful HTTP client for scrapers with cookie jar, Cloudflare bypasser, rate limiter and retries
MIT License

How to use? #2

Status: Open. kopax opened this issue 3 years ago

kopax commented 3 years ago

Hi @yarsky-tgz and thanks for sharing.

I want to scrape my malt.fr profile, and after trying fetch it appears the site is protected by Cloudflare.

Since then, I have found cloudflare-bypasser and your plugin.

The first one fails with Error: Captcha when I call await cf.request('https://www.malt.fr');

Regarding your plugin: reading the README, it seems to be an HTTP client made for exactly this use case, but I can't see in the README how to perform a request.

How do I make an HTTP request?

Thanks a lot for helping.

yarsky-tgz commented 3 years ago

Hello @kopax !

I am very happy that my code interested you, but this library uses the same cloudflare-bypasser internally:

https://github.com/yarsky-tgz/scraper-http-client-request/blob/master/package.json#L28

and cloudflare-bypasser will not get past a captcha.

Passing a captcha is a very heavy task, especially the kind Cloudflare shows. It is very hard for an AI to solve, because every human visitor who solves one is in turn training Cloudflare's internal AI, the one that decides whether you are a robot or not.

The situation is simple, though at first glance it may seem a little complicated to understand.

You do not have many options here: try different IPs or dynamic IP pools, if you have such technical resources.
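To illustrate the "different IPs / dynamic pools" suggestion, here is a minimal sketch of rotating requests across a pool of proxies. It is not part of scraper-http-client-request: createProxyPool and requestWithRotation are hypothetical helpers, the proxy URLs are placeholders, and doRequest is an assumed caller-supplied function that performs one request through a given proxy and returns a Promise.

```javascript
// Hypothetical round-robin pool of proxy URLs (the addresses you pass in are up to you).
function createProxyPool(proxies) {
  if (proxies.length === 0) throw new Error('proxy pool is empty');
  let index = 0;
  return {
    // Return the next proxy, wrapping around at the end of the list.
    next() {
      const proxy = proxies[index];
      index = (index + 1) % proxies.length;
      return proxy;
    },
    size: proxies.length,
  };
}

// Try the request through successive proxies until one succeeds.
// `doRequest(url, proxy)` is an assumed, caller-supplied function returning a Promise.
async function requestWithRotation(url, pool, doRequest, maxAttempts = pool.size) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const proxy = pool.next();
    try {
      return await doRequest(url, proxy);
    } catch (err) {
      lastError = err; // e.g. a captcha served to this particular IP
    }
  }
  throw lastError ?? new Error('no attempts were made');
}
```

With request-promise-native, doRequest could plausibly be (url, proxy) => rp({ uri: url, proxy }), since the underlying request library accepts a proxy option; whether a fresh IP actually avoids the captcha depends entirely on Cloudflare.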

yarsky-tgz commented 3 years ago

My library has an internal retry mechanism. You can try it; maybe the captcha is not shown for every request.

yarsky-tgz commented 3 years ago

Or just use the retry-promise or ts-retry-promise packages.
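The retry idea can be sketched by hand in a few lines. This is not the API of retry-promise or ts-retry-promise (their real signatures may differ); retryOnCaptcha is a hypothetical helper that retries only when the error message is exactly 'Captcha', which is assumed from the "Error: Captcha" reported above, and waits with exponential backoff between attempts.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: retry `fn` only on the captcha failure, up to `retries` extra attempts.
async function retryOnCaptcha(fn, { retries = 3, baseDelayMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Assumption: cloudflare-bypasser signals the captcha as new Error('Captcha').
      const isCaptcha = err instanceof Error && err.message === 'Captcha';
      if (!isCaptcha || attempt >= retries) throw err; // give up or rethrow other errors
      // Exponential backoff: baseDelayMs, then 2x, 4x, ...
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

Usage would look like await retryOnCaptcha(() => cf.request('https://www.malt.fr'), { retries: 3 }). Other errors are rethrown immediately, so network failures are not silently retried.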

yarsky-tgz commented 3 years ago

P.S. @kopax, as for usage examples, quoting the existing README:

You will get HTTP client request-promise-native wrapped together with

Check the very beginning of the section right after Installation:

https://github.com/yarsky-tgz/scraper-http-client-request/blob/master/README.md#intro-what-i-will-get

yarsky-tgz commented 3 years ago

All plugins are wrapped around it. This combination can help you build a complex solution faster, if you need exactly the same set of plugins.

If you don't need all of them, I recommend building your own, @kopax.
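"Building your own" can mean wrapping a plain promise-returning request function with only the behaviour you need. As one hedged example, here is a hypothetical withRateLimit wrapper that spaces the start of consecutive calls by at least minIntervalMs. It is a stand-in, not the rate limiter this library actually bundles; requestFn is any assumed function returning a Promise.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: each call starts at least `minIntervalMs` after the previous one.
function withRateLimit(requestFn, minIntervalMs) {
  let nextAllowedStart = 0;
  return async (...args) => {
    const now = Date.now();
    // Reserve a start slot synchronously so concurrent callers queue up in order.
    const start = Math.max(now, nextAllowedStart);
    nextAllowedStart = start + minIntervalMs;
    const wait = start - now;
    if (wait > 0) await sleep(wait);
    return requestFn(...args);
  };
}
```

A client built this way might be const client = withRateLimit(rp, 1000), and the retry and proxy-rotation helpers can be layered on top in whatever combination you need.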

I hope I've helped you with your development task 🔢 :)

yarsky-tgz commented 3 years ago

https://github.com/request/request-promise#cheat-sheet
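Following the cheat sheet, a GET request in the request-promise style is just a call with an options object. The sketch below assumes client is a request-promise-native compatible function (for example require('request-promise-native'); how scraper-http-client-request exposes its wrapped client is described in its README and not reproduced here). fetchProfilePage and the User-Agent string are hypothetical; uri, headers, resolveWithFullResponse and simple are documented request-promise options.

```javascript
// Hypothetical helper: fetch a page and return its body, rejecting on non-200 statuses.
async function fetchProfilePage(client, profileUrl) {
  const options = {
    uri: profileUrl,
    headers: { 'User-Agent': 'Mozilla/5.0 (compatible; my-scraper)' }, // placeholder UA
    resolveWithFullResponse: true, // resolve with the full response, not only the body
    simple: false, // do not reject on non-2xx codes; we check statusCode ourselves
  };
  const response = await client(options);
  if (response.statusCode !== 200) {
    throw new Error(`Unexpected status ${response.statusCode} for ${profileUrl}`);
  }
  return response.body;
}
```

Passing the client in as a parameter keeps the helper usable with plain request-promise-native or with a wrapped client that adds cookies, retries or rate limiting.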

yarsky-tgz commented 3 years ago

I will think about reviewing and updating the README later.

PRs about it are welcome.