vezinaca / Banq_Achat

Various scripts that interact with BANQ

how to circumvent reCaptcha on https://isbnsearch.org/ #37

Closed vezinaca closed 4 years ago

vezinaca commented 4 years ago

or any site. Is it doable?

Have you dealt with this before?

Saw some dudes on youtube who get the audio from it and deal with it.

gregsadetsky commented 4 years ago

No it’s very hard to crack a (Google) recaptcha.

Does the captcha appear after a number of calls from the same ip? I don’t see it when using the web interface

vezinaca commented 4 years ago

well it appears after I run the program a couple of times using:

    pre_url_search = "https://isbnsearch.org/search"
    search_text = "Frank Zappa"
    param_search = {'s': search_text}

    response = get_response(param_search, headers)

It doesn't seem to like that I'm sending it a search parameter.

(this is the get_response function btw)

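    import requests

    def get_response(parametres, headers):
        # pre_url_search is the module-level search URL defined above
        res = requests.get(pre_url_search, params=parametres, headers=headers)
        res.raise_for_status()  # raise on 4xx/5xx responses
        return res

vezinaca commented 4 years ago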

man...is my dream over?

vezinaca commented 4 years ago

unless I find another site? go back to amazon.ca scraping? (that isbn site was soooo easy to scrape!!!) :(

vezinaca commented 4 years ago

and actually, I wouldn't need to run it many many times...just once to get info of book...I guess I can just run it a couple of times a day to test it...

gregsadetsky commented 4 years ago

Are you sure it’s not just a user-agent thing?

If you run isbn searches 10 times in a row in your browser, do you get the captcha?

There’s plenty of alternative options, don’t get discouraged xx
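
For reference, a minimal sketch of what checking the user-agent theory could look like: build the same search request with an explicit browser-like `User-Agent` and compare the response with one sent without it. The search URL and the `s` parameter come from the thread; the `USER_AGENT` string and the `build_search_request` helper are assumptions for illustration. The standard-library `urllib` is used so the sketch has no dependencies; the same header dict could be passed as `headers` to `get_response`.

```python
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_URL = "https://isbnsearch.org/search"  # from the thread

# Assumption: an illustrative browser-like string, not taken from the thread.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0"


def build_search_request(search_text):
    """Build (but do not send) a search request with an explicit User-Agent."""
    url = SEARCH_URL + "?" + urlencode({"s": search_text})
    return Request(url, headers={"User-Agent": USER_AGENT})


req = build_search_request("Frank Zappa")
print(req.full_url)
# urllib stores header names capitalized as "User-agent"
print(req.get_header("User-agent"))
```

If the captcha still shows up with a browser-like header, the trigger is more likely the number of requests than the header itself.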

gregsadetsky commented 4 years ago

ok just reloaded 10 times in a row and I do see the captcha on isbnsearch.org

no worries man, we'll just explore some other options

xx

gregsadetsky commented 4 years ago

the easiest option is just understanding what the limits are -- if the site lets you do 10 searches and then throws up the captcha for an hour, I would recommend that you add intervals between your checks so that they don't happen more often than once per hour

do you have an approximate idea of how many books you would be checking per hour..?
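
The "add intervals between your checks" idea could be sketched roughly as below. This is only an illustration: the `Throttle` class and its parameter names are hypothetical, and the one-hour interval is the example figure from the comment above, not a measured limit of the site. The clock and sleep functions are injectable so the behaviour can be checked without actually waiting.

```python
import time


class Throttle:
    """Allow at most one call per `min_interval` seconds (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time of the previous call, if any

    def wait(self):
        """Sleep until the interval has elapsed; return the seconds slept."""
        delay = 0.0
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            delay = max(0.0, self.min_interval - elapsed)
            if delay:
                self._sleep(delay)
        self._last_call = self._clock()
        return delay


# Usage (assumed one-hour spacing; call wait() before each search):
#   throttle = Throttle(min_interval=3600)
#   throttle.wait()
#   response = get_response(param_search, headers)
```

The first call goes through immediately; later calls block only for whatever part of the interval has not already passed.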

vezinaca commented 4 years ago

yes, that's probably the best option, exploring and knowing the limits.

I would only use this site when I need to order a new book from BANQ. I approx. ordered 53 books in the last 2-3 years so it wouldn't be more than once every 2-3 weeks on average. I wouldn't technically reach the limit. For now, I just downloaded the html of some of my searches for test purposes and will work with them. Cheers bud!
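
Working from saved HTML, as described above, could look something like this sketch: the page is fetched once, written to disk, and every later run reads the local copy instead of hitting the site. The `cached_html` helper, the `html_cache` directory name, and the `fetch` callable are all assumptions for illustration; in the real script `fetch` might wrap `get_response(...).text`.

```python
from pathlib import Path


def cached_html(name, fetch, cache_dir="html_cache"):
    """Return the HTML saved under `name`, calling `fetch()` only on a cache miss."""
    path = Path(cache_dir) / f"{name}.html"
    if path.exists():
        # Reuse the local copy: no request is sent to the site.
        return path.read_text(encoding="utf-8")
    html = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return html


# Usage (hypothetical): the lambda runs only the first time.
#   html = cached_html("frank_zappa",
#                      lambda: get_response(param_search, headers).text)
```

This keeps parser development entirely offline, so testing never counts against whatever request limit the site enforces.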

vezinaca commented 4 years ago

Also, have you ever seen this type of reCaptcha resolve:

https://www.youtube.com/watch?v=YzjsXqnAO8w&t=116s

seems to rely on audio!!!

gregsadetsky commented 4 years ago

yeah, the accessible version of captchas (via the audio) is more easily breakable... but it's still super hellish to set up, so I strongly recommend not going down that path x