sbrown92 / reCaptchaTestSite

A script that will automatically solve the audio challenge posed by Google's reCaptcha

Proxy Pool #16

Closed: nickdibari closed this issue 7 years ago

nickdibari commented 7 years ago

Implemented a feature to pull only the IPs and ports of proxies located in the US. Still does not seem to return an audio file, as the widget thinks (or really, knows) we're sending automated requests.

nickdibari commented 7 years ago

Changing the proxy source site to http://freeproxylists.net

EDIT: freeproxylists didn't work. Ironically enough, you need to pass a captcha to access the site, so we can't use it for automation. Switching to coolproxys to hopefully get around this.

EDIT 2: coolproxys did not work either, as they use JavaScript to write the IP addresses into the table, hiding them from scrapers and making my job that much harder. Last shot is spys.me, which posts a text file of proxies every hour.

nickdibari commented 7 years ago

Good news: spys.me works really well. Using regexes to parse the returned text for the IPs and ports of US proxies. The list also updates every hour, which will help us keep the proxy pool fresh.
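A minimal sketch of the regex parsing step, in Python. The line format it expects (`<ip>:<port> <country>-<flags> [+|-]`, with the trailing `+`/`-` taken to be the Google Passed flag) is an assumption about the spys.me text file, so check it against the live file before relying on it. The function name `parse_us_proxies` is hypothetical, and the sample lines use documentation-only IP addresses rather than real list entries.

```python
import re
from typing import List, Tuple

# ASSUMED line format for the spys.me text list (verify against the real file):
#   "<ip>:<port> <country>-<flags> [+|-]"
# The optional trailing "+"/"-" is assumed to be the "Google passed" flag.
PROXY_LINE = re.compile(
    r"^(?P<ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d{1,5})\s+"
    r"(?P<country>[A-Z]{2})-\S*"
    r"(?:\s+(?P<google>[+-]))?"
)


def parse_us_proxies(text: str) -> List[Tuple[str, int]]:
    """Return (ip, port) pairs for every US proxy found in the raw list text."""
    proxies = []
    for line in text.splitlines():
        match = PROXY_LINE.match(line.strip())
        # Header lines don't match the pattern; non-US entries are filtered out.
        if match and match.group("country") == "US":
            proxies.append((match.group("ip"), int(match.group("port"))))
    return proxies


# Illustrative sample only (documentation IP ranges, not real proxies).
sample = """Proxy list header line (ignored)
203.0.113.5:3128 US-N +
198.51.100.7:8080 DE-A -
192.0.2.12:3128 US-H-S! -
"""
print(parse_us_proxies(sample))  # → [('203.0.113.5', 3128), ('192.0.2.12', 3128)]
```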

Bad news: still being flagged as an automated request, so no challenge is returned. What could cause reCaptcha to refuse to return an audio challenge when we use a proxy?

nickdibari commented 7 years ago

Upon further investigation, it looks like the root problem might be that the proxies we use have already been blacklisted by Google as known to be compromised. It makes sense that Google would then block those IP addresses from using reCaptcha.

The first thing we should do is check whether ANY of the proxies we are scraping work. If we can get one working, we can then loop over all the proxies in our list until we find at least one that works.
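The loop described above might look like the following Python sketch. It takes the actual check as a parameter (`works`) so the looping logic stays independent of how a proxy is tested; `find_working_proxy` and `works` are hypothetical names, and the lambda in the usage line is a stand-in for a real request made through the proxy.

```python
from typing import Callable, Iterable, Optional, Tuple

Proxy = Tuple[str, int]  # (ip, port)


def find_working_proxy(
    proxies: Iterable[Proxy],
    works: Callable[[Proxy], bool],
) -> Optional[Proxy]:
    """Return the first proxy for which works(proxy) is True, or None if none do."""
    for proxy in proxies:
        try:
            if works(proxy):
                return proxy
        except Exception:
            # Treat any error (timeout, refused connection, ...) as a dead proxy.
            continue
    return None


# Usage with a stand-in check; a real check would make a request through the proxy.
candidates = [("203.0.113.5", 3128), ("198.51.100.7", 8080)]
print(find_working_proxy(candidates, lambda p: p[1] == 8080))  # → ('198.51.100.7', 8080)
```

Returning `None` rather than raising keeps the "no proxy worked" case explicit for the caller.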

We should also consider expanding the criteria for which proxies get added to the pool. Right now it selects US proxies with the Google Passed attribute. Consider dropping the Google Passed check, as it does not seem all that relevant anymore. The more proxies the better anyway.
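One way to make that criterion optional is a flag on the filtering step, sketched here in Python. The `ProxyEntry` record, its fields, and `select_proxies` are hypothetical illustrations rather than the repo's actual data model, and the entries use documentation-only IP addresses.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProxyEntry:
    """Hypothetical record for one scraped proxy."""
    ip: str
    port: int
    country: str
    google_passed: bool


def select_proxies(
    entries: List[ProxyEntry],
    country: str = "US",
    require_google_passed: bool = False,
) -> List[ProxyEntry]:
    """Keep entries from `country`; optionally also require the Google Passed flag."""
    return [
        e for e in entries
        if e.country == country and (e.google_passed or not require_google_passed)
    ]


entries = [
    ProxyEntry("203.0.113.5", 3128, "US", True),
    ProxyEntry("192.0.2.12", 3128, "US", False),
    ProxyEntry("198.51.100.7", 8080, "DE", True),
]
print(len(select_proxies(entries, require_google_passed=True)))  # 1 (current criteria)
print(len(select_proxies(entries)))                              # 2 (relaxed criteria)
```

Defaulting the flag to `False` reflects the "more proxies the better" reasoning, while keeping the stricter filter one argument away.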

nickdibari commented 7 years ago

Got one! Tested with the following proxy and was able to download a challenge audio file:

Server: 104.37.212.5
Port: 3128

So we know that some of these proxies can work; we may just need to try a few first.
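For reference, a Python sketch of routing requests through one of these proxies using only the standard library's `urllib.request.ProxyHandler`, treating the 3128 in the entries above as the proxy's port. The helper names and the default `test_url` are assumptions for illustration. A 200 response only shows that the proxy is reachable; as the earlier comments show, reCaptcha may still refuse to serve a challenge through it.

```python
import http.client
import urllib.request


def make_proxy_opener(ip: str, port: int) -> urllib.request.OpenerDirector:
    """Build a urllib opener that sends HTTP and HTTPS traffic through ip:port."""
    proxy_url = f"http://{ip}:{port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def proxy_responds(ip: str, port: int, test_url: str = "https://www.google.com/",
                   timeout: float = 10.0) -> bool:
    """Return True if test_url answers with HTTP 200 through the proxy.

    This only checks reachability; it says nothing about reCaptcha's verdict.
    """
    opener = make_proxy_opener(ip, port)
    try:
        with opener.open(test_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, timeouts and dropped connections all land here.
        return False
```

A function like `proxy_responds` could serve as the per-proxy check when looping over the scraped list.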

EDIT: Full success (download file -> convert to text -> pass correct answer -> submit form) with the following proxy:

Server: 52.45.142.12
Port: 3128

Seems like this is working for now. We should implement a way to try multiple proxies, since some may work and others may not.
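One possible shape for trying multiple proxies is a small rotating pool, sketched in Python below. This is a hypothetical design, not what #18 implements: `ProxyPool`, `run`, and `max_tries` are made-up names, and `attempt` stands in for the whole download -> transcribe -> submit sequence, returning a result on success and `None` on failure.

```python
import random
from typing import Callable, List, Optional, Tuple, TypeVar

Proxy = Tuple[str, int]  # (ip, port)
T = TypeVar("T")


class ProxyPool:
    """Hypothetical rotating pool: tries proxies in random order and evicts failures."""

    def __init__(self, proxies: List[Proxy], seed: Optional[int] = None) -> None:
        self._proxies = list(proxies)
        self._rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self._proxies)

    def __contains__(self, proxy: Proxy) -> bool:
        return proxy in self._proxies

    def run(self, attempt: Callable[[Proxy], Optional[T]], max_tries: int = 5) -> Optional[T]:
        """Call attempt(proxy) on up to max_tries proxies; return the first non-None result."""
        candidates = self._proxies[:]
        self._rng.shuffle(candidates)
        for proxy in candidates[:max_tries]:
            try:
                result = attempt(proxy)
            except Exception:
                result = None  # an exception counts as a failed attempt
            if result is not None:
                return result
            # Drop the failed proxy so future runs do not waste time on it.
            self._proxies.remove(proxy)
        return None


good = ("203.0.113.5", 3128)
pool = ProxyPool([("192.0.2.12", 3128), good, ("198.51.100.7", 8080)], seed=0)
answer = pool.run(lambda p: "solved" if p == good else None)
print(answer, good in pool)  # solved True
```

Evicting failures means the pool shrinks over time, so it would pair naturally with re-scraping the hourly list to refill it.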

nickdibari commented 7 years ago

Fixed for now in #18. Further work can fine-tune the proxy pool, but this works!