thicccat688 / selenium-recaptcha-solver

ReCAPTCHA solver for selenium (Using audio).
MIT License
106 stars 37 forks source link

Several enhancements #3

Closed alexreg closed 1 year ago

alexreg commented 1 year ago

Note that the test isn't passing any more (even before this PR). I suspect the latest version of undetected-chromedriver (3.4.5) might have broken something with your test. I'll let you resolve that, if that's okay?

alexreg commented 1 year ago

Two other very minor points, which don't have to be resolved in this PR:

  1. Maybe it's a good idea to rename the API class to something more descriptive like RecaptchaSolver? I know the package name is already nice and descriptive, but this seems to be convention. Definitely a bit subjective though!
  2. I noticed a lot of dependencies under install_requires in setup.py. I think most of these are transitive dependencies and are installed anyway when the immediate dependencies are installed (pydu, requests, selenium, SpeechRecognition). So, unless I'm missing something, we can make it more manageable by reducing to just these 4 dependencies?
thicccat688 commented 1 year ago

Hi Alex,

I'll check out your changes soon.

I'll also find out what's wrong with the test cases.

About the two minor points you mentioned:

  1. I agree, let's change the API class to RecaptchaSolver since it’s more descriptive, but we need to ensure it's backwards compatible (I can handle that).

  2. Good point! My mistake was using pip freeze to get my requirements, but I'll review them to guarantee I only include direct dependencies.

Thanks for your help, and I'll get back to you once I've tested your changes!

thicccat688 commented 1 year ago

Hi again!

I checked out your PR and fixed these aspects:

Some things that still need to be done:

If you have more changes or recommendations, please let me know because you and your feedback have been of great help!

thicccat688 commented 1 year ago

@alexreg Can you give me your thoughts on the final PR before I merge it?

alexreg commented 1 year ago

@thicccat688 Sure. I was going to reply earlier, but got busy, sorry. Will have a closer look in a bit.

alexreg commented 1 year ago

@thicccat688 Working on it reviewing it now... A few comments and questions:

I'll push a few new commits shortly so you can check. :-)

thicccat688 commented 1 year ago

@alexreg

I added the language to the SphinxService since its service lets you pass the language as an argument, which the other services don't. But you're right; for the sake of conformity, it should be omitted.

And you're right about the SpeechRecognition modules; I was on 3.8.1, but 3.9.0 is the latest and has all the modules you had initially put, so the other services should be included again. Sorry for the confusion!

I switched from undetected-chromedriver because it's been causing issues recently, especially for inputting certain characters, probably due to it not being as actively maintained as before.

You use the click_recaptcha_v2 if there's a ReCAPTCHA checkbox (In the case of ReCAPTCHA V2 visible), you need to click and want to have any challenges that pop up resolved automatically. If the site uses ReCAPTCHA V2 invisible, there is no checkbox, so you only want to invoke the solve_recaptcha_v2_challenge to solve the challenge and not click any checkbox.

The documentation needs a re-write too, which I'll do soon (unless you've already started).

Once again, thank you for your feedback, and if you have any more questions, please let me know!

alexreg commented 1 year ago

@thicccat688

Ah, fair enough. I'll go add the other ones back then? They're untested, but why not eh? Since they come almost for free...

That's fair enough re undetected-chromedriver. I've been active on the issues and PRs for that project recently, and the author has picked up activity again the last week or so, but the problem is there are a few bugs because he recently overhauled part of his approach. So, probably makes sense to stick with selenium-stealh for now, indeed, and you can always revert at a later point if you like.

Thanks for clarifying the difference between the two methods. I'll go add that to the docs. (I had already started improving them since you mentioned it.)

thicccat688 commented 1 year ago

@alexreg Yeah, it'd be a bit of a hassle to get the credentials to test them all 😅.

Also, no problem! I'll try replying to your questions as fast as possible. Thanks for the help with the docs, too, since those need a lot of improvement.

I'm going off now, so I'll get back to you tomorrow if you need anything else. See you tomorrow!

alexreg commented 1 year ago

@thicccat688 No worries. Always happy to contribute in little ways like this when I'm actively using an open-source library!

I just pushed several commits, with the test passing for me. I think it should address all the issues we mentioned except your last one – about Google detecting automated queries – but that's a rather complex issue, and maybe best left for another time. (Also, undetected-chromedriver helps with it to some extent, and is constantly making progress.)

Let me know what you think tomorrow. Good night. :-)

thicccat688 commented 1 year ago

@alexreg

Your changes look great!

I've tested them now, and they're also working for me.

We could start using undetected-chromedriver again when its significant bugs are fixed. Please let me know if you know it's production ready again!

I fixed a minor issue with the LexService since there we no arguments to pass the bot name, alias, and user id; All the other services look good.

Also, the delay configuration is a good addition too! I slightly reduced the default maximum delay to 2 instead of 3 seconds since that was a bit too slow.

Please let me know your thoughts so we can wrap up this PR and get the new version out :D!

alexreg commented 1 year ago

@thicccat688

I've tested them now, and they're also working for me.

Glad to hear.

We could start using undetected-chromedriver again when its significant bugs are fixed. Please let me know if you know it's production ready again!

Yep, makes sense. I'll let you know.

I fixed a minor issue with the LexService since there we no arguments to pass the bot name, alias, and user id; All the other services look good.

Thanks, this looks right now.

Also, the delay configuration is a good addition too! I slightly reduced the default maximum delay to 2 instead of 3 seconds since that was a bit too slow.

Yeah, no problem. We'll see how this fairs "in the wild" but I suspect 1-2 seconds will be fine.

Please let me know your thoughts so we can wrap up this PR and get the new version out :D!

I've now normalised the docstrings, since it seems you prefer the summary line to appear on the line after """ and not the same line – I just made them all that way in the last commit.

I see you removed delay_config from the test, which is fair enough, but shouldn't we test delay_config somewhere at least? Maybe a separate test?

Good to go, otherwise!

thicccat688 commented 1 year ago

@alexreg

Thanks for all the feedback!

Your last point makes a lot of sense; I removed it to run the tests without delay and forgot to add it back. I will add it back as soon as I can.

If you have nothing else to point out, I’ll make a new release and upload it to PyPi.

Thanks again :)!

alexreg commented 1 year ago

@thicccat688 No problem. Looks great otherwise; nothing to hold up a new release. :-)