Closed alexreg closed 1 year ago
Two other very minor points, which don't have to be resolved in this PR:
API
class to something more descriptive like RecaptchaSolver
? I know the package name is already nice and descriptive, but this seems to be convention. Definitely a bit subjective though!install_requires
in setup.py
. I think most of these are transitive dependencies and are installed anyway when the immediate dependencies are installed (pydu, requests, selenium, SpeechRecognition). So, unless I'm missing something, we can make it more manageable by reducing to just these 4 dependencies?Hi Alex,
I'll check out your changes soon.
I'll also find out what's wrong with the test cases.
About the two minor points you mentioned:
I agree, let's change the API class to RecaptchaSolver since it’s more descriptive, but we need to ensure it's backwards compatible (I can handle that).
Good point! My mistake was using pip freeze to get my requirements, but I'll review them to guarantee I only include direct dependencies.
Thanks for your help, and I'll get back to you once I've tested your changes!
Hi again!
I checked out your PR and fixed these aspects:
Some things that still need to be done:
If you have more changes or recommendations, please let me know because you and your feedback have been of great help!
@alexreg Can you give me your thoughts on the final PR before I merge it?
@thicccat688 Sure. I was going to reply earlier, but got busy, sorry. Will have a closer look in a bit.
@thicccat688 Working on it reviewing it now... A few comments and questions:
audio
parameter!language
parameter to the SphinxService
class? I think it makes sense to always use en-US, no? Anyway, if we're going to make language configurable, it should probably be for all services and indeed the main solver logic, right?SpeechRecognition
module, just out of curiosity? I wonder if it's because I have a more recent version of SpeechRecognition
installed or maybe I have some extension installed? (They all showed up in my local documentation.)click_recaptcha_v2
and solve_recaptcha_v2_challenge
, but could you please clarify the difference between the two scenarios in which you use them? (I can guess from code, but want to be sure!)I'll push a few new commits shortly so you can check. :-)
@alexreg
I added the language to the SphinxService since its service lets you pass the language as an argument, which the other services don't. But you're right; for the sake of conformity, it should be omitted.
And you're right about the SpeechRecognition modules; I was on 3.8.1, but 3.9.0 is the latest and has all the modules you had initially put, so the other services should be included again. Sorry for the confusion!
I switched from undetected-chromedriver because it's been causing issues recently, especially for inputting certain characters, probably due to it not being as actively maintained as before.
You use the click_recaptcha_v2
if there's a ReCAPTCHA checkbox (In the case of ReCAPTCHA V2 visible), you need to click and want to have any challenges that pop up resolved automatically. If the site uses ReCAPTCHA V2 invisible, there is no checkbox, so you only want to invoke the solve_recaptcha_v2_challenge
to solve the challenge and not click any checkbox.
The documentation needs a re-write too, which I'll do soon (unless you've already started).
Once again, thank you for your feedback, and if you have any more questions, please let me know!
@thicccat688
Ah, fair enough. I'll go add the other ones back then? They're untested, but why not eh? Since they come almost for free...
That's fair enough re undetected-chromedriver. I've been active on the issues and PRs for that project recently, and the author has picked up activity again the last week or so, but the problem is there are a few bugs because he recently overhauled part of his approach. So, probably makes sense to stick with selenium-stealh for now, indeed, and you can always revert at a later point if you like.
Thanks for clarifying the difference between the two methods. I'll go add that to the docs. (I had already started improving them since you mentioned it.)
@alexreg Yeah, it'd be a bit of a hassle to get the credentials to test them all 😅.
Also, no problem! I'll try replying to your questions as fast as possible. Thanks for the help with the docs, too, since those need a lot of improvement.
I'm going off now, so I'll get back to you tomorrow if you need anything else. See you tomorrow!
@thicccat688 No worries. Always happy to contribute in little ways like this when I'm actively using an open-source library!
I just pushed several commits, with the test passing for me. I think it should address all the issues we mentioned except your last one – about Google detecting automated queries – but that's a rather complex issue, and maybe best left for another time. (Also, undetected-chromedriver helps with it to some extent, and is constantly making progress.)
Let me know what you think tomorrow. Good night. :-)
@alexreg
Your changes look great!
I've tested them now, and they're also working for me.
We could start using undetected-chromedriver again when its significant bugs are fixed. Please let me know if you know it's production ready again!
I fixed a minor issue with the LexService since there we no arguments to pass the bot name, alias, and user id; All the other services look good.
Also, the delay configuration is a good addition too! I slightly reduced the default maximum delay to 2 instead of 3 seconds since that was a bit too slow.
Please let me know your thoughts so we can wrap up this PR and get the new version out :D!
@thicccat688
I've tested them now, and they're also working for me.
Glad to hear.
We could start using undetected-chromedriver again when its significant bugs are fixed. Please let me know if you know it's production ready again!
Yep, makes sense. I'll let you know.
I fixed a minor issue with the LexService since there we no arguments to pass the bot name, alias, and user id; All the other services look good.
Thanks, this looks right now.
Also, the delay configuration is a good addition too! I slightly reduced the default maximum delay to 2 instead of 3 seconds since that was a bit too slow.
Yeah, no problem. We'll see how this fairs "in the wild" but I suspect 1-2 seconds will be fine.
Please let me know your thoughts so we can wrap up this PR and get the new version out :D!
I've now normalised the docstrings, since it seems you prefer the summary line to appear on the line after """
and not the same line – I just made them all that way in the last commit.
I see you removed delay_config
from the test, which is fair enough, but shouldn't we test delay_config
somewhere at least? Maybe a separate test?
Good to go, otherwise!
@alexreg
Thanks for all the feedback!
Your last point makes a lot of sense; I removed it to run the tests without delay and forgot to add it back. I will add it back as soon as I can.
If you have nothing else to point out, I’ll make a new release and upload it to PyPi.
Thanks again :)!
@thicccat688 No problem. Looks great otherwise; nothing to hold up a new release. :-)
Note that the test isn't passing any more (even before this PR). I suspect the latest version of undetected-chromedriver (3.4.5) might have broken something with your test. I'll let you resolve that, if that's okay?