m-niemiec / captcha_solving_service

Captcha Solving Service is a mock-up SaaS that allows users to send their captcha images and receive solutions in simple text format. The project is divided into four parts: scraping datasets for machine learning models, a GUI for renaming collected images, captcha-solving OCR, and a captcha-solving API.
MIT License

license #1

Closed test2a closed 1 year ago

test2a commented 2 years ago

hi. can you add a license to the project? i just found it and i would love to use it, just found out the "captcha" i would solve is not supported so if i were to do that, can i do it on my machine or do i need the aws ?

the readme isn't really clear on few topics

test2a commented 2 years ago

about installation i mean training a new type of data and all that

m-niemiec commented 2 years ago

@test2a Hello, thank you for reaching out to me. Sure, I added the MIT license.

Yes, you can add more captcha types to the program. You would have to gather and name a dataset for the new captcha, train a new model for solving this new type (it can work "out of the box" or it might need some additional tweaking) and optionally add it to the captcha type recognizer. You don't need to host it on AWS. If you like, you can host a FastAPI instance on your own machine, or you could even skip this whole step if you don't need it accessible through API calls.

test2a commented 2 years ago

@m-niemiec awesome. MIT works. anyways, i tried your demo instance and my captcha was not recognized so i would have to train my own. how much training data/photos do i need to get a good accuracy? second, how much resources does your amazon ec2 currently use? does it require a big instance?

this is a nice project that i can use. thank you so much

test2a commented 2 years ago

@m-niemiec alright. i tried to install it on my machine. i got the zip, opened captcha renaming tool and did

pip3 install -r requirements.txt

anyways, i tried to do

python3 main.py

but i got the errors

Traceback (most recent call last):
  File "main.py", line 6, in <module>
    from functional_view import AppFunctionalView
  File "/home/user/Downloads/lo test/captcha/captcha_solving_service-master/captcha_renaming_tool/functional_view.py", line 12, in <module>
    class AppFunctionalView(ImageModifiers):
  File "/home/user/Downloads/lo test/captcha/captcha_solving_service-master/captcha_renaming_tool/functional_view.py", line 65, in AppFunctionalView
    def get_proper_image_size() -> tuple[int, int]:
TypeError: 'type' object is not subscriptable

test2a commented 2 years ago

same for

python3 functional_view.py

Traceback (most recent call last):
  File "functional_view.py", line 12, in <module>
    class AppFunctionalView(ImageModifiers):
  File "functional_view.py", line 65, in AppFunctionalView
    def get_proper_image_size() -> tuple[int, int]:
TypeError: 'type' object is not subscriptable

test2a commented 2 years ago

python main.py

Traceback (most recent call last):
  File "main.py", line 1, in <module>
    import tkinter as tk
ImportError: No module named tkinter

m-niemiec commented 2 years ago

@test2a Hi, sure, no problem :) I am glad that you find it useful.

As for your first set of questions: the dataset size really depends on the complexity and type of your images. For the math captcha I started getting good results between 100 and 150 images, but that captcha wasn't complicated and had few variations. For the other captcha I needed around 1000 images to get proper results.

Currently it is hosted on one of the smallest EC2 instances. Results come back rather fast (around two seconds, I would say), but I did not test it under heavy workload. If you want to scale it way up, there is a Semaphore set in solve_captcha.py; it would be a good starting point for experiments with heavier workloads.
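To illustrate how a semaphore caps the number of solves running at once, here is a minimal sketch. It assumes an asyncio-based semaphore, which may not match what solve_captcha.py actually uses; `MAX_CONCURRENT_SOLVES`, `solve_one` and `solve_many` are hypothetical names, and the sleep stands in for model inference.

```python
import asyncio

# Hypothetical cap on concurrent solves; the real value lives in solve_captcha.py.
MAX_CONCURRENT_SOLVES = 2


async def solve_one(semaphore, captcha_id, stats):
    """Stand-in for a real solve; only the semaphore usage matters here."""
    async with semaphore:  # waits here while the cap is already reached
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(0.01)  # placeholder for model inference
        stats["running"] -= 1
        return f"solution-for-{captcha_id}"


async def solve_many(captcha_ids, limit=MAX_CONCURRENT_SOLVES):
    """Run all solves, letting at most `limit` of them run at the same time."""
    semaphore = asyncio.Semaphore(limit)
    stats = {"running": 0, "peak": 0}
    results = await asyncio.gather(
        *(solve_one(semaphore, cid, stats) for cid in captcha_ids)
    )
    return results, stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(solve_many(range(6)))
    print(len(results), "solved, peak concurrency:", peak)
```

Raising the limit lets more requests run in parallel at the cost of more memory and CPU per instance, so it is worth measuring on the target machine before changing it.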

Please remember that each project has its own requirements.txt. Also, it should work best with Python 3.9.

If your Python distribution doesn't have tkinter (most have it built in, but not all), you will need to install it separately; it is not something pip installs.
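The `TypeError: 'type' object is not subscriptable` above comes from the `tuple[int, int]` annotation, which is only valid at runtime on Python 3.9 and newer. A small check along these lines can confirm both points before running the renaming tool. This is a sketch, not part of the project; `supports_builtin_generics` and `has_module` are hypothetical helper names.

```python
import importlib.util
import sys


def supports_builtin_generics(version_info=sys.version_info):
    """tuple[int, int] as a runtime annotation needs Python 3.9+ (PEP 585)."""
    return tuple(version_info[:2]) >= (3, 9)


def has_module(name):
    """True if this interpreter can locate the named module (no import is done)."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print("tuple[int, int] annotations OK:", supports_builtin_generics())
    # tkinter needs both the Python package and the compiled _tkinter extension.
    print("tkinter available:", has_module("tkinter") and has_module("_tkinter"))
```

On Python 3.8 the same annotation can be written as `Tuple[int, int]` with `from typing import Tuple`, or left as is with `from __future__ import annotations` at the top of the file. Neither workaround has been tested against this project.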

test2a commented 2 years ago

Python 3.8.10

oh. i am on 3.8 so maybe that is why it's the problem.

would it be possible to put tkinter in requirements.txt so that the software checks it during installation?

test2a commented 2 years ago

ok. i updated python 3.8 to 3.9

now new error or the same as earlier

python3.9 main.py

Traceback (most recent call last):
  File "/home/user/Downloads/lo test/captcha/captcha_solving_service-master/captcha_renaming_tool/main.py", line 6, in <module>
    from functional_view import AppFunctionalView
  File "/home/user/Downloads/lo test/captcha/captcha_solving_service-master/captcha_renaming_tool/functional_view.py", line 5, in <module>
    from PIL import ImageTk, Image
  File "/usr/lib/python3/dist-packages/PIL/ImageTk.py", line 31, in <module>
    from . import Image
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 69, in <module>
    from . import _imaging as core
ImportError: cannot import name '_imaging' from 'PIL' (/usr/lib/python3/dist-packages/PIL/__init__.py)

test2a commented 2 years ago

oh, i am on ubuntu based distro so maybe this was built on windows/mac? im not sure

test2a commented 2 years ago

maybe this?

m-niemiec commented 2 years ago

@test2a Yes. I built and worked on it on macOS and partially on Windows. I googled around, and for tkinter on your system this command might help: sudo apt-get install python3-tk. Please remember that I did not test it.

This stackoverflow link might also help, but if you prefer, I also built the captcha renaming tool into ready-to-use bundles in both .exe and .app formats. You can find them in the captcha_renaming_tool folder under the names Captcha_Renaming_Tool_APP.zip and Captcha_Renaming_Tool_EXE.zip.

test2a commented 2 years ago

great. that is what i figured. the problem is neither exe nor app will work on my linux machine. makes me wonder how did you get it running on ec2 or didn't do that part on the server. oh yeah, that makes sense.

https://stackoverflow.com/questions/64998199/cannot-import-name-imaging-from-pil

this solved my problem

sudo python3.9 -m pip install Pillow --upgrade
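The traceback above shows python3.9 loading PIL from /usr/lib/python3/dist-packages. One plausible reading is that this system copy was built for the distribution's default Python, so its compiled `_imaging` extension could not be loaded by 3.9. The sketch below shows where a given interpreter would load a package from, without importing it; `describe_module` is a hypothetical helper name.

```python
import importlib.util
import sys


def describe_module(name):
    """Report where this interpreter would load `name` from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not found for {sys.executable}"
    return f"{name}: {spec.origin}"


if __name__ == "__main__":
    # Run with the same interpreter as the tool, e.g. python3.9.
    print(describe_module("PIL"))
```

Running it with each installed interpreter makes it easier to see whether they share one copy of a package or each have their own.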

so. i found a folder of already mapped captcha so i checked in renaming tool. once i was finished with that, i am currently trying "Captcha solving ocr"

i am stuck with where to put the captcha. do i put it inside captcha data_sets > type a.

or do i put them inside captcha type recognizer > captcha_data_train_sets>train>type a ??


so i tried captcha_type_recognizer.py

which did nothing so when i do

python3 main.py inside captcha solving ocr folder, i get this error

Traceback (most recent call last):
  File "main.py", line 62, in <module>
    main()
  File "main.py", line 32, in main
    captcha_type_recognizer.train_recognizer_model()
  File "/home/user/Downloads/lo test/captcha/captcha_solving_service-master/captcha_solving_ocr/captcha_type_recognizer/captcha_type_recognizer.py", line 56, in train_recognizer_model
    score = model.evaluate_generator(test_set, steps=100)
  File "/home/user/.local/lib/python3.8/site-packages/keras/engine/training.py", line 2054, in evaluate_generator
    return self.evaluate(
  File "/home/user/.local/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/user/.local/lib/python3.8/site-packages/keras_preprocessing/image/iterator.py", line 54, in __getitem__
    raise ValueError('Asked to retrieve element {idx}, '

so i want to get the training done first

test2a commented 2 years ago

oh, i had the dataset of only like 20-30 captchas right now. might that be some problem?

m-niemiec commented 2 years ago

@test2a I think it is possible to run, for example, the exe file on Linux. You could use Wine, for instance. Yes, I did not have to use this on EC2; on AWS I hosted just the FastAPI part.

I am glad that Pillow upgrade worked.

You can add new, named captchas in captcha_solving_ocr/captcha_data_sets/captcha_type_c (or another name that you prefer). After that, in main.py (in the captcha_solving_ocr folder) you can make your edits to work with your captchas. I left some additional comments in there to indicate what each part of the code does and what you will need to change to work on your captchas, for example "# Name of data set to train", "# Image format" and the rest of them.
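Before training, it can help to check that the new folder is picked up and that every image has a usable name. The sketch below assumes each image's file name (without extension) is its solution, as produced by the renaming tool. That is an assumption about the naming scheme, and `list_labelled_images` and the `png` default are hypothetical.

```python
from pathlib import Path

# Location taken from the comment above; "captcha_type_c" is only an example name.
DATA_SET_DIR = Path("captcha_solving_ocr/captcha_data_sets/captcha_type_c")


def list_labelled_images(data_set_dir, image_format="png"):
    """Return sorted (path, label) pairs, taking the label from the file name stem."""
    data_set_dir = Path(data_set_dir)
    if not data_set_dir.is_dir():
        return []
    return sorted(
        (path, path.stem) for path in data_set_dir.glob(f"*.{image_format}")
    )


if __name__ == "__main__":
    samples = list_labelled_images(DATA_SET_DIR)
    print(f"{len(samples)} labelled images found in {DATA_SET_DIR}")
```

A count of zero usually means the path or the image format does not match what is on disk.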

I think that a dataset of 20-30 captchas will be too small even for pretty simple patterns.
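The small dataset is also a likely cause of the ValueError in the traceback above. The error text is cut off, so this is a reading rather than a confirmed diagnosis: `evaluate_generator(test_set, steps=100)` asks for 100 batches, and a finite image iterator built from 20-30 files holds far fewer. A rough way to see the gap, with `max_steps` as a hypothetical helper and the batch size assumed:

```python
import math


def max_steps(num_samples, batch_size):
    """Number of batches a finite generator can yield in one pass over the data."""
    if num_samples < 0 or batch_size <= 0:
        raise ValueError("num_samples must be >= 0 and batch_size > 0")
    return math.ceil(num_samples / batch_size)


if __name__ == "__main__":
    # Hypothetical numbers: 30 test images, batch size 32.
    print(max_steps(30, 32))    # 1 batch, far fewer than steps=100
    print(max_steps(1000, 32))  # 32 batches
```

Keeping the requested steps at or below this value, or gathering more images, should avoid asking the iterator for batches it does not have.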