This project is focused on decoding audio captchas, which is often much easier than breaking visual captchas. It was tested on uloz.to audio captchas (as of April 2018).
Code is hosted on https://github.com/izderadicka/adecaptcha
(Old version: https://code.launchpad.net/~ivan-zderadicka/adecaptcha/trunk)
Some description (of older version) is here: http://old.zderadicka.eu/adecaptcha.php
It's still Python 2, so make sure python2 is installed (it is no longer installed by default in new distributions, since it has reached end of life).
git clone https://github.com/izderadicka/adecaptcha.git
cd adecaptcha/
sudo pip2 install cython numpy scipy lxml
python2 setup.py build_ext --inplace
cd adecaptcha/libsvm-3.17/
make clean
make lib
cd ..
rm svm;ln -s libsvm-3.17/python/ svm
cd ..
sudo mv adecaptcha/ /usr/local/lib/python2.7/dist-packages/
numpy, scipy and cython can be installed via pip (as above); pyao and pymad, however, should be installed from source:
sudo apt-get install libmad0-dev
git clone --depth 1 https://github.com/jaqx0r/pymad
cd pymad/
python2 config_unix.py
sudo python2 setup.py install
cd ..
wget -O - http://ekyo.nerim.net/software/pyogg/pyao-0.82.tar.gz | tar xzv
sudo apt-get install libao-dev
wget -O - https://launchpad.net/ubuntu/+archive/primary/+files/pyao_0.82-5build1.debian.tar.gz | tar xzv
patch -p0 -i debian/patches/driver_id.patch
patch -p0 -i debian/patches/int_format_strings.patch
patch -p0 -i debian/patches/python25.patch
cd pyao-0.82/
python config_unix.py
python setup.py install
Download enough samples; about 500 should be enough (if each sample contains 4 letters). Put them all in one directory; the sound and the picture must have the same base name (xyz.wav and xyz.gif).
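Before starting the tool it can help to verify that every sound has its picture. The following is only a minimal sketch, not part of adecaptcha; the function name find_unpaired_samples is made up for illustration, and the .wav/.gif extensions are taken from the example names above.

```python
import os


def find_unpaired_samples(directory, sound_ext=".wav", image_ext=".gif"):
    """Return (sounds without a picture, pictures without a sound).

    Hypothetical helper: compares base names of files in one directory.
    """
    sounds, images = set(), set()
    for name in os.listdir(directory):
        base, ext = os.path.splitext(name)
        ext = ext.lower()
        if ext == sound_ext:
            sounds.add(base)
        elif ext == image_ext:
            images.add(base)
    # Base names present on one side only are incomplete samples.
    return sorted(sounds - images), sorted(images - sounds)
```

Calling find_unpaired_samples("samples") on a complete sample directory should return two empty lists.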
Start python samplestool.py (it also requires the pyqt4, pyao and matplotlib modules). There, set:
Directory with samples - to your downloaded samples
Segmentation algorithm - select one:
  Simple Energy Envelope - uses the sum of squares of the amplitude within a small window, divided by the window size, scaled to values 0-100
  Simple Very Naive - an old segmentation, basically looking for the amplitude to step over the threshold and then taking a size_sec sample, or something like this; actually not very useful
  Fixed position - just cuts fixed positions out of the sound signal
  Or you can relatively easily create your own segmentation - just look into segmentation.py for samples
Threshold for segmentation - threshold to distinguish sound from silence/background noise; the value is individual for each algorithm. For Simple Energy Envelope you can click "Show Energy Envelope", which shows a chart of the energy envelope for the current sample; try a few samples and you will get an idea of what the right value should be (segments also have a small offset added before and after they reach the threshold)
Segmentation params - specific for given algorithm - default values are shown
Number of freq. bins - number of frequency spectrum bins into which the signal is divided; this determines the size of the feature vector for each letter sound
Check "Play sound when move to next item" - it'll help to work with samples
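To give an idea of how the number of frequency bins determines the feature vector size, here is a rough sketch. It is an assumption about the general technique, not the code used by adecaptcha: it computes a naive DFT magnitude spectrum and averages it over equal-width bands (the real tool may scale or normalize differently). The names magnitude_spectrum and bin_features are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive O(n^2) DFT; returns magnitudes for the lower half of the spectrum."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(acc))
    return spectrum


def bin_features(samples, n_bins):
    """Average the magnitude spectrum over n_bins equal-width bands.

    The result has exactly n_bins values, i.e. one feature per bin.
    """
    spectrum = magnitude_spectrum(samples)
    if n_bins < 1 or n_bins > len(spectrum):
        raise ValueError("n_bins must be between 1 and %d" % len(spectrum))
    features = []
    for i in range(n_bins):
        start = i * len(spectrum) // n_bins
        end = (i + 1) * len(spectrum) // n_bins
        band = spectrum[start:end]
        features.append(sum(band) / len(band))
    return features
```

With more bins each letter is described in finer frequency detail, but the feature vector grows accordingly.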
Then click the Next button, which opens the first sample; go through several samples to set the segmentation parameters correctly.
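While tuning the threshold, it may help to see the idea behind Simple Energy Envelope in code. This is only an illustrative sketch under assumptions, not the implementation from segmentation.py: it uses non-overlapping windows, scales so that the loudest window is 100, and the function names and the pad parameter are hypothetical.

```python
def energy_envelope(samples, window):
    """Mean squared amplitude per window, scaled to the range 0-100."""
    energies = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        energies.append(sum(x * x for x in chunk) / float(len(chunk)))
    peak = max(energies) if energies else 0.0
    if peak == 0:
        return [0.0] * len(energies)
    return [100.0 * e / peak for e in energies]


def segments_above(envelope, threshold, pad=1):
    """Return (start, end) window ranges where the envelope reaches the threshold.

    end is exclusive; each range is widened by pad windows on both sides,
    mimicking the small offset added before and after a segment.
    """
    segments = []
    start = None
    for i, value in enumerate(envelope):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            segments.append((max(0, start - pad), min(len(envelope), i + pad)))
            start = None
    if start is not None:
        segments.append((max(0, start - pad), len(envelope)))
    return segments
```

A threshold that is too low merges neighbouring letters into one segment, while one that is too high splits or drops quiet letters.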
When you think segmentation is set properly, check "Analyze segmentation initially" - it then calculates number of segments for all samples and prints some stats - ideally all samples should have same number of segments.
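The kind of statistics meant here can be sketched as follows. This is a hypothetical helper, not the tool's own code: it takes a mapping of sample name to number of detected segments and reports the distribution and the samples that deviate from the most common count.

```python
from collections import Counter


def segment_count_stats(counts_by_sample):
    """Return (histogram, most common count, names of deviating samples)."""
    if not counts_by_sample:
        raise ValueError("no samples given")
    histogram = Counter(counts_by_sample.values())
    most_common_count = histogram.most_common(1)[0][0]
    outliers = sorted(
        name for name, count in counts_by_sample.items()
        if count != most_common_count
    )
    return histogram, most_common_count, outliers
```

If the list of deviating samples is long, the threshold or segmentation params probably need another round of tuning.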
If there is some initial/final sound, that is not part of captcha text, you can set what segments should be considered as a part of captcha (supports python indexing - so end = -4 means last 4 segments)
Now you need to transcribe all samples: write the letters corresponding to the current captcha and click Next. It's a bit of a dull job, but it should not take much more than an hour for 500 samples.
When all samples are transcribed, you can generate training data for the SVM classifier and save the segmentation configuration. Click the "Generate Training Data" button. A file dialog then opens; save the data in the 'libsvm-3.17/tools' directory, say as 'xxx.data'. A progress dialog appears, so wait until the process ends.
Before creating a model from your data, make sure that you have the packages gnuplot and gnuplot-x11 installed (they are used by the training tools).
First we need to split the data into training and test sets (the test set will be used to measure classifier accuracy). For test data we can dedicate about 10% of the whole data set, which is 200 (500 * 4 * 0.1) in this case.
cd libsvm-3.17/tools
./subset.py -s 1 xxx.data 200 xxx.test xxx.train
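The split above picks 200 random lines for the test set and leaves the rest for training. As an illustration of that idea only (this is not the code of subset.py, and the function name random_split is made up), the same can be sketched as:

```python
import random


def random_split(lines, subset_size, seed=None):
    """Randomly pick subset_size lines; return (subset, rest), order preserved."""
    if subset_size > len(lines):
        raise ValueError("subset_size is larger than the data set")
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(lines)), subset_size))
    subset = [line for i, line in enumerate(lines) if i in chosen]
    rest = [line for i, line in enumerate(lines) if i not in chosen]
    return subset, rest
```

For 500 samples of 4 letters each, random_split(lines, 200) would give 200 test lines and 1800 training lines.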
Now we can do the training of the SVM classifier with our data:
./easy.py xxx.train xxx.test
This can take quite some time, up to half an hour, so keep it running (you'll see some charts from gnuplot). Check the result, especially the classification accuracy. In the ideal case you should see:
Accuracy = 100% (200/200) (classification)
but 90% is good if the audio captchas are noisy (with a 90% chance per letter you have approx. a 66% chance to recognize a 4-letter captcha, and with 3 attempts you have a 96% chance to succeed).
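The chances quoted above can be reproduced with a few lines of arithmetic, assuming that letters are recognized independently of each other and that attempts are independent too (both are simplifying assumptions; the function names are just for illustration):

```python
def captcha_success(p_letter, letters):
    """Chance that all letters of one captcha are recognized correctly."""
    return p_letter ** letters


def success_within(p_captcha, attempts):
    """Chance that at least one of several independent attempts succeeds."""
    return 1.0 - (1.0 - p_captcha) ** attempts


p_one = captcha_success(0.9, 4)      # 0.9 ** 4 = 0.6561
p_three = success_within(p_one, 3)   # 1 - (1 - 0.6561) ** 3, about 0.959
```

The same functions show how quickly the chance drops for lower per-letter accuracy or longer captchas.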
After this we have all we need, just copy:
cp xxx.cfg ../../xxx.cfg
cp xxx.train.range ../../xxx.range
cp xxx.train.model ../../xxx.model
cd ../..
Check that xxx.cfg contains the keys for the model and range files:
{ ...
, 'range_file':'xxx.range', 'model_file':'xxx.model' }
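This check can also be done with a short script. The sketch below assumes that the configuration file is a single Python dictionary literal, as the snippet above suggests; that is an assumption, and the helper name missing_config_keys is hypothetical.

```python
import ast

REQUIRED_KEYS = ("range_file", "model_file")


def missing_config_keys(text):
    """Parse the text as a Python dict literal and list missing required keys."""
    config = ast.literal_eval(text)
    if not isinstance(config, dict):
        raise ValueError("configuration is not a dictionary")
    return [key for key in REQUIRED_KEYS if key not in config]
```

Reading the file content and passing it to missing_config_keys should return an empty list when both keys are present.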
and test if it works:
./adecaptcha.py xxx.cfg "http://some_server/audio_captcha.wav"
Thanks to authors of all these great tools and libraries: numpy, scipy, pymad, pyao, pyqt4, pyload, matplotlib and libsvm
History:
4-Apr-13 - Fixed issues with libsvm seg fault when called from multiple threads
12-Aug-14 - updated with some fixes - support of wav, new segmentation algorithm, pyload plugin update, ...
16-Apr-2018 - updated to work with Ubuntu 16.04 and recent libraries, improved segmentation algorithm and samplestool.py