skotz / cbl-js

JavaScript CAPTCHA solving library
MIT License

How to improve this code? #45

Open panos78 opened 4 years ago

panos78 commented 4 years ago

Hello, I tried to use cbl-js on the following images: (12 sample images, numbered 0 to 11). I wrote the code below:

var cbl = new CBL(
{
    preprocess: function(img)
    {
        img.binarize(190);
        img.blur();
        img.binarize(32);
        img.colorRegions(50,true,0);
    },
    character_set: "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
    blob_min_pixels: 50,
    blob_max_pixels: 400,
    pattern_width: 25,
    pattern_height: 25,
    perceptive_colorspace: true,
});

Any idea how to improve it?

skotz commented 4 years ago

It looks like the colors are pretty distinct, so I'd play around with removing the binarize methods and skipping straight to the colorRegions. Making each letter a different color is the primary weakness of this CAPTCHA, so you don't want to make it black and white if you can help it.

The third parameter is the "pixel jump" which you can set higher to effectively keep letters together even when there's a line through them. Maybe try something like img.colorRegions(5, true, 1) for starters.
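As a rough sketch of that suggestion, the options from the original post could be rebuilt with the binarize and blur steps dropped, so that colorRegions is the only preprocessing call. The helper name `buildOptions` is hypothetical, the parameter name `tolerance` is an assumption (the thread only names the third argument, the pixel jump), and the values 5 and 1 are just the starting point mentioned above, not tuned results.

```javascript
// Hypothetical helper: builds a cbl-js options object that skips binarize/blur
// and goes straight to colorRegions, as suggested in the comment above.
function buildOptions(tolerance, pixelJump) {
    return {
        preprocess: function (img) {
            // First argument: called "tolerance" here as an assumption.
            // Second argument: kept as `true`, same as in the original post.
            // Third argument: the "pixel jump" described above.
            img.colorRegions(tolerance, true, pixelJump);
        },
        // Remaining options copied unchanged from the original post.
        character_set: "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
        blob_min_pixels: 50,
        blob_max_pixels: 400,
        pattern_width: 25,
        pattern_height: 25,
        perceptive_colorspace: true
    };
}

// In a page that has already loaded cbl.js:
// var cbl = new CBL(buildOptions(5, 1));
```

Keeping the two values as arguments makes it easier to loop over a handful of combinations and compare the results by eye.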

There's a lot going on here with fonts, character counts, and colored background blobs, so it'll be hard to get a high accuracy.

Good luck.

panos78 commented 4 years ago

I removed the binarize calls, but without them there was nothing to identify. I then tried more than 200 different combinations of colorRegions parameters, using only the first image, to see whether I could solve it, but had no luck. With the following code:

img.binarize(155);
img.colorRegions(16,true);

I created a new model, which gives the following: (image). The result is wrong, but not completely wrong: it identifies all the characters plus two extra A's. Now I am stuck and don't know how to continue.

skotz commented 4 years ago

Honestly it'll be hard to get high accuracy on this CAPTCHA with the methods in this library. There's a lot of variation. To improve the results you might need to write something more custom to specifically deal with the constants: background circles and 1px foreground lines. The letters themselves seem to be from a finite set of fonts, and they're rotated but not distorted. That might help in some way.
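One possible shape for the "1px foreground lines" part of such custom code is a 2x2 morphological opening: a foreground pixel is kept only if it belongs to at least one fully filled 2x2 square, so anything one pixel wide disappears while thicker strokes survive. This is only a sketch and not part of cbl-js. The function name `removeThinLines` and the plain 0/1 grid input are assumptions, and it presumes the letter strokes are at least 2px thick, which has not been checked against these images.

```javascript
// Hypothetical helper, independent of cbl-js.
// grid: array of rows, each row an array of 0 (background) or 1 (foreground).
// Returns a new grid containing only pixels covered by some all-foreground 2x2 square.
function removeThinLines(grid) {
    var height = grid.length;
    var width = height ? grid[0].length : 0;

    // Start from an all-background copy with the same dimensions.
    var out = grid.map(function (row) {
        return row.map(function () { return 0; });
    });

    for (var y = 0; y + 1 < height; y++) {
        for (var x = 0; x + 1 < width; x++) {
            // If the 2x2 square with top-left corner (x, y) is fully set, keep all four pixels.
            if (grid[y][x] && grid[y][x + 1] && grid[y + 1][x] && grid[y + 1][x + 1]) {
                out[y][x] = 1;
                out[y][x + 1] = 1;
                out[y + 1][x] = 1;
                out[y + 1][x + 1] = 1;
            }
        }
    }
    return out;
}
```

The grid would have to be produced from the image first, for example one mask per letter color, and the cleaned result written back before training or solving.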

Sorry, I won't be able to help too much more on this one.