skotz / cbl-js

JavaScript CAPTCHA solving library
MIT License

Help with this one? #62

Status: Open. ryan-nauman opened this issue 3 years ago.

ryan-nauman commented 3 years ago

This is my first time checking out this library. I'm still learning and am trying to work out the difference between a blob and a pattern (and how each relates to a single character versus the whole image, if at all). alawiggle has 200x70px images but a blob_max_pixels definition of 1000, and the character '5' is 17x30px but the pattern width/height is 24x24. How do those fit together?

It may be best to just learn by example. I've got several CAPTCHA examples, but they are all pretty similar (either primarily black or primarily white).

[images: example CAPTCHAs reading GCFK, H64L, 3VKE, L96I, M16H]

I'll keep experimenting. Thanks for your help!

skotz commented 3 years ago

I'm using the term "blob" to include any collection of pixels in the image that are connected to each other, excluding pixels within the bounding rectangle that aren't part of the shape.

So if this was your CAPTCHA:

[image: example CAPTCHA]

You could crop out the S like this:

[image: the S cropped by its bounding rectangle]

But it has pieces of other stuff in there. When you extract a "blob" of connected pixels, you get this instead:

[image: the S extracted as a blob of connected pixels]

It removes noise and makes it easier to process. You can output the debug image after coloring the blobs to see each blob in a different color. It helps you see how it's classifying blobs.

[image: debug output with each blob drawn in a different color]
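To make the "blob" idea concrete, here is a minimal sketch of connected-component extraction. It is not cbl-js code: the `findBlobs` helper is hypothetical, it assumes the image has already been binarized into rows of 0/1 (1 = foreground), and it treats pixels as connected only when they touch horizontally or vertically (4-connectivity). The library may use a different connectivity rule or a color tolerance instead.

```javascript
// Hypothetical helper, not part of cbl-js: find 4-connected "blobs" of
// foreground pixels (value 1) in a binary image stored as rows of 0/1.
function findBlobs(grid) {
  const height = grid.length;
  const width = height > 0 ? grid[0].length : 0;
  const visited = grid.map(row => row.map(() => false));
  const blobs = [];

  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (grid[y][x] !== 1 || visited[y][x]) continue;

      // Flood fill from this seed pixel using an explicit stack.
      const pixels = [];
      const stack = [[x, y]];
      visited[y][x] = true;
      while (stack.length > 0) {
        const [cx, cy] = stack.pop();
        pixels.push({ x: cx, y: cy });
        const neighbors = [[cx + 1, cy], [cx - 1, cy], [cx, cy + 1], [cx, cy - 1]];
        for (const [nx, ny] of neighbors) {
          if (nx >= 0 && ny >= 0 && nx < width && ny < height &&
              grid[ny][nx] === 1 && !visited[ny][nx]) {
            visited[ny][nx] = true;
            stack.push([nx, ny]);
          }
        }
      }

      // Bounding rectangle of the blob; only `pixels` belong to the shape,
      // so stray pixels inside the rectangle are excluded.
      const xs = pixels.map(p => p.x);
      const ys = pixels.map(p => p.y);
      blobs.push({
        pixels,
        minX: Math.min(...xs), maxX: Math.max(...xs),
        minY: Math.min(...ys), maxY: Math.max(...ys),
      });
    }
  }
  return blobs;
}

const image = [
  [1, 1, 0, 0, 1],
  [1, 0, 0, 1, 1],
  [0, 0, 0, 0, 0],
  [0, 1, 0, 0, 0],
];
console.log(findBlobs(image).map(b => b.pixels.length)); // [ 3, 3, 1 ]
```

The second blob's bounding rectangle covers x 3..4 and y 0..1, yet only three of its four cells are part of the blob, which is the "excluding pixels within the bounding rectangle" behavior described above.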

The pattern size is what the library will resize every character image to so that they're consistent. It's a lot easier to compare an image to a trained template when they're both the exact same size. So even if the letters in your image are 50x50, it's usually wise to pick a smaller pattern size for the trained samples, for both performance and accuracy.
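As an illustration of why a fixed pattern size helps, the sketch below scales a cropped character to `patternWidth` x `patternHeight` and then scores two same-sized patterns by the fraction of matching pixels. Both helpers are hypothetical: nearest-neighbor sampling and the simple match score are assumptions chosen for brevity, and this version stretches the glyph to fill the pattern rather than preserving its aspect ratio. cbl-js may resample and compare differently.

```javascript
// Hypothetical sketch: scale a cropped character (rows of 0/1) to a fixed
// pattern size with nearest-neighbor sampling. Aspect ratio is NOT kept.
function resizeToPattern(glyph, patternWidth, patternHeight) {
  const srcHeight = glyph.length;
  const srcWidth = glyph[0].length;
  const out = [];
  for (let y = 0; y < patternHeight; y++) {
    const srcY = Math.floor((y * srcHeight) / patternHeight);
    const row = [];
    for (let x = 0; x < patternWidth; x++) {
      const srcX = Math.floor((x * srcWidth) / patternWidth);
      row.push(glyph[srcY][srcX]);
    }
    out.push(row);
  }
  return out;
}

// Fraction of positions where two same-sized patterns agree (1 = identical).
function similarity(a, b) {
  let same = 0;
  let total = 0;
  for (let y = 0; y < a.length; y++) {
    for (let x = 0; x < a[y].length; x++) {
      if (a[y][x] === b[y][x]) same++;
      total++;
    }
  }
  return same / total;
}

// A 17px-wide, 30px-tall character becomes a 24x24 pattern.
const glyph = Array.from({ length: 30 }, () => Array(17).fill(0));
const pattern = resizeToPattern(glyph, 24, 24);
console.log(pattern.length, pattern[0].length); // 24 24
```

Once every blob and every trained sample share the same dimensions, comparing them is a cell-by-cell loop, which is the consistency the pattern size buys.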

Pixels per blob is measured on the character before resizing: it counts the number of pixels in the blob itself, not the area of its bounding rectangle. This is useful if you have a CAPTCHA with a lot of noise and just want to remove any small specks.
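A small sketch of size-based noise filtering, assuming blobs are objects with a `pixels` array (as a connected-component pass would produce). `filterBlobsBySize` and `boundingArea` are hypothetical names; the min/max arguments only mirror the idea behind options like blob_max_pixels and are not the library's implementation. It also shows why a limit of 1000 can be far below the 200x70 image area: thin strokes have few pixels relative to their bounding rectangle.

```javascript
// Hypothetical sketch: keep only blobs whose pixel count lies in
// [minPixels, maxPixels]. The count is the number of pixels in the blob,
// not the area of its bounding rectangle.
function filterBlobsBySize(blobs, minPixels, maxPixels) {
  return blobs.filter(
    b => b.pixels.length >= minPixels && b.pixels.length <= maxPixels
  );
}

// Area of the blob's bounding rectangle, for comparison with its pixel count.
function boundingArea(blob) {
  const xs = blob.pixels.map(p => p.x);
  const ys = blob.pixels.map(p => p.y);
  const w = Math.max(...xs) - Math.min(...xs) + 1;
  const h = Math.max(...ys) - Math.min(...ys) + 1;
  return w * h;
}

// A diagonal stroke: 5 pixels, but a 5x5 bounding rectangle.
const stroke = { pixels: [0, 1, 2, 3, 4].map(i => ({ x: i, y: i })) };
// A single-pixel speck of noise.
const speck = { pixels: [{ x: 10, y: 10 }] };

console.log(stroke.pixels.length, boundingArea(stroke)); // 5 25
console.log(filterBlobsBySize([stroke, speck], 3, 1000).length); // 1
```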

When some images are inverted and others are not, you can try to detect the background color as in this example.
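One simple way to handle mixed polarity (a swapped-in heuristic, not the linked example) is to sample the image border, assume most border pixels are background, and binarize so text always ends up as 1. `binarizeAutoPolarity` is a hypothetical helper; it assumes a grayscale image stored as rows of 0-255 values and a fixed threshold of 128, both of which you would tune for real CAPTCHAs.

```javascript
// Hypothetical sketch: guess whether the background is light or dark from
// the image border, then binarize so foreground (text) pixels are always 1.
// Assumes the border is mostly background.
function binarizeAutoPolarity(gray, threshold = 128) {
  const h = gray.length;
  const w = gray[0].length;
  let light = 0;
  let border = 0;
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      if (y === 0 || x === 0 || y === h - 1 || x === w - 1) {
        border++;
        if (gray[y][x] >= threshold) light++;
      }
    }
  }
  // Majority of border pixels light => light background, dark text.
  const lightBackground = light * 2 >= border;
  return gray.map(row =>
    row.map(v => ((lightBackground ? v < threshold : v >= threshold) ? 1 : 0))
  );
}

const darkOnLight = [[255, 255, 255], [255, 0, 255], [255, 255, 255]];
const lightOnDark = [[0, 0, 0], [0, 255, 0], [0, 0, 0]];
console.log(JSON.stringify(binarizeAutoPolarity(darkOnLight))); // [[0,0,0],[0,1,0],[0,0,0]]
console.log(JSON.stringify(binarizeAutoPolarity(lightOnDark))); // [[0,0,0],[0,1,0],[0,0,0]]
```

Both polarities produce the same binary image, so the later blob and pattern steps can be trained once instead of separately for "primarily black" and "primarily white" CAPTCHAs.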

ryan-nauman commented 3 years ago

Thanks that helps. Do you see a simple approach to solving the captchas I posted without using a custom solution/external image preprocessing?