skotz / captcha-breaking-library

Neural network, contour analysis, bitmap vector subtraction CAPTCHA solving library and scripting language with perceptive color space segmentation
GNU General Public License v3.0
81 stars, 25 forks

Bitmap Subtraction Solver #8

Closed: Alexufo closed this issue 8 years ago

Alexufo commented 8 years ago

Can you explain how BVS works?

skotz commented 8 years ago

BVS stands for Bitmap Vector Subtraction. Here's a quick example...

Let's say you collected two samples of the number 3. These images can be represented as a vector (or matrix) of numbers where a 1 is a white pixel and a 0 is a black pixel.

[ 1 1 1 0 0
  0 0 0 1 0
  0 0 1 1 0
  0 0 0 1 0
  0 1 1 0 0 ]

[ 0 1 1 1 0
  0 0 0 1 0
  0 1 1 0 0
  0 0 0 1 0
  0 1 1 1 0 ]
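To make the representation concrete, here is a minimal Python sketch that turns a grayscale bitmap into the 0/1 matrix described above and then flattens it into a vector. The helper names `binarize` and `flatten` and the threshold of 128 are hypothetical choices for illustration, not part of the library.

```python
# Hypothetical helpers (not the library's API): convert a grayscale bitmap,
# given as rows of 0-255 intensities, into the 0/1 form used in this thread.
def binarize(gray, threshold=128):
    """Map each pixel to 1 (white) if it is at least `threshold`, else 0 (black)."""
    return [[1 if px >= threshold else 0 for px in row] for row in gray]


def flatten(matrix):
    """Row-major flattening: a 5x5 matrix becomes a 25-element vector."""
    return [cell for row in matrix for cell in row]


# A made-up 5x5 grayscale scan of a "3" (bright strokes on a dark background).
gray_three = [
    [250, 240, 230, 20, 5],
    [10, 0, 30, 200, 15],
    [0, 25, 180, 210, 0],
    [5, 10, 40, 255, 20],
    [0, 190, 220, 60, 10],
]

sample = binarize(gray_three)
for row in sample:
    print(row)

vector = flatten(sample)
print(len(vector))  # 25
```

With these made-up intensities, `binarize` reproduces the first sample of a 3 exactly; the matrix and the flat vector hold the same information, just laid out differently.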

We can merge these samples into a pattern by averaging them cell by cell. Averaging the two samples above gives...

[ 0.5 1 1 0.5 0
  0 0 0 1 0
  0 0.5 1 0.5 0
  0 0 0 1 0
  0 1 1 0.5 0 ]
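The cell-by-cell averaging fits in a few lines of Python. `merge_samples` is a hypothetical helper, not the library's API. It assumes every sample has already been normalized to the same dimensions, and it works for any number of samples, not just two.

```python
def merge_samples(samples):
    """Average equally sized 0/1 matrices cell by cell into a single pattern.

    Hypothetical helper for illustration; assumes all samples share one size.
    """
    n = len(samples)
    rows, cols = len(samples[0]), len(samples[0][0])
    return [
        [sum(s[r][c] for s in samples) / n for c in range(cols)]
        for r in range(rows)
    ]


# The two samples of "3" from the example above.
sample_a = [
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
]
sample_b = [
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 1, 1, 0],
]

pattern = merge_samples([sample_a, sample_b])
for row in pattern:
    print(row)
# First row printed: [0.5, 1.0, 1.0, 0.5, 0.0]
```

A cell comes out as 0.5 wherever one sample has a 1 and the other a 0. With a dozen samples, as in the 8 pattern below, the values spread out between 0 and 1 in steps of 1/12.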

Here's an actual pattern composed of a dozen or so samples of an 8. It represents how the average number 8 looks in a CAPTCHA.

[Image: ztest_8_8, the merged pattern for the digit 8]

Once we have a merged pattern for every possible character in our CAPTCHA, we can check those patterns against segmented characters from an unknown CAPTCHA. We segment out a letter, normalize the size (so the input image has the same dimensions as the pattern), and then compare it to all known patterns. The pattern that matches the input most closely wins and is chosen as the solution for that input.
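The matching step might look something like the following. The thread doesn't spell out how "closest" is measured, so this sketch assumes the name means what it suggests: subtract the input from each pattern cell by cell, sum the absolute differences, and let the lowest total win. Nearest-neighbour resizing is likewise an assumed way to normalize the size, and `resize_nearest`, `distance` and `classify` are hypothetical names.

```python
def resize_nearest(matrix, rows, cols):
    """Nearest-neighbour rescale of a 2D matrix to rows x cols (assumed normalization)."""
    src_rows, src_cols = len(matrix), len(matrix[0])
    return [
        [matrix[r * src_rows // rows][c * src_cols // cols] for c in range(cols)]
        for r in range(rows)
    ]


def distance(pattern, candidate):
    """Sum of absolute cell-wise differences (an assumed reading of 'subtraction')."""
    return sum(
        abs(p - x)
        for p_row, c_row in zip(pattern, candidate)
        for p, x in zip(p_row, c_row)
    )


def classify(candidate, patterns):
    """Normalize the candidate to the pattern size, then return the closest label."""
    first = next(iter(patterns.values()))
    normalized = resize_nearest(candidate, len(first), len(first[0]))
    return min(patterns, key=lambda label: distance(patterns[label], normalized))


# Toy patterns: the averaged "3" from two samples and a plain vertical-bar "1".
patterns = {
    "3": [
        [0.5, 1, 1, 0.5, 0],
        [0, 0, 0, 1, 0],
        [0, 0.5, 1, 0.5, 0],
        [0, 0, 0, 1, 0],
        [0, 1, 1, 0.5, 0],
    ],
    "1": [[0, 0, 1, 0, 0] for _ in range(5)],
}

# A segmented character at twice the pattern size (10x10): the first "3" sample
# with every pixel doubled in both directions.
sample_a = [
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
]
segmented = [[px for px in row for _ in range(2)] for row in sample_a for _ in range(2)]

print(classify(segmented, patterns))  # 3
```

Here the input's distance to the "3" pattern is 2.5 versus 8 for the "1", so "3" wins. Because the patterns hold averages, a cell where the samples disagreed (0.5) costs the same whichever way the input goes, so uncertain pixels weigh less than pixels that every sample agreed on.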

Does that help any?

Alexufo commented 8 years ago

Absolutely, thank you.