About the parameter settings of Captcha Synthesizer

ziqiangchen commented 5 years ago

Hello! I have read your paper and have a question about the parameter settings of the Captcha Synthesizer. I don't see the parameter settings implemented in the code. So I think you paste the captcha characters onto a white image, and the rotation angle or color-change parameters are learned by the generator network. Is that correct? Or do you implement the parameter settings by some other method? Thanks.

SongyiGao commented 5 years ago

I have the same confusion. I would like to know how to use the parameters to control the data generator. For example, if I want images with a noisy background, how does the discriminator generate images with noise? Is it because the real data have labels (noisy background, occluding lines, distortion)?

yeguixin commented 5 years ago

I used an image generator that pastes each character onto a white image. Here every character is itself a small image. All parameters, such as the rotation angle and occluding lines, are trained at this step. We also have another generator, which is part of the GAN; it modifies the generated captcha at the pixel level to make sure that the generated captchas are similar to the real ones.
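
For readers following along, here is a minimal sketch of the "paste each character onto a white image" step in plain Python. It is not the author's code. Images are lists of pixel rows, `solid_glyph` is a hypothetical stand-in for a rendered character image, and a random vertical shift stands in for parameters such as rotation, which are omitted here.

```python
import random

WHITE, BLACK = 255, 0

def blank_canvas(width, height):
    """Return a white grayscale image as a list of rows."""
    return [[WHITE] * width for _ in range(height)]

def paste(canvas, glyph, left, top):
    """Copy the non-white pixels of `glyph` onto `canvas`, clipping at the edges."""
    for dy, row in enumerate(glyph):
        for dx, value in enumerate(row):
            y, x = top + dy, left + dx
            if value != WHITE and 0 <= y < len(canvas) and 0 <= x < len(canvas[0]):
                canvas[y][x] = value
    return canvas

def solid_glyph(width, height):
    """Hypothetical stand-in for a character image: a dark rectangle."""
    return [[BLACK] * width for _ in range(height)]

def synthesize(num_chars, canvas_size=(120, 40), glyph_size=(12, 20),
               max_shift=5, seed=None):
    """Paste `num_chars` glyphs left to right with a random vertical shift."""
    rng = random.Random(seed)
    width, height = canvas_size
    canvas = blank_canvas(width, height)
    left = 5  # assumed left margin in pixels
    for _ in range(num_chars):
        glyph = solid_glyph(*glyph_size)
        top = (height - glyph_size[1]) // 2 + rng.randint(-max_shift, max_shift)
        paste(canvas, glyph, left, top)
        left += glyph_size[0] + 8  # assumed fixed gap of 8 pixels
    return canvas

captcha = synthesize(4, seed=0)
```

In a real generator the glyphs would come from a font renderer, and each security feature would be applied with its own parameter.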

yeguixin commented 5 years ago

Sorry, I reopened this issue.

awsssix commented 5 years ago

@yeguixin You use only 500 real captchas. Would the Captcha Image Generator generate 500 captchas, or more?

scut-salmon commented 5 years ago

If each character is a small image, how do you handle the distance between them? Do you assume the distances between all characters in a real image are equal? The most confusing part is that you train all the parameters. Do you mean they are trained by a neural network, or do you just set these parameters manually?

Looking forward to your kind reply, thanks!

yeguixin commented 5 years ago

@awsssix We can synthesize as many captchas as we want, because we generate them using a traditional captcha generator. The 500 real captchas and the synthetic captchas are used to train the GANs. Here is a traditional captcha generator, which may help in understanding how captchas are generated.

@scut-salmon The distance between two characters is random within a certain range, such as [0, 20] pixels. Our synthesizer can automatically tune the boundary of the range. Note that some parameters, such as the background and the number of occluding lines, are fixed according to the captcha scheme. In our initial experiments we found that manually setting the synthesizer parameters can also perform well.
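
A rough sketch of the random spacing described above, assuming integer pixel gaps drawn uniformly from the range. The function name `layout_characters` and the `left_margin` default are illustrative only and do not come from the repo.

```python
import random

def layout_characters(char_widths, gap_range=(0, 20), left_margin=5, rng=None):
    """Return the x offset of each character, with a random gap between neighbours."""
    rng = rng or random.Random()
    low, high = gap_range
    offsets = []
    x = left_margin
    for width in char_widths:
        offsets.append(x)
        # The gap to the next character is drawn from [low, high] pixels.
        x += width + rng.randint(low, high)
    return offsets

offsets = layout_characters([12, 12, 12, 12], rng=random.Random(1))
```

Tuning "the boundary of the range" then amounts to changing the upper value of `gap_range` between training rounds.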

awsssix commented 5 years ago

@yeguixin Thank you very much! In the article, you use a grid search method to search for the optimal parameters. I am still confused about MADS. Q1: Before using the traditional captcha generator, are these parameters initialized first? Q2: Can you give us a more detailed introduction, or some links? Q3: How does MADS determine whether these parameters are optimal? (If by comparing with real captchas, what exactly is compared?) Best wishes!

yeguixin commented 5 years ago

The parameters are initialized first. After that, the captcha generator synthesizes captchas, and then the generator of the GAN tries to tune the synthetic captchas. Finally, the discriminator tries to distinguish the generated captchas from the real ones. If the discriminator successfully identifies the synthetic captchas (there is a threshold that determines the discriminating ability), the parameter values are tuned. The values increase iteratively because we set the initial values relatively small. To determine the optimal parameters, there are many training tricks, such as tuning the parameters every 10 iterations: if the discriminator successfully identifies the synthetic captchas most of the time, the parameters are tuned. In fact, the discriminator decides real or fake by comparing image patches between the real captchas and the synthetic captchas. For this you can refer to PatchGAN.
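
To make the loop above concrete, here is a toy sketch of the tuning schedule only. It is not the actual training code. `detection_rate` is a hypothetical callback returning the fraction of synthetic captchas the discriminator identified at the current parameter value, and the step size, threshold and upper bound are assumed values.

```python
def tune_parameter(detection_rate, initial=1.0, step=1.0, upper_bound=20.0,
                   threshold=0.5, check_every=10, iterations=100):
    """Raise a parameter while the discriminator still spots synthetic captchas."""
    value = initial  # start relatively small, as described above
    for iteration in range(1, iterations + 1):
        # (The GAN generator and discriminator would be trained here.)
        if iteration % check_every != 0:
            continue  # only revisit the parameter every `check_every` iterations
        if detection_rate(value) > threshold and value < upper_bound:
            value = min(value + step, upper_bound)
    return value

def toy_detection_rate(value, target=8.0):
    """Toy stand-in: detection gets harder as `value` nears a made-up `target`."""
    return min(1.0, abs(target - value) / target)

tuned = tune_parameter(toy_detection_rate)
```

With a real discriminator, `detection_rate` would be measured on a batch of synthetic captchas after the GAN generator has refined them.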

Hope that the above will be helpful.

scut-salmon commented 5 years ago

Excuse me, could you please provide some samples of real captchas and the corresponding synthetic captchas generated by the captcha generator (without security features)?

yeguixin commented 5 years ago

@scut-salmon I plan to publish a runnable version.

Ru7z commented 5 years ago

This repo should be very helpful to you. @scut-salmon @awsssix

jiyonghe commented 5 years ago

I used 1500+ real captchas whose backgrounds contain noise. After the model was trained, the recognition accuracy on new captchas was zero.

dhitaj commented 5 years ago

@yeguixin When is the runnable version scheduled for release?

Times125 commented 5 years ago

When is the runnable version scheduled for release? @yeguixin

thograce commented 5 years ago

In the first step, the captcha generator generates color captchas from the input characters. Can I assume that if all characters have the same color, it is a captcha without security features, and if the characters have different colors, that counts as a security feature? @yeguixin

yeguixin commented 5 years ago

In general, a single color can easily be removed using image preprocessing methods. Different character colors increase the difficulty of the preprocessing. In our paper, we just summarized and categorized six kinds of security features for better description.

thograce commented 5 years ago

So if I want to generate synthetic captchas with different colors using GANs, I can input some single-color captchas and set the different colors (included in the fifth security feature) as a training parameter. The characters in my real captchas all have different colors. Am I right?

yeguixin commented 5 years ago

When generating captchas, you can randomly set a different color for each character by changing its RGB value.
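
As a small illustration (a sketch, not code from the repo), per-character colors can be drawn like this. The function name and the `high=200` cap, which keeps characters away from pure white, are my own assumptions.

```python
import random

def random_char_colors(num_chars, low=0, high=200, rng=None):
    """Return one random (R, G, B) tuple per character."""
    rng = rng or random.Random()
    return [
        tuple(rng.randint(low, high) for _ in range(3))
        for _ in range(num_chars)
    ]

colors = random_char_colors(4, rng=random.Random(0))
```

Each tuple would then be used as the fill color when the corresponding character image is rendered.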

thograce commented 5 years ago

Do you mean setting the color parameters in the traditional generator or in the GANs? According to your previous reply, multiple colors should be considered a security feature, while the traditional generator should pass relatively clean (single-color) captchas to the generator in the GANs. Thank you for your reply.

yeguixin commented 5 years ago

Yes, the parameters of the security features are set in the traditional captcha image generator. Once trained, the traditional captcha image generator can generate captchas with and without security features for a targeted captcha scheme.

thograce commented 5 years ago

Thank you very much for your reply; it has helped me a lot. But I have to bother you again. How many data pairs did you use to train the pix2pix model in the preprocessing step? Did you combine two pictures into one?

yeguixin commented 4 years ago

We use 20K pairs of synthesized captcha images to train the preprocessing model. The data format strictly follows the Pix2Pix model. To do so, we first resize each captcha image to 256×256 pixels and then combine the two images into one, as in the following example.
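
A rough sketch of that pairing step in plain Python (not the author's code): images are lists of pixel rows, resizing uses nearest-neighbour sampling, and placing the input on the left with the target on the right is an assumption, since the thread does not say which side is which.

```python
def resize_nearest(image, size=256):
    """Resize a list-of-rows image to size x size with nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // size][x * src_w // size] for x in range(size)]
        for y in range(size)
    ]

def make_pair(input_image, target_image, size=256):
    """Place the two resized images side by side: width 2 * size, height size."""
    left = resize_nearest(input_image, size)
    right = resize_nearest(target_image, size)
    return [row_l + row_r for row_l, row_r in zip(left, right)]

# Hypothetical 100x40 captchas: one with security features, one clean.
with_features = [[0] * 100 for _ in range(40)]
clean = [[255] * 100 for _ in range(40)]
pair = make_pair(with_features, clean)  # 512 pixels wide, 256 pixels tall
```

In practice an image library would handle the resizing and the concatenation; the sketch only shows the resulting layout.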

yeguixin / captcha_solver

About the parameter settings of Captcha Synthesizer #7