From the paper: "The adversarial examples in this paper were developed for the latest version of Tesseract, a popular open-source OCR system based on deep learning. They do not transfer to the legacy version of Tesseract, which employs character-based recognition".
So using both OCR engines might help against such adversarial text images.
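As a rough sketch of that two-engine cross-check (not from the paper; the file name and language code are placeholders, and it assumes a traineddata set that includes both the legacy and LSTM models):

```cpp
#include <leptonica/allheaders.h>
#include <tesseract/baseapi.h>

#include <cstdio>
#include <memory>
#include <string>

// Runs a single engine mode over the image and returns the recognized text.
static std::string Recognize(Pix* image, tesseract::OcrEngineMode oem) {
  tesseract::TessBaseAPI api;
  if (api.Init(nullptr, "eng", oem) != 0) return "";
  api.SetImage(image);
  std::unique_ptr<char[]> text(api.GetUTF8Text());
  api.End();
  return text ? std::string(text.get()) : std::string();
}

int main() {
  Pix* image = pixRead("suspect.png");  // placeholder input
  if (image == nullptr) return 1;
  std::string legacy = Recognize(image, tesseract::OEM_TESSERACT_ONLY);
  std::string lstm = Recognize(image, tesseract::OEM_LSTM_ONLY);
  // An adversarial image that fools only the LSTM engine should make the
  // two outputs diverge.
  if (legacy != lstm)
    std::printf("Engines disagree - possible adversarial input.\n");
  pixDestroy(&image);
  return 0;
}
```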
My question: is there a way to defend against such a thing?
My guess is that you can solve it by having a certain percentage of the images in the training dataset include this filter.
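For illustration, a minimal sketch of that mixing idea; the paper's adversarial filter is not public, so a slight Leptonica rotation stands in for it, and the fraction, seed, and in-place overwriting are arbitrary choices:

```cpp
#include <leptonica/allheaders.h>

#include <random>
#include <string>
#include <vector>

// Perturbs roughly `fraction` of the training images in place, so the model
// sees distorted variants of the same text during training.
void AugmentSubset(const std::vector<std::string>& files, double fraction) {
  std::mt19937 rng(42);  // fixed seed for reproducible selection
  std::bernoulli_distribution pick(fraction);
  for (const std::string& name : files) {
    if (!pick(rng)) continue;
    Pix* input = pixRead(name.c_str());
    if (input == nullptr) continue;
    // Stand-in perturbation: ~0.6 degree rotation, white brought in at edges.
    Pix* out = pixRotate(input, 0.01f, L_ROTATE_SAMPLING, L_BRING_IN_WHITE, 0, 0);
    pixWrite(name.c_str(), out, IFF_PNG);
    pixDestroy(&input);
    pixDestroy(&out);
  }
}
```

Calling `AugmentSubset(files, 0.3)` would distort about 30% of the set.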
I was not able to reproduce the results from the article. It would be good to get the original images, software version, and traineddata that were used.
CC: @csong27 (main author of the paper)
Other than exposures "-3 -2 -1 0 1 2 3", what other options add degradation to the generated images?
The 'exposures' are meant to be used to train the legacy engine.
For LSTM training, see #1052.
@amitdo, exposures "-x+1 ..."?

degradeimage.h, or its various applications in text2image.

text2image should already do that:
```
$ text2image --help | grep -i degrade
  --degrade_image  Degrade rendered image with speckle noise, dilation/erosion and rotation  (type:bool default:true)
```
@stweil, with `--degrade_image` this function will be called:
```cpp
// Degrade the pix as if by a print/copy/scan cycle with exposure > 0
// corresponding to darkening on the copier and <0 lighter and 0 not copied.
// If rotation is not nullptr, the clockwise rotation in radians is saved there.
// The input pix must be 8 bit grey. (Binary with values 0 and 255 is OK.)
// The input image is destroyed and a different image returned.
struct Pix* DegradeImage(struct Pix* input, int exposure, TRand* randomizer,
                         float* rotation);
```
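For context, a minimal sketch of a direct call; it assumes the training headers (degradeimage.h, helpers.h) are on the include path, and the file names, seed, and exposure value are placeholders:

```cpp
#include <leptonica/allheaders.h>

#include "degradeimage.h"  // DegradeImage()
#include "helpers.h"       // TRand

int main() {
  Pix* input = pixRead("line.png");     // placeholder input
  Pix* grey = pixConvertTo8(input, 0);  // the input must be 8-bit grey
  pixDestroy(&input);
  TRand randomizer;
  randomizer.set_seed(42);
  float rotation = 0.0f;  // receives the applied clockwise rotation
  // Exposure > 0 darkens, < 0 lightens; `grey` is consumed by the call.
  Pix* degraded = DegradeImage(grey, /*exposure=*/2, &randomizer, &rotation);
  pixWrite("line_degraded.png", degraded, IFF_PNG);
  pixDestroy(&degraded);
  return 0;
}
```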
But together with the new LSTM code, this new function appeared:
```cpp
// Creates and returns a Pix distorted by various means according to the bool
// flags. If boxes is not nullptr, the boxes are resized/positioned according to
// any spatial distortion and also by the integer reduction factor box_scale
// so they will match what the network will output.
// Returns nullptr on error. The returned Pix must be pixDestroyed.
Pix* PrepareDistortedPix(const Pix* pix, bool perspective, bool invert,
                         bool white_noise, bool smooth_noise, bool blur,
                         int box_reduction, TRand* randomizer,
                         GenericVector<TBOX>* boxes);
```
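Based only on the declaration above, a hedged sketch of what a call might look like; every flag value is a guess, not a recommendation, and it assumes the declaration lives next to DegradeImage() in degradeimage.h:

```cpp
#include <leptonica/allheaders.h>

#include "degradeimage.h"  // assumed to declare PrepareDistortedPix()
#include "helpers.h"       // TRand

// Returns a distorted copy of `src`; the caller must pixDestroy the result.
Pix* Distort(const Pix* src) {
  TRand randomizer;
  randomizer.set_seed(42);
  return PrepareDistortedPix(src,
                             /*perspective=*/true, /*invert=*/false,
                             /*white_noise=*/true, /*smooth_noise=*/false,
                             /*blur=*/true,
                             /*box_reduction=*/1,  // actual value unclear, see below
                             &randomizer, /*boxes=*/nullptr);
}
```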
Ray said about PrepareDistortedPix() (the newer method):

> It is used internally at Google. Text2image could be modified to use it too.
I think PrepareDistortedPix() is similar to the degradation methods that ocropy uses.
Do you mean that, when training the Tesseract 4 LSTM, there is no point in using exposures "-x+1 ..."?
You can use it, but it seems that the newer method is more suitable for the LSTM model.
Currently, there is no code that actually calls the newer method.
> Currently, there is no code that actually calls the newer method.

@theraysmith I invoke your presence :sunglasses:
He's on the beach right now... :sunglasses:
There is exactly one call to PrepareDistortedPix() internally at Google.
@jbreiden, what's the value of box_reduction?
Any updates regarding data augmentation?
NVIDIA Labs has some interesting implementations of data degradation:

https://github.com/tmbdev/das2018-tutorial/blob/master/40-augmentation.ipynb
https://github.com/NVlabs/ocrodeg
https://github.com/NVlabs/ocropus3
Info about Calamari OCR at https://github.com/tesseract-ocr/tesseract/issues/1782#issuecomment-411018986
Thanks @christophered for the info.
An interesting research paper called Noise2Noise has been released by NVIDIA. It shows a new method of cleaning and denoising images using a model trained only on noisy images, not clean ones. It's amazing!
This means that such a model could actually learn what noise is.
@theraysmith @stweil @egorpugin @amitdo Do you see it ever being implemented in Tesseract?
https://www.youtube.com/watch?v=P0fMwA3X5KI
https://arxiv.org/pdf/1803.04189.pdf
https://news.developer.nvidia.com/ai-can-now-fix-your-grainy-photos-by-only-looking-at-grainy-photos/
If implemented in Tesseract, this would mean we wouldn't need to add noise or degradation to our data while training, because Tesseract would already have its own model to recognize noise and degradation; it would understand the concept of noise.
Recently, I read a research paper called "Fooling OCR Systems with Adversarial Text Images". Basically, it states that making minor changes to an image can hinder and fool the OCR engine; they used Tesseract 4 as an example. My question: is there a way to defend against such a thing? @theraysmith @amitdo @egorpugin @Shreeshrii @stweil