lksshw / SRNet

A PyTorch implementation of the SRNet architecture from the paper "Editing Text in the Wild" (Liang Wu et al.).

SRNet

Update (15th January 2022): The download paths for the data files have been updated.

Update (27th August 2020):

A bug related to variable image sizes has been fixed, so you can now train with variable image sizes. This improves generations significantly.

Training is now significantly faster. Pull all changes and train as usual.

Update (26th July 2020):


This repository presents SRNet (Liang Wu et al.), a neural network that tackles the problem of editing text in images. It marks the beginning of a research area that could automate advanced editing workflows in the future.

SRNet is a twin-discriminator generative adversarial network that can edit text in any image while preserving the background context, font style, and color. The demo below showcases one such use case: movie poster editing.

Left: source; right: modified.


Architecture changes

This implementation of SRNet introduces two main changes.

  1. Training: The original SRNet suffers from training instability, and its generator loss masks it. The imbalance between generator and discriminator affects skeleton (t_sk) generation the most: once the generator produces a sequence of bad t_sk outputs, it does not recover but keeps getting worse, eventually leading to mode collapse. The culprit is the min-max loss. A textbook remedy is to keep the discriminator ahead of the generator at all times, and this implementation does the same.

  2. Generator: To accommodate a design constraint in the original network, I have added three extra convolution layers to the decoder_net.
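One common way to keep the discriminator ahead of the generator is to run several discriminator updates for every generator update, the schedule popularized by WGAN's multiple critic steps per generator step. The sketch below shows only that scheduling logic and assumes nothing about the networks themselves: `d_step`, `g_step` and `d_steps_per_g` are hypothetical names, and the repository's training code may realize the idea differently (for example, through update order or loss weighting rather than repeated steps).

```python
from typing import Callable, List


def train_discriminator_ahead(
    d_step: Callable[[int], float],
    g_step: Callable[[int], float],
    num_iters: int,
    d_steps_per_g: int = 2,
) -> List[float]:
    """Run a GAN update schedule in which the discriminator always moves first.

    d_step and g_step are hypothetical callbacks: each performs one optimizer
    update (of the discriminator(s) or of the generator) for the given
    iteration and returns that update's loss.
    Returns the generator losses, one per iteration.
    """
    if d_steps_per_g < 1:
        raise ValueError("d_steps_per_g must be at least 1")
    g_losses: List[float] = []
    for it in range(num_iters):
        # Update the discriminator(s) several times so they stay ahead.
        for _ in range(d_steps_per_g):
            d_step(it)
        # Then take a single generator step against the freshly updated discriminator.
        g_losses.append(g_step(it))
    return g_losses
```

Raising `d_steps_per_g` strengthens the discriminator relative to the generator at the cost of more compute per iteration, which is consistent with the longer training time noted below.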

Incorporating these changes improved t_sk generations dramatically and increased stability. However, this also increased training time by ~15%.


Usage

A virtual environment is the most convenient way to set up the model for training or inference. You can use virtualenv for this. The rest of this guide assumes that you are in one.

This repository provides a bash script that automates data synthesis, which the original implementation requires you to do manually. The default configuration parameters set up a dataset that is sufficient to train a robust model.

If you wish to synthesize data with different fonts, you can do so easily by adding custom .ttf files to the fonts directory before running datagen.py. Examine the flow of data_script.sh and adjust it accordingly.
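If you have several custom fonts, a small helper can copy them into the fonts directory in one go before you run datagen.py. This is a minimal sketch: `install_fonts` is a hypothetical helper, not part of the repository, and the destination you pass should be whichever fonts directory datagen.py actually reads in your checkout (check data_script.sh).

```python
import shutil
from pathlib import Path
from typing import List


def install_fonts(src_dir: str, fonts_dir: str) -> List[str]:
    """Copy every .ttf file from src_dir into fonts_dir (hypothetical helper).

    Creates fonts_dir if needed and overwrites files with the same name.
    Returns the names of the copied files in sorted order.
    """
    dest = Path(fonts_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied: List[str] = []
    for font in sorted(Path(src_dir).iterdir()):
        # Match the extension case-insensitively so "Foo.TTF" is picked up too.
        if font.is_file() and font.suffix.lower() == ".ttf":
            shutil.copy2(font, dest / font.name)
            copied.append(font.name)
    return copied
```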

Training

If you want to experiment, modify the hyperparameters in cfg.py.
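If cfg.py keeps its hyperparameters as plain module-level variables, one way to try variations without editing the file each time is to override attributes on the imported module before training starts. The sketch below uses a `SimpleNamespace` stand-in so it runs on its own; `apply_overrides` is a hypothetical helper, and the attribute names `learning_rate` and `batch_size` are placeholders, not necessarily the names used in cfg.py.

```python
from types import ModuleType, SimpleNamespace
from typing import Any, Dict, Union


def apply_overrides(cfg: Union[ModuleType, SimpleNamespace], overrides: Dict[str, Any]) -> None:
    """Set hyperparameters on a config module or namespace (hypothetical helper).

    Unknown keys raise AttributeError instead of silently creating a new
    setting, which catches typos such as "learnig_rate".
    """
    for key, value in overrides.items():
        if not hasattr(cfg, key):
            raise AttributeError(f"cfg has no setting named {key!r}")
        setattr(cfg, key, value)


# Stand-in for `import cfg`; these attribute names are placeholders.
cfg = SimpleNamespace(learning_rate=1e-4, batch_size=8)
apply_overrides(cfg, {"learning_rate": 5e-5})
```

This only takes effect if the training code reads `cfg.<name>` after the override runs (a `from cfg import name` copies the value at import time); editing cfg.py directly remains the simpler route.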

Prediction

To run prediction, you need to provide a pair of inputs: the source image (i_s) and the custom text rendered in grayscale on a plain background (i_t). Examples can be found in SRNet/custom_feed/labels. Place all such pairs in a folder.
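Before running inference, it can help to check that every source image in the folder has a matching i_t image. The sketch below assumes, as an unverified naming convention, that each pair shares a prefix and differs only by an `_i_s` or `_i_t` suffix (for example `001_i_s.png` and `001_i_t.png`); compare against the files in SRNet/custom_feed/labels for the convention the prediction script expects. `find_input_pairs` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def find_input_pairs(folder: str, ext: str = ".png") -> List[Tuple[Path, Path]]:
    """Pair source images (i_s) with rendered-text images (i_t) in one folder.

    Assumes (hypothetically) names like "001_i_s.png" and "001_i_t.png":
    a shared prefix followed by an "_i_s" or "_i_t" suffix.
    Raises ValueError if any prefix is missing one half of its pair.
    """
    halves: Dict[str, Dict[str, Path]] = {}
    for path in Path(folder).glob(f"*{ext}"):
        for tag in ("_i_s", "_i_t"):
            if path.stem.endswith(tag):
                prefix = path.stem[: -len(tag)]
                halves.setdefault(prefix, {})[tag] = path
    pairs: List[Tuple[Path, Path]] = []
    for prefix in sorted(halves):
        found = halves[prefix]
        if "_i_s" not in found or "_i_t" not in found:
            raise ValueError(f"incomplete pair for prefix {prefix!r}: {sorted(found)}")
        pairs.append((found["_i_s"], found["_i_t"]))
    return pairs
```

Running it on the input folder reports a missing i_s or i_t image up front instead of partway through a batch.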

Pre-trained weights

You can download my pre-trained weights here

Some results from the example directory:

Left: source; right: result.

Demo

The demo code was written hastily and is quite slow. If you are interested in trying it out or contributing to it, open an issue, submit a pull request, or email me at lakshwin045@gmail.com. I can host it for you.

References