Support for Chinese Language

Hi, thanks for your interest in our work.

About FANnet:

At this point it is difficult to replace text written in one language with text in another. The current approach assumes a character-to-character translation, not a word-to-word one, so it expects the source and target texts to be equal in length (i.e. to have the same number of characters). Translating a word into another language may not give a word with the same character count. This is a major limitation of the current approach, as discussed in the paper. You can still experiment with the numerals 0-9 in different languages, where a one-to-one mapping is possible.
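
To make the equal-length constraint concrete, here is a minimal sketch. The helper name `pair_characters` is hypothetical and is not part of fannet.py; it only illustrates the idea of pairing characters position by position and rejecting texts of different lengths.

```python
def pair_characters(source_text, target_text):
    """Pair each source character with the target character at the same position."""
    if len(source_text) != len(target_text):
        # A character-to-character scheme has nothing to pair the extra characters with.
        raise ValueError(
            f"length mismatch: {len(source_text)} source vs {len(target_text)} target characters"
        )
    return list(zip(source_text, target_text))
```

A four-digit year pairs up cleanly with its digit-by-digit Chinese form, while an English word and its single-character translation would raise the error.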

However, the current code needs some modifications before you can use such a scheme. For example, assume the following translations from English to Chinese numerals 0-9:

Note that we are ignoring the fact that Chinese numbers can extend beyond 9.

0 -> 〇
1 -> 一
2 -> 二
3 -> 三
4 -> 四
5 -> 五
6 -> 六
7 -> 七
8 -> 八
9 -> 九
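
The table above can be written as a lookup, which makes the one-to-one property easy to check. This is only a sketch: the names `DIGIT_TO_CHINESE` and `to_chinese_digits` are made up for illustration, and the conversion is digit by digit, not a proper reading of the number.

```python
# One Chinese numeral per ASCII digit, in the same order as the table above.
DIGIT_TO_CHINESE = dict(zip("0123456789", "〇一二三四五六七八九"))


def to_chinese_digits(text):
    """Replace every digit with its Chinese numeral; raises KeyError on any other character."""
    return "".join(DIGIT_TO_CHINESE[ch] for ch in text)
```

Because every digit maps to exactly one character, the output always has the same length as the input, which is what the character-to-character assumption requires.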

If we cannot use ASCII values as filenames, then we have to use some kind of indexing. Assume the English numeral images are named en0.jpg, en1.jpg, ..., en9.jpg and the Chinese numeral images are named cn0.jpg, cn1.jpg, ..., cn9.jpg. Also assume the test pairs are named 00_en0_cn0.jpg, 01_en1_cn9.jpg, etc. Now make the following changes in fannet.py:

Lines 48 and 49:

SOURCE_CHARS = [f'en{i}' for i in range(10)]
TARGET_CHARS = [f'cn{i}' for i in range(10)]

Lines 106 and 107:

ch_src = str(perm[0])
ch_dst = str(perm[1])

Line 221:

# the charset is now a list of labels, so use index() (lists have no find())
idx_ch = self._charset.index(dst_ch)

Line 361:

charset=TARGET_CHARS,
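
To show how the pieces above fit together, here is a rough, untested-against-the-repository sketch. The helpers `parse_test_pair` and `one_hot_target` are hypothetical and do not exist in fannet.py, and the sketch assumes the target character is encoded as a one-hot vector over the charset. Note that a Python list has no `find` method, so a list-valued charset is searched with `index`.

```python
import os

SOURCE_CHARS = [f'en{i}' for i in range(10)]
TARGET_CHARS = [f'cn{i}' for i in range(10)]


def parse_test_pair(filename):
    """Split a name like '01_en1_cn9.jpg' into (pair_id, source_label, target_label)."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    pair_id, src, dst = stem.split('_')
    if src not in SOURCE_CHARS or dst not in TARGET_CHARS:
        raise ValueError(f'unexpected labels in {filename!r}')
    return pair_id, src, dst


def one_hot_target(dst_label, charset=TARGET_CHARS):
    """Build a one-hot list for a target label (assumed encoding, see the note above)."""
    idx = charset.index(dst_label)  # raises ValueError if the label is unknown
    return [1 if i == idx else 0 for i in range(len(charset))]
```

For example, `parse_test_pair('01_en1_cn9.jpg')` yields the labels `en1` and `cn9`, and `one_hot_target('cn9')` sets only the last of the ten positions.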

PLEASE NOTE: I haven't tested this personally, so other minor issues may appear during training. I would like to provide a fully working notebook in the future, but unfortunately I won't be able to do so for the next couple of months, and I might be slow to respond.

About Colornet:

Colornet doesn't depend on the structure of the characters involved, so you might be able to use the provided pretrained weights without retraining! ;) If you still want to train with new data, prepare it in a format similar to the given dataset.
