damengdameng closed this issue 3 years ago
Thank you for the reply.
I tried to use my own corpus like this:
word = 'جىنپىڭ'
put_word = ''.join(reversed(word))  # reverse the word before drawing it into the image
font = ImageFont.truetype(get_rand_font(), get_rand_font_size(len(put_word)))
size = font.getsize(put_word)
But the letters in the picture are separated instead of joined. It seems that Uyghur text should first be converted to Latin letters through the uly_char_map:
uly_char_map = {
'ﺎﺋ': { 'Type': 'vowel', 'Latin': ['a', 'A'] },
'ﺏ': { 'Type': None , 'Latin': ['b', 'B'] },
'ﭺ': { 'Type': None , 'Latin': ['ch', 'Ch'] },
'ﺩ': { 'Type': None , 'Latin': ['d', 'D'] },
'ﻪﺋ': { 'Type': 'vowel', 'Latin': ['e', 'E'] },
'ﯥﺋ': { 'Type': 'vowel', 'Latin': ['é', 'É'] },
'ﻑ': { 'Type': None , 'Latin': ['f', 'F'] },
'ﻍ': { 'Type': None , 'Latin': ['g', 'G'] },
'ﮒ': { 'Type': None , 'Latin': ['gh', 'Gh'] },
'ﮪ': { 'Type': None , 'Latin': ['h', 'H'] },
'ﻰﺋ': { 'Type': 'vowel', 'Latin': ['i', 'I'] },
'ﺝ': { 'Type': None , 'Latin': ['j', 'J'] },
'ك': { 'Type': None , 'Latin': ['k', 'K'] },
'ل': { 'Type': None , 'Latin': ['l', 'L'] },
'م': { 'Type': None , 'Latin': ['m', 'M'] },
'ن': { 'Type': None , 'Latin': ['n', 'N'] },
'ڭ': { 'Type': None , 'Latin': ['ng', 'Ng'] },
'ﻮﺋ': { 'Type': 'vowel', 'Latin': ['o', 'O'] },
'ﯚﺋ': { 'Type': 'vowel', 'Latin': ['ö', 'Ö'] },
'پ': { 'Type': None , 'Latin': ['p', 'P'] },
'ق': { 'Type': None , 'Latin': ['q', 'Q'] },
'ر': { 'Type': None , 'Latin': ['r', 'R'] },
'س': { 'Type': None , 'Latin': ['s', 'S'] },
'ش': { 'Type': None , 'Latin': ['sh', 'Sh'] },
'ت': { 'Type': None , 'Latin': ['t', 'T'] },
'ﯘﺋ': { 'Type': 'vowel', 'Latin': ['u', 'U'] },
'ﯜﺋ': { 'Type': 'vowel', 'Latin': ['ü', 'Ü'] },
# v
'ۋ': { 'Type': None , 'Latin': ['w', 'W'] },
'خ': { 'Type': None , 'Latin': ['x', 'X'] },
'ي': { 'Type': None , 'Latin': ['y', 'Y'] },
'ز': { 'Type': None , 'Latin': ['z', 'Z'] },
'ژ': { 'Type': None , 'Latin': ['zh', 'Zh'] }
}
But the Uyghur characters I got from here [https://github.com/JaidedAI/EasyOCR/blob/master/easyocr/character/ug_char.txt] are completely different from the ones in uly_char_map, and some keys in uly_char_map seem to be composed of two letters. Can you give some suggestions?
Here is your answer.
from lang.ug.util.convert import br_2_pf
word = br_2_pf('جىنپىڭ')
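For anyone puzzled by why the keys in uly_char_map look different from the letters in ug_char.txt: Unicode encodes Arabic-script letters twice, once as basic letters (U+0600 to U+06FF) and once as positional "presentation forms" (U+FB50 to U+FDFF and U+FE70 to U+FEFF). The sketch below uses only the standard library to tell the two apart and to map presentation forms back to basic letters with NFKC normalization. The helper names are made up for illustration and are not part of this repository, and I am only assuming that this is the distinction br_2_pf deals with.

```python
import unicodedata


def is_presentation_form(ch: str) -> bool:
    """Return True if ch lies in Arabic Presentation Forms-A or Forms-B."""
    cp = ord(ch)
    return 0xFB50 <= cp <= 0xFDFF or 0xFE70 <= cp <= 0xFEFF


def to_base_letters(text: str) -> str:
    """Map presentation forms to basic letters via NFKC (hypothetical helper).

    NFKC only replaces code points; it does not undo any reversal of the
    character order that was applied for drawing.
    """
    return unicodedata.normalize("NFKC", text)


# U+FE8F is the isolated presentation form of the letter BEH (U+0628).
beh_isolated = "\uFE8F"
print(unicodedata.name(beh_isolated))
print(is_presentation_form(beh_isolated))
print(to_base_letters(beh_isolated) == "\u0628")
```

A word typed on a normal keyboard consists of basic letters, so is_presentation_form returns False for each of its characters, which would explain why it does not match the keys of a table built from presentation forms.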
In the code, words are randomly generated and then the pictures are generated from them. How can I use a fixed corpus to generate the data instead? Thanks.
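One possible approach, sketched here under assumptions rather than taken from the repository: keep the corpus in a UTF-8 text file with one word per line, load it once, and draw words from it wherever the random word generator is currently called. The names load_corpus and corpus_words and the file name corpus.txt are hypothetical.

```python
import itertools
from pathlib import Path
from typing import Iterator, List


def load_corpus(path: str) -> List[str]:
    """Read a UTF-8 file with one word per line, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def corpus_words(words: List[str]) -> Iterator[str]:
    """Yield the corpus words in order, starting over when the list ends."""
    if not words:
        raise ValueError("corpus is empty")
    return itertools.cycle(words)


# Usage (assumes a file named corpus.txt exists):
# words = corpus_words(load_corpus("corpus.txt"))
# word = next(words)  # use in place of the randomly generated word
```

Each word taken from the iterator would then go through the same conversion and drawing steps as the randomly generated words do now.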