vrenkens / nabu

Code for end-to-end ASR with neural networks, built with TensorFlow
MIT License

phoneme vs. grapheme based? #35

Closed: mjhanphd closed this issue 6 years ago

mjhanphd commented 6 years ago

Hi. If I'd like to run the LAS model with graphemes (i.e. English characters) directly as output units, is it enough to set 'alphabet' in text_processor (and everywhere else 'alphabet' is defined) to "A B C ... Z"? Thank you.

vrenkens commented 6 years ago

You should also adjust the datafiles fields in database.conf. They should point to the files containing the textual transcriptions.

vrenkens commented 6 years ago

You should also create a text normalizer and use it in your text_processor. Look into nabu/processing/target_normalizers for more info :)
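As a rough illustration of what a grapheme normalizer might look like, here is a minimal sketch. The function name `normalize`, its `(transcription, alphabet)` signature, and the `<space>` and `<unk>` tokens are assumptions made for this example, not something confirmed in this thread; check the existing files in nabu/processing/target_normalizers for the interface nabu actually expects.

```python
def normalize(transcription, alphabet):
    """Turn a raw transcription into space-separated grapheme tokens.

    Hypothetical sketch: the signature and the special tokens are
    assumptions, not taken from the nabu source.
    """
    # uppercase and collapse runs of whitespace into single spaces
    cleaned = ' '.join(transcription.upper().split())

    tokens = []
    for char in cleaned:
        # represent word boundaries with an explicit token
        token = '<space>' if char == ' ' else char
        # anything outside the alphabet is mapped to <unk>
        tokens.append(token if token in alphabet else '<unk>')

    return ' '.join(tokens)


# assumed alphabet: A-Z plus the two special tokens used above
alphabet = [chr(c) for c in range(ord('A'), ord('Z') + 1)]
alphabet += ['<space>', '<unk>']

print(normalize('Hello   world', alphabet))
# prints: H E L L O <space> W O R L D
```

If you go this route, the 'alphabet' in the config would presumably also need to list whatever special tokens the normalizer emits, not just "A B C ... Z".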