igormq / ctc_tensorflow_example

CTC + Tensorflow Example for ASR
MIT License

Multiple Data #14

Closed TsainGra closed 7 years ago

TsainGra commented 7 years ago

I want to train the model on a small TIMIT dataset. How do I change the inputs to do that? Also, I want the output to be text rather than numbers.

igormq commented 7 years ago

Hi @TsainGra,

You can start with the TensorFlow website, which explains the different ways to read data. It is a good starting point.
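
Whichever reading method you pick, training on several utterances usually means batching feature matrices of different lengths. Here is a minimal NumPy sketch of that step, assuming each utterance is already a 2-D array of shape (time_steps, num_features). The name `pad_batch` is hypothetical and only for illustration:

```python
import numpy as np

def pad_batch(features):
    """Zero-pad a list of (time_steps, num_features) arrays to a common length.

    Hypothetical helper for illustration. Returns the padded batch and the
    original length of every utterance.
    """
    lengths = np.array([f.shape[0] for f in features], dtype=np.int64)
    num_features = features[0].shape[1]
    # Allocate a zero-filled (batch, max_time, num_features) array.
    batch = np.zeros((len(features), lengths.max(), num_features), dtype=np.float32)
    for i, f in enumerate(features):
        # Copy each utterance into the leading time steps of its row.
        batch[i, :f.shape[0], :] = f
    return batch, lengths
```

The returned lengths matter because the padded frames should not count as real input, so they are typically passed to the model together with the batch.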

To answer your last question, you can convert the output back to text by applying the inverse mapping. Here is an example:

def num2text(sequence):
    # Map indices 0-25 to 'a'-'z', 26 to ' ', and 27 to the blank label '<b>'.
    chars = [chr(i) for i in range(ord('a'), ord('z') + 1)] + [' ', '<b>']
    dic = dict(enumerate(chars))
    return ''.join(dic[s] for s in sequence)

This simple code decodes the sequence. Be aware, however, that it removes neither the blank label nor the repeated labels.
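
If the sequence is a raw per-frame prediction, the usual CTC collapse is to merge consecutive repeats first and then drop the blanks. Here is a minimal sketch of that, assuming the same label layout as `num2text` (0-25 for 'a'-'z', 26 for a space, 27 for the blank). The name `ctc_collapse` is hypothetical:

```python
from itertools import groupby

# Assumed label layout: 0-25 -> 'a'-'z', 26 -> ' ', 27 -> blank.
CHARS = [chr(i) for i in range(ord('a'), ord('z') + 1)] + [' ']
BLANK_INDEX = len(CHARS)  # 27 under the assumption above

def ctc_collapse(sequence, blank=BLANK_INDEX):
    """Merge consecutive repeated labels, then drop blanks, and return text."""
    # groupby yields one key per run of identical consecutive labels.
    merged = [int(label) for label, _ in groupby(sequence)]
    return ''.join(CHARS[label] for label in merged if label != blank)

# Frame-wise labels "h h <b> e l <b> l o o" collapse to "hello".
print(ctc_collapse([7, 7, 27, 4, 11, 27, 11, 14, 14]))
```

The order matters: the blank between the two `l` labels is what keeps the double letter, so the repeats have to be merged before the blanks are removed.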

igormq commented 7 years ago

@TsainGra, can I close this issue?