louiskirsch / speechT

Open-source speech-to-text software written in TensorFlow
Apache License 2.0
157 stars, 36 forks

Always getting decoded value blank #19

jkaewprateep closed this issue 6 years ago

jkaewprateep commented 6 years ago

I am trying to adapt this code to use an LSTM network. I duplicated the Wav2LetterModel class and changed its model to an LSTM. After training on 10,000 samples for 4,000 rounds, the decoded value is always blank. Please help.

```python
import tensorflow as tf
from tensorflow.contrib import rnn

# SpeechModel and BaseInputLoader are classes from the speechT project.

class Wav2LetterLSTMModel(SpeechModel): #Add Sep 14, 2017 to create LSTM model

    def __init__(self, input_loader: BaseInputLoader, input_size: int, num_classes: int):
        super().__init__(input_loader, input_size, num_classes)

    def _create_network(self, num_classes):

        cellsize = 64
        num_layers = 3

        inputs = self.inputs
        inputs, sequence_lengths, labels = self.input_loader.get_inputs()

        XT = tf.transpose(inputs, [1, 0, 2])  # permute time_step_size and batch_size
        XR = tf.reshape(XT, [-1, self.input_size]) # each row has input for each lstm cell (lstm_size=input_vec_size)
        X_split = tf.split(XR, cellsize, 0) # split them to time_step_size (arrays)

        lstm = rnn.BasicLSTMCell(cellsize, forget_bias=0.5, state_is_tuple=True)
        outputs, _states = rnn.static_rnn(lstm, X_split, dtype=tf.float32)

        return tf.transpose(outputs, (1, 0, 2))
```
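In the snippet above, `tf.split(XR, cellsize, 0)` cuts axis 0 into `cellsize` (64) equal pieces, not into one piece per time step as its comment says. Assuming `inputs` is batch-major, `[batch_size, max_time, input_size]`, the transpose and reshape put frame `t` of utterance `b` at row `t * batch_size + b`. The 64 pieces therefore line up with time steps only when `max_time` is exactly 64. Otherwise each piece mixes frames from neighbouring steps, or, when `max_time * batch_size` is not a multiple of 64, `tf.split` rejects the split. The pure-Python sketch below replays the three operations on `(utterance, frame)` labels instead of real features. Its helper names are hypothetical and are not speechT or TensorFlow APIs.

```python
def split_time_major(batch_size, max_time, num_splits):
    """Mimic transpose([1, 0, 2]) -> reshape([-1, input_size]) -> split(num_splits, axis=0).

    Each row is labelled (utterance b, frame t) instead of holding real features,
    so we can see which frames end up in which chunk.
    """
    # Batch-major "inputs": inputs[b][t] is frame t of utterance b.
    inputs = [[(b, t) for t in range(max_time)] for b in range(batch_size)]
    # Transpose to time-major: xt[t][b].
    xt = [[inputs[b][t] for b in range(batch_size)] for t in range(max_time)]
    # Reshape to [-1, input_size]: flatten the time and batch axes into rows.
    rows = [row for step in xt for row in step]
    # Splitting into an integer number of pieces requires an even division.
    if len(rows) % num_splits != 0:
        raise ValueError(f"{len(rows)} rows cannot be split into {num_splits} equal chunks")
    size = len(rows) // num_splits
    return [rows[i * size:(i + 1) * size] for i in range(num_splits)]


def chunks_are_time_steps(chunks):
    """True if every chunk holds frames from exactly one time step."""
    return all(len({t for _, t in chunk}) == 1 for chunk in chunks)


# Intended behaviour: one chunk per time step (num_splits == max_time).
good = split_time_major(batch_size=4, max_time=100, num_splits=100)
print(chunks_are_time_steps(good))  # True

# As written: num_splits == cellsize == 64 while max_time is 100.
bad = split_time_major(batch_size=32, max_time=100, num_splits=64)
print(chunks_are_time_steps(bad))  # False
```

Two other details of the snippet may matter too. `num_classes` is passed to `_create_network` but never used, so each frame of the returned tensor has `cellsize` (64) values rather than one score per output class. Also, `num_layers = 3` is set but only one `BasicLSTMCell` is built. In TensorFlow 1.x, `tf.nn.ctc_loss` defaults to `time_major=True`. If the model feeds the returned tensor to it with that default, the final `(1, 0, 2)` transpose to batch-major is worth checking as well. For variable-length audio, `tf.nn.dynamic_rnn` accepts a batch-major `[batch, max_time, features]` tensor plus a `sequence_length` argument, which avoids the manual split entirely.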
louiskirsch commented 6 years ago

Sorry, this project is meant to use CNNs and the CTC loss function. I think Stack Overflow is a better place to ask your question, e.g. https://stackoverflow.com/questions/40812339/how-to-train-an-lstm-for-speech-recognition
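This is background on the symptom itself. For every frame, a CTC model outputs a score per character plus one extra "blank" class. The simplest decoder, best-path or greedy decoding, takes the highest-scoring class at each frame, merges consecutive repeats, and then drops the blanks. A network that ranks the blank class first at every frame therefore decodes to an empty string, which is exactly what a blank decoded value looks like. The sketch below assumes the blank is the last class index (`num_classes - 1`), as `tf.nn.ctc_loss` does in TensorFlow 1.x. The three-letter alphabet and the scores are made up for illustration.

```python
from typing import List, Sequence


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]], blank: int) -> List[int]:
    """Best-path CTC decoding: arg-max per frame, merge repeats, drop blanks."""
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    decoded = []
    previous = None
    for label in best:
        # Emit a label only when it differs from the previous frame and is not blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Hypothetical 3-letter alphabet; index 3 (= num_classes - 1) is the blank.
ALPHABET = "abc"
BLANK = len(ALPHABET)


def to_text(labels: List[int]) -> str:
    return "".join(ALPHABET[i] for i in labels)


# A network whose top class is blank at every frame.
all_blank = [[0.1, 0.1, 0.1, 0.7]] * 5
print(repr(to_text(greedy_ctc_decode(all_blank, BLANK))))  # ''

# Frames whose arg-max is a, a, blank, a, b, b, blank.
frames = [
    [0.8, 0.1, 0.0, 0.1],
    [0.7, 0.1, 0.0, 0.2],
    [0.1, 0.1, 0.1, 0.7],
    [0.6, 0.2, 0.0, 0.2],
    [0.1, 0.8, 0.0, 0.1],
    [0.2, 0.6, 0.0, 0.2],
    [0.0, 0.1, 0.1, 0.8],
]
print(to_text(greedy_ctc_decode(frames, BLANK)))  # aab
```

The blank frame between the two `a` frames is what lets CTC emit a doubled letter. Without that blank, the repeats would merge into a single `a`. TensorFlow 1.x provides this decoder as `tf.nn.ctc_greedy_decoder`, alongside the beam-search `tf.nn.ctc_beam_search_decoder`. The thread does not say which one speechT uses.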