Closed ajason6208 closed 7 years ago
I learned from this program how to use CTC with multiple data as well, and it works well.
I do not know why should change label into Sparse representation.
Because TensorFlow's CTC API requires a SparseTensor as the label. See Connectionist Temporal Classification (CTC) | TensorFlow
I do not know how could I create an array to save all of feature because each frame is different of different file
It is difficult for me to explain in words, so I will show you my code snippet:
timestep_factor = 1000  # maximum number of frames per utterance
zero_features = np.array([0] * num_features)
zero_features = zero_features.reshape(1, num_features).tolist()
train_skip_idx = []
train_seq_len = []
train_inputs = []
train_num = 0
i = 0
for l in audio_filenames:  # audio_filenames is a list of wav files
    i += 1
    audio, sr = librosa.load(l, mono=True)
    inputs = librosa.feature.melspectrogram(audio, sr=sr, n_mels=num_features)
    inputs = inputs.transpose((1, 0))  # -> (frames, features)
    inputs = inputs.tolist()
    if len(inputs) < timestep_factor:
        train_seq_len.append(len(inputs))
        # pad with zero-feature frames up to timestep_factor
        tmp_list = zero_features * (timestep_factor - len(inputs))
        inputs.extend(tmp_list)
        train_num += 1
    else:
        # skip utterances longer than timestep_factor frames
        train_skip_idx.append(i - 1)
        continue
    train_inputs.append(inputs)
train_labels = []
for i, l in enumerate(targets_line):
    if i in train_skip_idx:
        continue
    phones = l.split(' ')
    phones = list(filter(('').__ne__, phones))  # drop empty strings
    train_labels.append([phonemes[x] for x in phones])
# if you want to save features, you can do as follows
np.save("train_inputs.npy", train_inputs)
np.save("train_targets.npy", train_labels)
np.save("train_seq_len.npy", train_seq_len)
# ----
train_targets = []
for mb_i in range(len(train_labels) // mini_batch_size):
    train_targets.append(sparse_tuple_from(
        train_labels[mb_i * mini_batch_size:(mb_i + 1) * mini_batch_size], num_classes))
# snip
for mb_num in range(num_batches_per_epoch):
    feed = {inputs: train_inputs[mb_num * mini_batch_size:(mb_num + 1) * mini_batch_size],
            targets: train_targets[mb_num],
            seq_len: train_seq_len[mb_num * mini_batch_size:(mb_num + 1) * mini_batch_size]}
    batch_cost, _ = session.run([cost, train_op], feed)
    train_cost += batch_cost
    train_ler += session.run(ler, feed_dict=feed)
train_cost /= num_batches_per_epoch
train_ler /= num_batches_per_epoch
# snip
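For reference, here is a minimal sketch of what a sparse_tuple_from helper could look like. This is an assumption matching the call site above (the second num_classes argument is kept only for compatibility and is not needed to build the sparse triple), not necessarily the repository's exact implementation:

```python
import numpy as np

def sparse_tuple_from(sequences, num_classes=None):
    """Convert a list of label sequences into the (indices, values, shape)
    triple expected by a sparse labels placeholder for ctc_loss.

    num_classes is kept only for call-site compatibility; it is not
    used when building the sparse triple itself.
    """
    indices = []
    values = []
    for n, seq in enumerate(sequences):
        # one (batch_index, time_index) pair per label
        indices.extend(zip([n] * len(seq), range(len(seq))))
        values.extend(seq)
    indices = np.asarray(indices, dtype=np.int64)
    values = np.asarray(values, dtype=np.int32)
    shape = np.asarray([len(sequences), max(len(s) for s in sequences)],
                       dtype=np.int64)
    return indices, values, shape
```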
That is my understanding; if you have any questions, feel free to ask me.
Hi, @ajason6208. Thank you for the question.
Answering your questions:
I do not know why should change label into Sparse representation.
I changed the labels to a sparse representation because ctc_loss requires it; it's in the documentation.
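To illustrate, here is a minimal sketch (the label values are made up) of the (indices, values, dense_shape) triple that represents a batch of dense label sequences sparsely; in the TF 1.x API of that era it would be wrapped in tf.SparseTensor and fed to tf.nn.ctc_loss:

```python
import numpy as np

# dense batch of two label sequences: [19, 8, 5] and [7, 2]
indices = np.array([[0, 0], [0, 1], [0, 2], [1, 0], [1, 1]], dtype=np.int64)
values = np.array([19, 8, 5, 7, 2], dtype=np.int32)
dense_shape = np.array([2, 3], dtype=np.int64)  # (batch_size, max_label_len)
# labels = tf.SparseTensor(indices, values, dense_shape)
```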
In my case, I extract 14-dimensional MFCC features, but I have 8440 training samples. I do not know how I could create an array to save all the features, because the number of frames differs from file to file. Please help me, thanks.
There are several approaches. You can pad with zeros and create a tensor of shape N x max_timesteps x n_features
for all your training data, but this requires more memory; you can create buckets (this is done in one example made by the TensorFlow team); you can read N_per_batch
audio files at a time, generate the features, and create each batch dynamically, padding the batch with zeros; or you can save all your training data in one T x n_features matrix,
appending each training example along the timestep axis to form one giant matrix, keep a record of each example's timesteps, and then read it back accordingly.
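The dynamic-batch approach can be sketched roughly as follows, assuming each feature array has shape (timesteps, n_features); the function name and variable names here are placeholders, not code from the repository:

```python
import numpy as np

def pad_batch(batch_features):
    """Zero-pad a list of (timesteps, n_features) arrays to the longest
    length in the batch, returning the padded batch
    (N x max_timesteps x n_features) and the original sequence lengths."""
    seq_lens = np.asarray([f.shape[0] for f in batch_features], dtype=np.int32)
    max_len = seq_lens.max()
    n_features = batch_features[0].shape[1]
    padded = np.zeros((len(batch_features), max_len, n_features),
                      dtype=batch_features[0].dtype)
    for n, feat in enumerate(batch_features):
        padded[n, :feat.shape[0], :] = feat
    return padded, seq_lens
```

With this, only each batch's maximum length matters, so memory is not wasted padding every utterance to a global maximum.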
I'll commit code demonstrating one of these methods. Stay tuned.
I made a commit f2f935e6b1906df2543b4ed794286427870995d5 modifying the original code to support multiple data as input.
Close #8
In your program, you have one wav file and you extract 13-dimensional MFCC features => train_inputs. Second, you construct a label array like [19 8 5 ...] and change it to a sparse representation (function: sparse_tuple_from). I do not know why the labels should be changed into a sparse representation.
In my case, I extract 14-dimensional MFCC features, but I have 8440 training samples. I do not know how I could create an array to save all the features, because the number of frames differs from file to file. Please help me, thanks.
I like your CTC neural network example; thank you for giving us such useful code.