llSourcell / tensorflow_speech_recognition_demo

This is the code for 'How to Make a Simple Tensorflow Speech Recognizer' by @Sirajology on Youtube

little change to the project, then it works well #26

Open chengqingshui opened 6 years ago

chengqingshui commented 6 years ago

Hello, everyone!

I ran the project, found some problems, and changed the code as below; now it works well.

1. The mfcc_batch_generator function generates a batch of sound feature data and labels, but in the training step the data is never refreshed by calling next() inside the loop. So I added a new function, mfcc_batch_generatorEx, similar to mfcc_batch_generator, to the speech_data.py file:

def mfcc_batch_generatorEx(batch_size=10, source=Source.DIGIT_WAVES, target=Target.digits):
    maybe_download(source, DATA_DIR)
    if target == Target.speaker:
        speakers = get_speakers()
    batch_features = []
    labels = []
    files = os.listdir(path)
    print("loaded batch of %d files" % len(files))
    shuffle(files)
    for wav in files:
        if not wav.endswith(".wav"):
            continue
        wave, sr = librosa.load(path + wav, mono=True)
        if target == Target.speaker:
            label = one_hot_from_item(speaker(wav), speakers)
        elif target == Target.digits:
            label = dense_to_one_hot(int(wav[0]), 10)
        elif target == Target.first_letter:
            label = dense_to_one_hot((ord(wav[0]) - 48) % 32, 32)
        else:
            raise Exception("todo : labels for Target!")
        labels.append(label)
        mfcc = librosa.feature.mfcc(wave, sr)
        # pad every MFCC matrix to a fixed width of 80 frames
        mfcc = np.pad(mfcc, ((0, 0), (0, 80 - len(mfcc[0]))), mode='constant', constant_values=0)
        batch_features.append(np.array(mfcc))
    return batch_features, labels
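For contrast, the original mfcc_batch_generator yields one batch at a time, so a fresh batch has to be pulled with next() inside the training loop. A minimal sketch of that alternative pattern (the loop structure and variable names here are illustrative, not code from the repo):

    # pull a new batch from the generator on every training step (illustrative sketch)
    batch = speech_data.mfcc_batch_generator(batch_size)
    for step in range(training_iters):
        trainX, trainY = next(batch)          # fresh batch each step
        model.fit(trainX, trainY, n_epoch=1)  # one pass over this batch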

2. In demo.py, generate all the sound features and labels at once with the call below:

X, Y = speech_data.mfcc_batch_generatorEx(batch_size)
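Step 3 below uses trainX and trainY, which the comment does not define; one possible way to derive them is to split the full set returned above (the 80/20 ratio is an assumption, not something the author states):

    import numpy as np

    X, Y = np.array(X), np.array(Y)
    split = int(0.8 * len(X))              # assumed 80/20 train/test split
    trainX, trainY = X[:split], Y[:split]
    testX, testY = X[split:], Y[split:]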

3. In the training step, use the code below:

with tf.Session() as sess:
    model.fit(trainX, trainY, n_epoch=training_iters)  # , validation_set=(testX, testY), show_metric=True, batch_size=batch_size
    # check accuracy by predicting on the full set and comparing with the labels
    _y = model.predict(X)
    YY = [x.tolist() for x in Y]
    correct_prediction = tf.equal(tf.arg_max(_y, 1), tf.arg_max(YY, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print("\n\ncorrect_prediction = ", sess.run(accuracy))

model.save("tflearn.lstm.model")
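As a follow-up, the saved model can be restored for inference in a later run, provided the same TFLearn network is rebuilt first; a minimal sketch (the file name matches the save call above):

    # rebuild the same net/model definition as in demo.py, then:
    model.load("tflearn.lstm.model")
    predictions = model.predict(X)  # class probabilities for each padded MFCC sample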

Sorry for the bad formatting. Thanks, all!

AiJunXian commented 6 years ago

Hi. I noticed that you ignored the validation set in the model.fit call. Why? Could you describe what exactly happens inside model.fit? Thanks
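For reference, the commented-out arguments in the training snippet above show how a held-out validation set could be passed to fit; a hedged sketch of that call (testX/testY are assumed to be a separate split):

    model.fit(trainX, trainY, n_epoch=training_iters,
              validation_set=(testX, testY),  # evaluated after each epoch
              show_metric=True, batch_size=batch_size)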

raffaelbd commented 5 years ago

I get this error: IndentationError: expected an indented block.