pannous / tensorflow-speech-recognition

🎙Speech recognition using the tensorflow deep learning framework, sequence-to-sequence neural networks

Train data is used to determine accuracy in dense_layer #28

Open thubregtsen opened 7 years ago

thubregtsen commented 7 years ago

I'm trying to use dense_layer. dense_layer uses spectro_batch_generator from speech_data.py to fetch batches of data. There it is already noted that the training and testing/validation sets need to be split: # shuffle(files) # todo : split test_fraction batch here!
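For reference, a minimal sketch of what that split could look like before batching (the split_files name, the fixed seed and the 10% default are my own illustration, not the repo's API):

  import random

  def split_files(files, test_fraction=0.1, seed=0):
      # shuffle once with a fixed seed, then carve off a held-out test fraction;
      # doing this before batching keeps test files out of the training batches
      files = list(files)
      random.Random(seed).shuffle(files)
      n_test = int(len(files) * test_fraction)
      return files[n_test:], files[:n_test]  # train_files, test_files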

A bit further in dense_layer, the function train from layer/net.py is used. In the train function, currently around line 389, there is:

  feed_dict = {x: batch_xs, y: batch_ys, keep_prob: dropout, self.train_phase: True}
  loss,_= session.run([self.cost,self.optimizer], feed_dict=feed_dict)

Immediately followed by:

  if step % display_step == 0:
    # Calculate batch accuracy, loss
    feed = {x: batch_xs, y: batch_ys, keep_prob: 1., self.train_phase: False}
    acc , summary = session.run([self.accuracy,self.summaries], feed_dict=feed)

If I understand it correctly (and I am new to this, so it's likely that I am wrong), the data is first fed into the train step, after which the exact same data is used to determine the accuracy.
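A minimal sketch of what the evaluation could look like instead, assuming a separate generator over held-out data (here called test_batch, which does not exist in the repo yet):

  if step % display_step == 0:
    # sketch only: test_batch is a hypothetical generator over the held-out files,
    # so accuracy is no longer measured on the batch that was just trained on
    test_xs, test_ys = next(test_batch)
    feed = {x: test_xs, y: test_ys, keep_prob: 1., self.train_phase: False}
    acc, summary = session.run([self.accuracy, self.summaries], feed_dict=feed)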

pannous commented 7 years ago

well observed: this is a (trivial) todo

thubregtsen commented 7 years ago

That is good to hear :) Here are some further thoughts on it:

I see three options to split the dataset: completely at random, by person, or by different numbers.

I think a completely random split would not be fair, as you would train on, for instance, "0_Agnes_100.wav.png.tiny" and "0_Agnes_140.wav.png.tiny", and test on "0_Agnes_160.wav.png.tiny". More realistic would be to train on a certain set of people and test on a different set of people, as this is closer to the use case (a new person using this app), right? What is your view?
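For the per-person variant, a rough sketch, assuming the filenames keep the digit_Speaker_rate.wav.png.tiny pattern seen above (the helper name and the held-out speakers are just for illustration):

  def split_by_speaker(files, test_speakers=("Agnes",)):
      # filename pattern assumed: <digit>_<Speaker>_<rate>.wav.png.tiny
      train, test = [], []
      for f in files:
          speaker = f.split("_")[1]
          (test if speaker in test_speakers else train).append(f)
      return train, test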

pannous commented 7 years ago

Splitting by person seems to be a good idea to test true generalization.

A completely random test set would probably yield pretty much the same results as the remaining training set, once we create samples for all rates between 60, 61, 62, ... and 400. We can still evaluate a completely random test set, just to confirm that the network gets the basics right.

different numbers: I hope you meant rates, not digits;)

pannous commented 7 years ago

Actually I think all three options should be tested. Plus pitch, once we have that, and other future 'axes': environment, bit rate, ...
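One way all of these axes could share a single helper, again assuming the digit_Speaker_rate.wav.png.tiny naming and with made-up names for illustration:

  def split_on(files, axis="speaker", test_values=("Agnes",)):
      # axis is one of "digit", "speaker", "rate", mapped to the fields of
      # the assumed <digit>_<Speaker>_<rate>.wav.png.tiny filename pattern
      field = {"digit": 0, "speaker": 1, "rate": 2}[axis]
      held_out = {str(v) for v in test_values}
      train, test = [], []
      for f in files:
          value = f.split("_")[field].split(".")[0]  # strip extensions off the rate field
          (test if value in held_out else train).append(f)
      return train, test

For example, split_on(files, axis="rate", test_values=range(300, 401)) would hold out the fastest rates, and further axes like pitch or environment could be added to the field map once they are encoded in the filenames.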