tfjs-models speech-commands ability to personally tune each word.

hpssjellis commented 5 years ago

@tafsiri @caisq

I am using

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@1.1.2"> </script> 
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/speech-commands@0.3.6"> </script>

The speech commands work fine for "up", "down" and "right" but seem a bit off for "left". When I look at the data for 2 sets of "up", it looks like it is comparing 6 decimal values for each word against a hidden set of 6 decimal values.

personalize classifier

Any idea how to delete the stored set of values for "left" and replace them with a user-personalized set of values? The Speech Commands API has some information on complete re-training, but nothing seems to be mentioned about minor re-tuning. I have done something similar with the tfjs-face-api, where 68 points were stored for each face and these values could be changed.

For Speech-Commands demos refer to https://glitch.com/~tfjs-speech-commands made by @tafsiri

or my version at https://hpssjellis.github.io/tfjs-models-purejs-speech-commands/

P.S. Any ideas why "directional4w" is a fair bit better than "18w"? The "18w" model seems to work for me only with "down" and "eight".

tafsiri commented 5 years ago

Good question. Would something like this work? (Note: I haven't actually tried this.) cc @caisq: would this be something we want to enable for the built-in vocabs?

const baseRecognizer = speechCommands.create('BROWSER_FFT', 'directional4w');
await baseRecognizer.ensureModelLoaded();
const transferRecognizer = baseRecognizer.createTransfer('directional4w');

await transferRecognizer.collectExample('left');
// ... more stuff here, probably eventually saving the result ...

hpssjellis commented 5 years ago

@tafsiri

await transferRecognizer.collectExample('left');

I will play around with that. I think the following line should give me feedback about whether I am making any changes.

console.log(transferRecognizer.countExamples());

Although I think this is all pre-training data, not sure how it relates to the final output.

... a few hours later

...

So I made a webpage that sort-of tests this idea.

const transferRecognizer = baseRecognizer.createTransfer('directional4w');

Does load the correct model and allows

await transferRecognizer.collectExample('left');

Proved by

console.log(transferRecognizer.countExamples());

That model then works for the 4 directions, but without some kind of training I think these changes have zero effect on the pre-trained model. I want to tweak the actual trained model. Any other suggestions?

Looking at the source I have made this list of methods, but that does not seem to be very useful as I can't get many of them working:

start() onAudioFrame() stop() setConfig() getFeatures() tick() suppress() addExample() merge() getExampleCounts() getExamples() getData() augmentByMixingNoise() getSortedUniqueNumFrames() removeExample() setExampleKeyFrameIndex() size() durationMillis() empty() clear() getVocabulary() serialize() listen() ensureModelLoaded() ensureModelWithEmbeddingOutputCreated() warmUpModel() ensureMetadataLoaded() stopListening() isListening() wordLabels() params() modelInputShape() recognize() recognizeOnline() createTransfer() freezeModel() checkInputTensorShape()

tafsiri commented 5 years ago

If await transferRecognizer.collectExample('left'); works, you should then be able to call something like

await transferRecognizer.train({
  epochs: 25,
  callback: {
    onEpochEnd: async (epoch, logs) => {
      console.log(`Epoch ${epoch}: loss=${logs.loss}, accuracy=${logs.acc}`);
    }
  }
});

to do the additional training. (I got this snippet from this part of the README.)

hpssjellis commented 5 years ago

Kind of working

Cannot train transfer-learning model 'directional4w' because only 1 word label ('["left"]') has been collected for transfer learning. Requires at least 2.

Will update as I figure it out.
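The error above indicates that train() needs examples for at least two distinct word labels. A minimal sketch of collecting several examples per word before training might look like the following. The helper name collectForWords and its parameters are hypothetical; it relies only on the collectExample() and countExamples() calls already used in this thread, and it has not been run against a real microphone.

```javascript
// Hypothetical helper (not part of the speech-commands API).
// `recognizer` is assumed to be a transfer recognizer, e.g. the result of
// baseRecognizer.createTransfer(...), exposing collectExample() and countExamples().
async function collectForWords(recognizer, words, perWord) {
  // Training is rejected when fewer than 2 distinct labels have examples,
  // so fail early with a clearer message.
  if (new Set(words).size < 2) {
    throw new Error('Need at least 2 distinct word labels before training.');
  }
  for (const word of words) {
    for (let i = 0; i < perWord; i++) {
      // Each call records one example for `word`.
      await recognizer.collectExample(word);
    }
  }
  // Return the per-word example counts for a quick sanity check.
  return recognizer.countExamples();
}
```

In a page that already has a transferRecognizer, this could be called as `await collectForWords(transferRecognizer, ['left', 'right'], 5)` before calling `transferRecognizer.train(...)`.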

caisq commented 5 years ago

@hpssjellis In case you haven't seen it, the demo code here might be helpful to the work you are doing. It includes a UI that supports data collection for transfer learning. https://github.com/tensorflow/tfjs-models/tree/master/speech-commands/demo

caisq commented 5 years ago

@hpssjellis The demo is hosted at: https://storage.googleapis.com/tfjs-speech-model-test/2019-01-03a/dist/index.html

hpssjellis commented 5 years ago

As far as I can tell, this approach is doing exactly what I don't want!

If I take "directional4w" and train it on "left" and "right", it learns "left" and "right" but then forgets "up" and "down". I wanted to just tweak the results for "left" and keep everything else the same for the other 3 speech commands, not train everything from scratch. The problem will be even more evident with "18w": if I use it and it has 1 or 2 bad commands, then I have to train the entire model from scratch.

Any other ideas? Is a clear model method being run automatically?

@caisq The above demo has all the issues every beginner has. It has so much confusing CSS and UI clutter that we have a hard time understanding the basic Machine Learning abilities.

It is a great demo, but a crummy example.

hpssjellis commented 5 years ago

Does the model.compile method clear the previous model?

this.model.compile({ loss: "categoricalCrossentropy", optimizer: e.optimizer || "sgd", metrics: ["acc"] })

I know with my own models that if I first compile the model, train it for 20 epochs, and then train it again for another 20 epochs, the result is a combined 40 epochs of training. But if I call model.compile before the second 20 epochs, I lose the first 20 epochs.

Is that possibly happening with Speech Commands? Since you have to call model.compile, perhaps the previous training is deleted and it starts fresh. Is there any way to call model.compile without it discarding the old training?

tafsiri commented 5 years ago

Ah yes. Now I have a clearer idea of why this approach won't work, and it's more fundamental to how neural networks are trained.

The model itself does not retain the actual training examples, instead the knowledge of how to classify audio samples is encoded in all the weights of the model. Also, during transfer learning all the weights are adjusted to perform better on the new examples (there isn't a way to isolate the ones for just 'left'). Since the model is only getting new examples for something like 'left' in this case, it progressively forgets how to do the other classes since there is no feedback in the new training data with regard to those other classes.
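The forgetting effect described above can be seen even in a tiny toy model. The sketch below is not the speech-commands network; it is a one-feature logistic classifier invented purely for illustration, with made-up feature values (+1 for a typical "left", -1 for a typical "right", 0.2 for one speaker's unusual "left"). All names, step counts, and learning rates are assumptions.

```javascript
// Toy illustration only: a 1-D logistic classifier, NOT the speech-commands model.
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Plain batch gradient descent on the logistic loss; label 1 = "left", 0 = "right".
function train(model, examples, steps, lr) {
  for (let s = 0; s < steps; s++) {
    let gw = 0;
    let gb = 0;
    for (const {x, y} of examples) {
      const err = sigmoid(model.w * x + model.b) - y;
      gw += err * x;
      gb += err;
    }
    model.w -= (lr * gw) / examples.length;
    model.b -= (lr * gb) / examples.length;
  }
}

// Probability that input x is classified as "left".
const pLeft = (model, x) => sigmoid(model.w * x + model.b);

function forgettingDemo() {
  const model = {w: 0, b: 0};
  // 1. Original training sees both classes.
  train(model, [{x: 1, y: 1}, {x: -1, y: 0}], 100, 0.5);
  const rightBefore = 1 - pLeft(model, -1);
  // 2. "Personalization" sees only a single unusual "left" example.
  train(model, [{x: 0.2, y: 1}], 5000, 0.5);
  const rightAfter = 1 - pLeft(model, -1);
  const personalLeftAfter = pLeft(model, 0.2);
  return {rightBefore, rightAfter, personalLeftAfter};
}

console.log(forgettingDemo());
```

Both weights are shared between the two classes, so every update that makes the new "left" example more confident also pushes the "right" example toward "left". Nothing in the second training phase pulls it back, which is why the confidence for "right" drops.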

So fundamentally the best approach is to augment the original training set and train again from scratch. Our training scripts are here, but are unfortunately incomplete and a bit cumbersome, and we haven't had any time to clean them up. We do want to improve those when we get the bandwidth (and also work on improving the models themselves with updated training data), but I can't give you an estimate of when that would happen.

However one thing you might be able to do is collect new samples for all the classes (including background noise) and train it a little bit (by a little bit I mean for only a few steps, maybe one or two epochs) and possibly passing in an optimizer with a low learning rate. This might nudge the model towards your new training data without forgetting as much of the original training data. This is quite hacky though, so may not be worth the time.
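As a rough, untested sketch of that idea: the train() config appears to accept an optimizer (the compile call quoted earlier in this thread reads e.optimizer), so a short pass with a low learning rate might be written as below. The helper name nudgeTrain and the default values (2 epochs, learning rate 0.001) are assumptions chosen for illustration, not recommended settings.

```javascript
// Hypothetical helper (not part of the speech-commands API).
// `tf` is the TensorFlow.js namespace; `recognizer` is assumed to be a transfer
// recognizer that already holds fresh examples for every class.
async function nudgeTrain(tf, recognizer, {epochs = 2, learningRate = 0.001} = {}) {
  return recognizer.train({
    // Only a few epochs, so the weights move a little rather than a lot.
    epochs,
    // Plain SGD with a small step size (assumed value, tune as needed).
    optimizer: tf.train.sgd(learningRate),
    callback: {
      onEpochEnd: async (epoch, logs) => {
        console.log(`Epoch ${epoch}: loss=${logs.loss}, accuracy=${logs.acc}`);
      }
    }
  });
}
```

With the script tags from the first post loaded, this could be called as `await nudgeTrain(tf, transferRecognizer)`. Whether it actually preserves the behaviour for the other words would need to be checked by testing each word afterwards.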

Ultimately I think we need to look into improving the model and making the training scripts more accessible so others can train new models (and improve the UX of the example), but unfortunately I don't think there is a great existing solution to your problem today. This issue has provided some good feedback and actionable items for us though, so we appreciate you taking the time to report it.

hpssjellis commented 5 years ago

Thanks @tafsiri for the detailed explanation.

However one thing you might be able to do is collect new samples for all the classes (including background noise) and train it a little bit (by a little bit I mean for only a few steps, maybe one or two epochs) and possibly passing in an optimizer with a low learning rate. This might nudge the model towards your new training data without forgetting as much of the original training data. This is quite hacky though, so may not be worth the time.

I love putting crazy amounts of time into "hacky" things, see video below. Speech Commands makes so much more sense now. I was incorrect thinking it was storing KNN classifier type data that could be edited.

By the way, thanks to the entire tensorflowjs team, I really appreciate all they have done. Tensorflowjs really brings machine learning to the public. The public just doesn't really know much about tensorflowjs yet.

Here is my latest video showing Posenet and Speech Commands with a web-socket toy car. https://youtu.be/61jYk4a8wkE

Feel free to close this issue.

tensorflow / tfjs

tfjs-models speech-commands ability to personally tune each word. #1597