shiffman / Tensorflow-JS-Examples

Working on some new examples with tensorflow.js and p5.js

xor training example #1

shiffman opened this issue 6 years ago

shiffman commented 6 years ago

This example works!

https://shiffman.github.io/Tensorflow-JS-Examples/01_XOR/

[Screenshot: the XOR example running in the browser, 2018-05-03]

I am planning to recreate my XOR Coding Challenge in a video tutorial with the layers API. @nsthorat @dsmilkov @cvalenzuela @yining1023 feel free to take a look at the code in case you have any feedback!

This example at the moment is written in a slightly strange way as I'm experimenting with ideas for wrapping the layers API into new classes for ml5: NeuralNetwork (likely not the same I will use) and Batch. When I make a tutorial I will just walk through the steps without breaking things out into separate classes. And then eventually I'll make an even simpler ml5 tutorial?

The things I'm not super sure about: my memory management (tidy() and dispose()), the learning rate, the number of epochs, and using dataSync().

This is obviously a totally trivial example -- the XOR problem is one that helps me think through and learn how this stuff works, so hopefully it's useful?

cvalenzuela commented 6 years ago

nice! looks good!

We should add this as one of the tutorials on the ml5 website too.

nsthorat commented 6 years ago

This looks great!

You can make predict a little shorter:

predict(inputs) {
  return tf.tidy(() => {
    const data = inputs instanceof Batch ? inputs.data : [inputs];
    const xs = tf.tensor2d(data);
    return this.model.predict(xs).dataSync();
  });
}

In general, with the structure of your program, you used tidy() and dispose() properly. You can use them interchangeably depending on the scenario. In the case of your fit, you actually cannot use a tidy() because fit is async, so you have to manually dispose xs and ys. It's a long story why we turned off async tidy(), but just trust me, it doesn't work :)
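
A minimal sketch of that manual disposal (the train() method shape and tensor-building details here are my assumptions, not the repo's exact code):

async train(inputData, targetData) {
  const xs = tf.tensor2d(inputData);
  const ys = tf.tensor2d(targetData);
  // fit() is async, so tf.tidy() can't wrap it; dispose manually instead
  await this.model.fit(xs, ys, { epochs: 1, shuffle: true });
  xs.dispose();
  ys.dispose();
}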

You can likely remove some of the memory management if you create your dataset Tensors (xs and ys) when the NeuralNetwork object is constructed (passing the data as a constructor argument). That way you don't need to create xs and ys on every call to fit (which uploads the data to the GPU every tick) or dispose() them between calls to train(), and you can call train repeatedly. If you want to call train iteratively, also set epochs to 1 so each call to NeuralNetwork.train does a single pass through the dataset.
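
A sketch of that idea (constructor signature and field names are my own assumptions, not the repo's exact code):

class NeuralNetwork {
  constructor(inputData, targetData) {
    // Create the dataset Tensors once, so data is uploaded to the GPU a single time
    this.xs = tf.tensor2d(inputData);
    this.ys = tf.tensor2d(targetData);
    // ... build this.model as before ...
  }

  async train() {
    // One pass through the dataset per call; no per-call create/dispose needed
    await this.model.fit(this.xs, this.ys, { epochs: 1 });
  }
}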

Learning rate seems fine. In general, learning rate is proportional to the range of your dataset. It's really important to normalize your dataset to 0 => 1 or -1 => 1; in fact, much of the initializer math relies on this so your model doesn't explode. In your case, your data is already normalized because it's learning a binary transformation.
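
For example (illustrative only, not from this repo), raw pixel values in 0 => 255 would be scaled down before training:

const raw = tf.tensor2d(pixelRows);   // values in [0, 255]
const normalized = raw.div(255);      // values in [0, 1]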

Regarding epochs, this is really up to you. If you want to make an interactive application, I would set epochs to 1 so you can see new predictions after each pass through the whole dataset.
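
A sketch of that interactive pattern (nn, its train() method, and the training flag are assumed, as above; tf.nextFrame() is a tf.js utility that resolves on the next animation frame):

async function trainLoop() {
  while (training) {
    await nn.train();      // a single epoch per iteration
    await tf.nextFrame();  // yield so the browser can render between passes
  }
}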

Use .data() in general if you can. .dataSync() causes the UI thread to block on the GPU being done, which will make your page freeze up (since scrolling, layout, rendering, etc also happen on the same UI thread). .data() returns a promise that resolves when the GPU is finished doing its work and when the tensor downloads to the CPU, allowing the UI thread to render, scroll, do layout, etc while the computations are happening on the GPU.

The only complication with .data() is that it's async, so you can't simply wrap it in a tidy(). Predict would actually look like this:

async predict(inputs) {
  const y = tf.tidy(() => {
    const data = inputs instanceof Batch ? inputs.data : [inputs];
    const xs = tf.tensor2d(data);
    return this.model.predict(xs);
  });
  // Downloading to the CPU with .data() will actually remove
  // the GPU data so you don't explicitly have to dispose it.
  return await y.data();
}
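
Since predict is now async, callers would await it (hypothetical usage):

const outputs = await nn.predict([1, 0]);  // outputs is a Float32Array
console.log(outputs[0]);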

Hope this helps. Let me know if I can clarify anything.

nsthorat commented 6 years ago

One more thing I noticed: you don't need inputShape on layers 2+ (it can be computed for you!)

https://github.com/shiffman/Tensorflow-JS-Examples/blob/master/01_XOR/neuralnetwork.js#L24

You can also shorten your code a little (totally optional):

this.model = tf.sequential({
  layers: [
    tf.layers.dense({
      units: hidden,
      inputShape: [inputs],
      activation: 'sigmoid'
    }),
    tf.layers.dense({
      units: outputs,
      activation: 'sigmoid'
    })
  ]
});