karino2 opened 5 years ago
Model is the same as padstroke_small_rnn_small_dropout05 in #1 . Feature extractor with GRU encoder-decoder-attention model.
acc 0.956
Impressive score. It seems generating data from symbol data is a better strategy for the current stage.
It's not clear why the score is better than single-symbol prediction. But this dataset contains only alphabet characters (including common math symbols) and numbers, so it might be easier to distinguish than the single-symbol task.
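The core of the GRU encoder-decoder-attention model above is the attention step between the decoder state and the encoder's stroke features. A minimal numpy sketch of plain dot-product attention (hypothetical shapes; the real model's dimensions are not stated here):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(dec_state, enc_outputs):
    """dec_state: (hidden,), enc_outputs: (src_len, hidden).
    Returns a context vector (hidden,) and attention weights (src_len,)."""
    scores = enc_outputs @ dec_state   # similarity of each encoder step to the decoder state
    weights = softmax(scores)          # normalize to a distribution over encoder steps
    context = weights @ enc_outputs    # weighted sum of encoder outputs
    return context, weights

rng = np.random.default_rng(0)
enc_outputs = rng.normal(size=(7, 16))  # 7 stroke-feature steps, hidden size 16 (assumed)
dec_state = rng.normal(size=16)
context, weights = dot_product_attention(dec_state, enc_outputs)
```

The context vector is then combined with the decoder state to predict the next symbol.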
The above model is very nice, but it was hard to convert to TensorFlow Lite because RNN support is still at an experimental stage (the dynamic_rnn function generates a graph that is hard to train on TPU: dynamic shapes even though I supply all shapes).
So I explored a CNN-based encoder-decoder model instead.
The feature extractor creates a list of stroke features. The decoder applies conv1d to the teacher-forced input, then attention between this output and the stroke features. Absolute position is added to the decoder input embedding.
The conv1d filter size is 3 and the kernel size is 8.
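For the decoder, the conv1d over the teacher-forced input must be causal (output at step t must not see inputs after t), which is usually done with left padding. A minimal numpy sketch of that idea (hypothetical dimensions, not the model's actual code):

```python
import numpy as np

def causal_conv1d(x, w):
    """x: (T, in_ch), w: (kernel, in_ch, out_ch).
    Left-pads with zeros so the output at step t only sees inputs <= t."""
    k = w.shape[0]
    xp = np.concatenate([np.zeros((k - 1, x.shape[1])), x], axis=0)
    T, out_ch = x.shape[0], w.shape[2]
    y = np.zeros((T, out_ch))
    for t in range(T):
        window = xp[t:t + k]                   # (kernel, in_ch), ends at step t
        y[t] = np.einsum('ki,kio->o', window, w)
    return y

rng = np.random.default_rng(0)
tokens = rng.normal(size=(10, 8))  # 10 teacher-forced steps, embedding dim 8 (assumed)
w = rng.normal(size=(3, 8, 8))     # kernel size 3, 8 filters (assumed)
out = causal_conv1d(tokens, w)
```

Without the left padding, the convolution would leak future target symbols into the prediction at each step.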
acc: 0.2
Out of the question...
In the experiment above, the training loss itself does not decrease enough. So I just added more parameters to the conv1d: kernel size 5, filter size 128.
acc: 0.237
The score is still out of the question (though the graph shape is much nicer...).
Add absolute position to the stroke features too.
acc 0.275
Out of the question.
Add conv1d to the encoder side too.
acc: 0.233
No improvement.
Add an embedding layer to the absolute position before adding it up.
acc 0.86
Yes! Getting better! I misread the ConvS2S paper: we need an embedding layer.
The score is lower than the GRU-based model, but it is worth pursuing. Let's convert to TF Lite.
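The fix described above, in numpy form: pass the absolute positions through a learned embedding table before adding them to the token embedding, ConvS2S-style. A minimal sketch with hypothetical vocabulary and dimension sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, max_len, dim = 100, 20, 8  # assumed sizes, not the model's actual ones

W_tok = rng.uniform(-0.05, 0.05, size=(vocab, dim))    # token embedding table
W_pos = rng.uniform(-0.05, 0.05, size=(max_len, dim))  # learned position embedding table

token_ids = np.array([5, 42, 7, 99])
positions = np.arange(len(token_ids))

# ConvS2S-style decoder input: token embedding + learned position embedding
dec_input = W_tok[token_ids] + W_pos[positions]
```

Adding the raw position scalar directly (without the embedding lookup) puts an unscaled integer onto a small-valued embedding, which plausibly explains the earlier failures.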
An Embedding layer applied to a generated tensor (not a model input) causes the TOCO converter to fail (positional encoding needs exactly this).
It seems tf.gather on a dynamically generated tensor causes the TOCO converter failure (?).
So I created my own embedding layer: build a one-hot vector and matmul it with a weight matrix initialized uniformly in [-1, 1].
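The one-hot-plus-matmul trick works because it computes exactly the same result as an embedding lookup (a gather), just without the tf.gather op. A minimal numpy sketch of the equivalence, using the [-1, 1] uniform initialization described above:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 50, 8  # assumed sizes

# custom embedding weight, uniform in [-1, 1] as described above
W = rng.uniform(-1.0, 1.0, size=(vocab, dim))

ids = np.array([3, 0, 49, 3])
one_hot = np.eye(vocab)[ids]   # (4, vocab): one row per id
embedded = one_hot @ W         # (4, dim): identical to the gather W[ids]
```

The cost is an extra (batch, vocab) x (vocab, dim) matmul per lookup, which is acceptable for a small vocabulary.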
acc: 0.73
The score got worse, but it still works and can be converted to a TF Lite model.
The Keras Embedding layer seems to initialize its weight matrix with a uniform distribution over [-0.05, 0.05]. I set up the same initialization.
acc: 0.84
Now the score is almost identical to the Keras Embedding layer (though regularization seems a little different).
Anyway, we finally have a working model that can be converted to TF Lite!
The previous model had a bug: future input information leaked via layer normalization. So I just dropped this layer and retrained.
acc: 0.75
Worse, but better than random, so this is enough to check on a real device.
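One plausible mechanism for such a leak (my assumption about the bug, not confirmed from the code): if the normalization statistics are computed over the time axis rather than per timestep, every output step depends on future inputs. A numpy demonstration:

```python
import numpy as np

def norm_over_time(x):
    """Buggy variant: statistics over the whole (time, feature) block,
    so every output step depends on every input step, including future ones."""
    return (x - x.mean()) / (x.std() + 1e-6)

def norm_per_step(x):
    """Proper layer norm: statistics per timestep only, no leakage across time."""
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    return (x - mu) / (sd + 1e-6)

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 8))
x2 = x.copy()
x2[9] += 1.0  # perturb only the last (future) timestep

leak = norm_over_time(x)[0] - norm_over_time(x2)[0]  # nonzero: step 0 saw the future
safe = norm_per_step(x)[0] - norm_per_step(x2)[0]    # zero: step 0 is unaffected
```

Per-timestep layer norm would have been safe to keep; dropping the layer entirely also removes the leak, at some cost in accuracy.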
The original dataset seems too difficult (too small for its complexity). So I generated a far easier dataset from the subtask symbol training dataset. The validation set is built from the subtask validation dataset with almost the same ratio.
https://github.com/karino2/tegashiki/blob/master/tegashiki_mathexp_generate.ipynb