rtkclouds / fast-js-language-model

Language model implementation for JS
MIT License

Implement optimized Python version #2

Closed. iSevenDays closed this issue 1 year ago.

iSevenDays commented 1 year ago

I have done only minimal tests so far, and training is still running.

The following config takes around 600 MB of RAM during training:

```python
cfg = {
    'sequenceSize': 512,
    'dimension': 512,
    'arrayDimension': 8,
    'predictSteps': 8,
    'batchSize': 4096
}
```

```
31/31 [==============================] - 174s 6s/step - loss: 428.4231 - accuracy: 0.1263
Epoch 2/1000
16/31 [==============>...............] - ETA: 56s - loss: 49.8825 - accuracy: 0.1587
```
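For reference, below is a minimal sketch (not the repository's actual code) of how a config like this could drive a Keras training run that prints progress lines in the format shown above. The model architecture, vocabulary size, and dummy data shapes are assumptions made purely for illustration; only the config values and the epoch count (1000, from the log) come from this issue.

```python
# Illustrative sketch only: placeholder model and data, assumed API usage.
import numpy as np
import tensorflow as tf

cfg = {
    'sequenceSize': 512,
    'dimension': 512,
    'arrayDimension': 8,
    'predictSteps': 8,
    'batchSize': 4096,
}

vocab_size = 256  # assumption; the issue does not state the vocabulary size

# Placeholder architecture standing in for the real network.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(cfg['sequenceSize'],)),
    tf.keras.layers.Embedding(vocab_size, cfg['dimension']),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(vocab_size, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Dummy integer sequences and next-token targets, sized arbitrarily.
num_samples = 4 * cfg['batchSize']
x = np.random.randint(0, vocab_size, size=(num_samples, cfg['sequenceSize']))
y = np.random.randint(0, vocab_size, size=(num_samples,))

# Keras prints per-step loss/accuracy progress bars like the log above.
model.fit(x, y, batch_size=cfg['batchSize'], epochs=1000)
```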