jemisjoky / TorchMPS

PyTorch toolbox for matrix product state models
MIT License
138 stars · 31 forks

I modified train_script to use adaptive mode, but the results are no better than the original. What is its advantage? #6

Closed Icespellcaster closed 5 years ago

Icespellcaster commented 5 years ago

I modified train_script to use adaptive mode, but the results are no better than the original. What is its advantage?

adaptive_mode = True

This is all I modified.

Icespellcaster commented 5 years ago

Whether `adaptive_mode = True` or `False`, the result is the same: train accuracy = 97% and test accuracy = 87%. However, convolutional neural networks can easily reach 100% training accuracy and 97% test accuracy. I think the effect of an MPS is similar to that of a linear neural network; the two structures don't seem much different, apart from the SVD step in the MPS. In the paper by Stoudenmire and Schwab (2016), the authors used a sweeping algorithm whose gradient update is unique and interesting. So why did you give up that algorithm in favor of ordinary gradient updates?

jemisjoky commented 5 years ago

Hi @Icespellcaster, good observation! Something to keep in mind is that MPS models have a "one-dimensional" inductive bias, and are much closer in layout to RNNs than CNNs. Because of this, one shouldn't expect the model to perform as well on image classification problems, in the same way it would be silly to use an LSTM for computer vision tasks.

I should also mention that the training script included in the code is not intended to achieve competitive performance on MNIST; it's just a short example to show users how they can use our code for their own problem (the default parameters use only 3% of the MNIST data for training!). Tweaking these parameters and using a scheduled learning rate can achieve 98% test accuracy, although, again, achieving competitive performance on MNIST is not the point of our model.
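To make the "scheduled learning rate" remark concrete, here is a minimal sketch (plain Python, hypothetical values, not taken from train_script) of a step-decay schedule, the same pattern PyTorch's `torch.optim.lr_scheduler.StepLR` implements:

```python
def scheduled_lr(base_lr, epoch, step_size=10, gamma=0.5):
    """Learning rate at a given epoch under step decay: the base rate is
    multiplied by `gamma` once every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)

print(scheduled_lr(1e-3, 0))   # 0.001
print(scheduled_lr(1e-3, 25))  # 0.00025 (decayed twice, at epochs 10 and 20)
```

With a PyTorch optimizer, the equivalent would be constructing `StepLR(optimizer, step_size=10, gamma=0.5)` and calling its `.step()` once per epoch.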

With regard to the adaptive mode, the goal here isn't necessarily to improve performance, but instead to learn variable bond dimensions that give us some information about the complexity of the problem within different input regions. You can see a very nice illustration of bond dimension adaptivity in this excellent paper (Figure 4), although their training algorithm is different from ours.
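As a rough illustration of where variable bond dimensions come from (a generic SVD-truncation sketch, not the actual TorchMPS internals), the bond dimension at a cut is just the number of singular values that survive a cutoff:

```python
import numpy as np

rng = np.random.default_rng(0)
# A 16x16 matrix of exact rank 3, standing in for a merged pair of MPS cores.
A = rng.standard_normal((16, 3)) @ rng.standard_normal((3, 16))

# Split it back into two cores via SVD, keeping only singular values above
# a relative cutoff; the kept count is the adaptive bond dimension.
U, S, Vt = np.linalg.svd(A)
cutoff = 1e-10
bond_dim = int(np.sum(S > cutoff * S[0]))
print(bond_dim)  # 3 — the truncation recovers the underlying rank
```

Regions of the input where the data is more complex need more surviving singular values, which is why the learned bond dimensions carry information about local problem complexity.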

jemisjoky commented 5 years ago

I closed the issue, but please let me know if you have any other questions about any of this :)