Some methods are not clear. Lack of documentation?

I am trying to implement a similar module using tensor networks (MPS and binary tree). Python is complicated for me, and I have some questions about how the "adaptive training algorithm" (inspired by Stoudenmire and Schwab 2016) is implemented and why it works so well.

- How does the module avoid the vanishing gradient problem explained here? Do you think it is due to the initialization of the weight tensor with a truncated identity matrix, or to the feature map you use? If you have any references explaining this, that would be perfect. I have tried other feature maps and the problem still happens. Also, why do you initialize the weight tensor with truncated identity matrices?
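To make the first question concrete, here is a minimal sketch of the effect I mean, in plain Python so it runs without PyTorch. Everything in it is my own assumption, not necessarily what TorchMPS does: the linear feature map phi(x) = [x, 1 - x], a chain of 50 cores with bond dimension 5, and "identity initialization" taken to mean that every core is the identity on its bond indices for each feature index, plus small noise. With random Gaussian cores, the vector obtained by contracting the chain shrinks exponentially with its length; the gradient with respect to any single core is built from the product of all the other transfer matrices, so it shrinks in roughly the same way. With near-identity cores, each transfer matrix is close to x·I + (1 - x)·I = I, so the norm stays near 1.

```python
import math
import random

def feature_map(x):
    # Assumed linear embedding phi(x) = [x, 1 - x]; its two components sum to 1.
    return [x, 1.0 - x]

def random_core(D, d, std, rng):
    # core[s] is a D x D matrix for feature index s, with i.i.d. N(0, std^2) entries.
    return [[[rng.gauss(0.0, std) for _ in range(D)] for _ in range(D)] for _ in range(d)]

def near_identity_core(D, d, noise, rng):
    # Assumed "identity init": core[s] = identity on the bond indices (for every s) plus small noise.
    return [[[(1.0 if i == j else 0.0) + rng.gauss(0.0, noise) for j in range(D)]
             for i in range(D)] for _ in range(d)]

def chain_norm(cores, xs, D):
    # Push a unit-norm boundary vector through the transfer matrices
    # M_k = sum_s phi_s(x_k) * core_k[s] and return the norm of the result.
    v = [1.0 / math.sqrt(D)] * D
    for core, x in zip(cores, xs):
        phi = feature_map(x)
        M = [[sum(phi[s] * core[s][i][j] for s in range(len(phi))) for j in range(D)]
             for i in range(D)]
        v = [sum(v[i] * M[i][j] for i in range(D)) for j in range(D)]
    return math.sqrt(sum(a * a for a in v))

rng = random.Random(0)
N, D, d = 50, 5, 2  # chain length, bond dimension, feature dimension
xs = [rng.random() for _ in range(N)]
random_norm = chain_norm([random_core(D, d, 0.1, rng) for _ in range(N)], xs, D)
eye_norm = chain_norm([near_identity_core(D, d, 1e-3, rng) for _ in range(N)], xs, D)
print(f"random init:        {random_norm:.3e}")
print(f"near-identity init: {eye_norm:.3e}")
```

A large enough `std` in `random_core` makes the norm grow exponentially instead of shrinking.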

Less important questions:

- In every "optimizer.step()", which parameters or tensors are optimized (changed)? If one fully follows Stoudenmire and Schwab 2016, only the tensor obtained by contracting two neighboring MPS tensors should be updated, but the module seems to update the full "InputRegions" (a tensor representing the MPS).

- Is there only one orthogonality center, or are there several?
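For the "optimizer.step()" question, this is my understanding of the two-site update in Stoudenmire and Schwab 2016, as a minimal plain-Python sketch with hypothetical names (it is not TorchMPS code). The cores to the left of site k and to the right of site k+1 stay fixed and, once contracted with their inputs, reduce to vectors `L` and `R`; the only tensor that changes is the bond tensor `B` obtained by contracting cores k and k+1. Because the output is linear in `B`, its gradient is simply L ⊗ phi_k ⊗ phi_{k+1} ⊗ R. In the paper, the updated `B` is then split back into two cores with a truncated SVD, which this sketch omits. Random unit vectors stand in for `L` and `R`, and the feature map [x, 1 - x] is assumed.

```python
import random

def contract(L, B, phi1, phi2, R):
    # f = sum_{a,s,t,b} L[a] * B[a][s][t][b] * phi1[s] * phi2[t] * R[b]
    return sum(L[a] * B[a][s][t][b] * phi1[s] * phi2[t] * R[b]
               for a in range(len(L)) for s in range(len(phi1))
               for t in range(len(phi2)) for b in range(len(R)))

def bond_gradient(L, phi1, phi2, R):
    # f is linear in B, so df/dB[a][s][t][b] = L[a] * phi1[s] * phi2[t] * R[b]
    return [[[[L[a] * phi1[s] * phi2[t] * R[b] for b in range(len(R))]
              for t in range(len(phi2))] for s in range(len(phi1))] for a in range(len(L))]

def unit(v):
    # Normalize a vector to unit Euclidean norm.
    n = sum(a * a for a in v) ** 0.5
    return [a / n for a in v]

rng = random.Random(1)
D = 3  # bond dimension
L = unit([rng.gauss(0, 1) for _ in range(D)])  # stand-in for everything left of site k
R = unit([rng.gauss(0, 1) for _ in range(D)])  # stand-in for everything right of site k+1
phi1, phi2 = [0.3, 0.7], [0.8, 0.2]  # assumed [x, 1 - x] features at sites k and k+1
B = [[[[rng.gauss(0, 1) for _ in range(D)] for _ in range(2)] for _ in range(2)] for _ in range(D)]

y, lr = 1.0, 0.5  # target label and learning rate
f0 = contract(L, B, phi1, phi2, R)
g = bond_gradient(L, phi1, phi2, R)
# One gradient step on the squared loss 0.5 * (f - y)^2, applied to B only;
# the cores folded into L and R are not touched.
B_new = [[[[B[a][s][t][b] - lr * (f0 - y) * g[a][s][t][b] for b in range(D)]
           for t in range(2)] for s in range(2)] for a in range(D)]
f1 = contract(L, B_new, phi1, phi2, R)
print(f"before: {f0:.4f}  after: {f1:.4f}  target: {y}")
```

By contrast, calling a standard optimizer on every parameter at once would move all cores in the same step, which is the behavior I seem to see with "InputRegions".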
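For the orthogonality-center question, my understanding of the canonical form is that there is a single center: the cores to its left are left-orthogonal (the sum over a and s of A[a][s][i] * A[a][s][j] equals δ_ij), and the center moves one site to the right by a QR decomposition of the current core, with the R factor absorbed into the next core. The sketch below uses plain Python, modified Gram-Schmidt as the QR, and hypothetical function names; it is not taken from TorchMPS. It checks that such a move makes the core left-orthogonal while leaving the two-site contraction unchanged. Whether TorchMPS keeps one center like this is exactly what I am asking.

```python
import random

def reshape_left(core):
    # core[a][s][b] with shape (Dl, d, Dr) -> matrix whose rows are (a, s) pairs, columns b
    return [core[a][s] for a in range(len(core)) for s in range(len(core[0]))]

def gram_schmidt(M):
    # Thin QR of a tall, full-rank matrix via modified Gram-Schmidt: M = Q R,
    # where Q has orthonormal columns and R is upper triangular.
    rows, cols = len(M), len(M[0])
    Q = [[0.0] * cols for _ in range(rows)]
    R = [[0.0] * cols for _ in range(cols)]
    for j in range(cols):
        v = [M[r][j] for r in range(rows)]
        for i in range(j):
            R[i][j] = sum(Q[r][i] * v[r] for r in range(rows))
            v = [v[r] - R[i][j] * Q[r][i] for r in range(rows)]
        R[j][j] = sum(x * x for x in v) ** 0.5
        for r in range(rows):
            Q[r][j] = v[r] / R[j][j]
    return Q, R

def move_center_right(core, nxt):
    # Make `core` left-orthogonal and absorb the leftover R into the next core,
    # so the center moves one site to the right without changing the MPS.
    Dl, d, Dr = len(core), len(core[0]), len(core[0][0])
    Q, R = gram_schmidt(reshape_left(core))
    new_core = [[Q[a * d + s] for s in range(d)] for a in range(Dl)]
    new_next = [[[sum(R[i][b] * nxt[b][s][c] for b in range(Dr))
                  for c in range(len(nxt[0][0]))] for s in range(len(nxt[0]))]
                for i in range(Dr)]
    return new_core, new_next

def is_left_orthogonal(core, tol=1e-9):
    # Checks that sum_{a,s} core[a][s][i] * core[a][s][j] == delta_ij
    rows = reshape_left(core)
    Dr = len(rows[0])
    return all(abs(sum(r[i] * r[j] for r in rows) - (1.0 if i == j else 0.0)) < tol
               for i in range(Dr) for j in range(Dr))

def two_site(core, nxt):
    # Contract two neighboring cores over their shared bond index.
    return [[[[sum(core[a][s][b] * nxt[b][t][c] for b in range(len(nxt)))
               for c in range(len(nxt[0][0]))] for t in range(len(nxt[0]))]
             for s in range(len(core[0]))] for a in range(len(core))]

rng = random.Random(2)
Dl, d, Dr, Dn = 2, 2, 3, 2  # bond dimensions and feature dimension for two hypothetical cores
core = [[[rng.gauss(0, 1) for _ in range(Dr)] for _ in range(d)] for _ in range(Dl)]
nxt = [[[rng.gauss(0, 1) for _ in range(Dn)] for _ in range(d)] for _ in range(Dr)]
new_core, new_next = move_center_right(core, nxt)
before, after = two_site(core, nxt), two_site(new_core, new_next)
print(is_left_orthogonal(core), is_left_orthogonal(new_core))
```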

I have many other questions that remain unclear even after a careful read of the code. I am opening this issue to ask whether there is more detailed documentation on how this module is implemented, or whether such documentation could be created (if that makes sense). Any help would be much appreciated.

Eduardo

jemisjoky/TorchMPS, issue #23