markovmodel / ivampnets


Issues with result reproduction #3

Closed Jakub-11 closed 1 year ago

Jakub-11 commented 1 year ago

Hi, I must admit I had surprisingly severe problems reproducing the results presented in the notebooks. To describe possible differences: I had to install the conda environment step by step (the one-liner from the readme didn't work because conda hung, on matplotlib I believe, but the resulting environment should be the same), and naturally I had to modify the paths to the synaptotagmin files, but otherwise I didn't modify the notebooks in any way. I am running them remotely on a machine with a GPU, but for toymodel2 and 10cube I also ran the notebooks on a CPU.

  1. Toymodel_2Systems: This is naturally the notebook I ran the most, and the only one that reproduced the results correctly in my case. However, it seemed very unreliable; is that expected? I think the success rate was below 50%, and in the other runs the model either did not converge or, worse, converged to a wrong solution (see attached image). It seems the attention layer can easily get confused and assign features to the wrong VAMPnets (ones with an incorrect number of states), which could be a problem if, for example, we wanted to analyze a protein with four different domains, each with a different number of states.

  2. 10Cube: Here I had no success; no run produced a satisfying reproduction of the results from the paper. See the attached images for an example of a mask after training and the corresponding timescales. Admittedly this happens to be one of the worst results I got, but even the best ones were still far from the desired output. I once extended the final part of training tenfold, which improved the results, but they were still not satisfactory.

  3. Synaptotagmin: I ran this notebook twice. The first time I got divergent results very similar to the plot of a standard VAMPnet run on synaptotagmin from the supplementary figures of your paper. I ran it a second time and the results got better (see attached image), but they are still somewhat far from the "ideal" plot in the paper.

I would like to believe I made some mistake, but at this point I am not able to identify it myself, so I would be grateful for any help and clarification.

To sum up: 10Cube seems broken to me, the toymodel works only sometimes, and synaptotagmin also seems unreliable. I could believe that getting a perfect result requires some luck, but all of this raises my concerns about applying iVAMPnet to other proteins, especially when the dynamical independence of the domains is only a hypothesis or the dynamics of the domains are less independent.

Best regards, Jakub

amardt commented 1 year ago

Hi Jakub,

thanks for reaching out. I am sorry for your frustration, which I can understand. Let me admit that the method is certainly not perfect yet, but it is also not as bad as your results might indicate, at least on my end.

But first, let me make some points clear. In the case of the original VAMPnet, the method only achieved a success rate of about 40% when it was released (see the SI figure), and only thanks to recent advances in model architecture has the success rate increased to nearly 100% for toy models. Note that protein systems usually have eigenfunctions whose eigenvalues lie very close to each other; sometimes a few eigenvalues are significantly larger, but beyond those the eigenvalues are closely spaced, so the model will usually pick up different eigenfunctions from this set in different runs.

This exemplifies the problem of getting stuck in suboptimal solutions when optimizing the VAMP score: if the model is initialized such that it already approximates a non-optimal eigenfunction quite well, it will rather optimize to approximate that eigenfunction even better than start searching for the optimal one, because doing so directly increases the VAMP score, whereas switching would first decrease it before increasing it again. I hope that makes the overall problem clear: the score is theoretically correct, but the optimization of the neural network only yields a local optimum, not in general the global one. In practice this means we have to train several networks from scratch and afterwards determine which one is best based on the VAMP score.

Now to iVAMPnets: this problem only intensifies here, since we are managing several interdependent VAMPnets at the same time. The model might not be able to converge to the globally optimal solution given an unfavorable initialization. The important part is that, based on the scores, we are able to determine which is actually the best model after training several of them; a rough sketch of that restart-and-select strategy is shown below.
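
As an illustration of the restart-and-select strategy (this is not the repository's exact API; `make_model`, `train_fn`, and `score_fn` are hypothetical placeholders for whatever setup, training, and validation-scoring code the notebooks use):

```python
# Hypothetical sketch: train from several random initializations and keep
# the network with the highest validation VAMP score, since single runs may
# get stuck in local optima. The three callables are user-supplied stand-ins
# for the actual notebook code.
import torch

def train_one(seed, make_model, train_fn, score_fn):
    """Train one model from a fresh initialization; return (score, model)."""
    torch.manual_seed(seed)
    model = make_model()            # fresh random initialization per restart
    train_fn(model)                 # full training run
    return score_fn(model), model   # validation VAMP score

def best_of_n(n_restarts, make_model, train_fn, score_fn):
    """Return (best_score, best_model) over n_restarts independent runs."""
    runs = [train_one(seed, make_model, train_fn, score_fn)
            for seed in range(n_restarts)]
    return max(runs, key=lambda pair: pair[0])
```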

After this theoretical comment, I would love to be of some help in your specific cases. So what you can try to do is:

  1. Turn on the training of the mask only after some initial training epochs (a rough sketch is shown after this list).
  2. After training, check the following scores in addition to the training score: train_pen_C01, train_pen_scores, and train_pen_C00.
  3. Play around with the value of lam_decomp.
  4. One could in principle also include the train_pen_C01 score in the training objective itself as a test, but for that you would need to modify the ivampnets file.
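
A minimal sketch of the first suggestion, assuming a PyTorch setup in which the attention mask is its own module; the actual notebooks may organize this differently, and `train_epoch` is a hypothetical placeholder for one epoch of your training loop:

```python
# Hypothetical sketch: keep the mask parameters frozen during a warm-up
# phase so the individual VAMPnets can settle first, then unfreeze the mask.
import torch

def set_mask_trainable(mask: torch.nn.Module, trainable: bool) -> None:
    """Enable or disable gradient updates for all mask parameters."""
    for p in mask.parameters():
        p.requires_grad_(trainable)

def staged_training(mask, train_epoch, n_warmup=20, n_total=100):
    """Train with the mask frozen for n_warmup epochs, then unfreeze it."""
    set_mask_trainable(mask, False)
    for epoch in range(n_total):
        if epoch == n_warmup:
            set_mask_trainable(mask, True)   # switch mask training on
        train_epoch(epoch)
```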

What especially confuses me is your report on the 10cube example, which works very consistently on my end, while you report basically complete failure. I am not sure what the reason could be there.

Again, I am sorry that you feel frustrated, but I hope we can sort things out. We hope that, as in the case of VAMPnets, new developments in optimization techniques will help to avoid getting stuck in non-optimal solutions.

Furthermore, new theoretical ideas on how to enforce the independence might emerge, e.g. training adversarially with a second network that, given the states of all but one subsystem, tries to predict the left-out one; a rough sketch of that idea follows.
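
Purely as an illustration of that idea (it is not implemented in this repository; all names below are hypothetical):

```python
# Hypothetical illustration of the adversarial idea above: an auxiliary
# network tries to predict the left-out subsystem's state probabilities from
# the concatenated states of all other subsystems. The predictor would
# minimize this loss, while the iVAMPnet would be trained to make the
# prediction impossible, thereby enforcing independence.
import torch
import torch.nn as nn

class LeaveOneOutPredictor(nn.Module):
    def __init__(self, n_other_states: int, n_target_states: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_other_states, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_target_states),
        )

    def forward(self, other_states: torch.Tensor) -> torch.Tensor:
        # other_states: concatenated state probabilities of all other subsystems
        return torch.log_softmax(self.net(other_states), dim=-1)

def predictor_loss(log_pred: torch.Tensor, target_states: torch.Tensor) -> torch.Tensor:
    """Cross-entropy between predicted and actual soft state assignments."""
    return -(target_states * log_pred).sum(dim=-1).mean()
```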

Thanks for all your patience and best regards! Andreas

Jakub-11 commented 1 year ago

Hi,

I am sorry for leaving this issue open for so long; I admit I forgot about it. Indeed, after reducing the amount of noise in the 10cube example it started to work as intended, and I see that this change was also applied in one of your commits, so I assume it must be working smoothly now.

Thank you for an exhaustive explanation.

Best regards, Jakub