markovmodel / variational

Basis sets, estimators and solvers for the variational approach of conformation dynamics. NOTE: the code has been merged with PyEMMA and is maintained there.

basis set computation doesn't seem to work right #8

Open franknoe opened 9 years ago

franknoe commented 9 years ago

This code:

import numpy as np
from variational.basissets.ramachandran import RamachandranBasis

# Residue basis for alanine; angles are given in degrees
alabasis = RamachandranBasis('A', radians=False)
# Two frames of (phi, psi) values
atraj = np.array([[-120, 60], [120, 120]])
alabasis.map(atraj)

leads to this output:

array([[ 1.        ,  0.2007158 , -0.79413052],
       [-0.        ,  0.        , -0.        ]])

which can't be right. The last row shouldn't be zero; at the very least, the first column must always be 1.0.
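A check for this condition can be sketched in a few lines of NumPy. The array below is copied from the output reported above rather than recomputed, and `frames_missing_constant` is a hypothetical helper name, not part of variational.

```python
import numpy as np

# Output of alabasis.map(atraj) as reported above (copied, not recomputed).
mapped = np.array([[1.0, 0.2007158, -0.79413052],
                   [-0.0, 0.0, -0.0]])

def frames_missing_constant(values, atol=1e-12):
    """Return indices of frames whose first basis function is not 1."""
    return np.flatnonzero(~np.isclose(values[:, 0], 1.0, atol=atol))

bad = frames_missing_constant(mapped)
print(bad)  # [1]
```

Only the second frame, (phi, psi) = (120, 120), is flagged.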

fvitalini commented 9 years ago

Hi Frank,

No, it does make sense. The basis functions of the capped amino acids are evaluated on a 36x36 grid, but not all of the microstates are actually populated.

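[Image: ac_a_nhme_rev_0, https://cloud.githubusercontent.com/assets/13469315/8985400/7e792fd4-36d7-11e5-99e5-8ab21dd7fb85.jpg]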

The microstate corresponding to the phi/psi combination [120, 120] is simply never visited.

What I have tested were the functions within ramachandran.py. I checked that, given an np-array containing the phi/psi time series of all residues, the function constructs a matrix containing the trajectory projected onto the basis functions, and that it produces the same results as my old code.

Francesca
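The grid lookup described in this comment might look roughly like the sketch below. It assumes uniform 10-degree bins starting at -180 degrees and a row-major (phi, psi) microstate ordering; `grid_index` and `table` are hypothetical names, and the layout actually used in ramachandran.py may differ.

```python
import numpy as np

N_BINS = 36                  # 36 x 36 grid, as stated above
BIN_WIDTH = 360.0 / N_BINS   # 10 degrees per bin (assumed uniform)

def grid_index(phi_psi_deg):
    """Map (n_frames, 2) phi/psi angles in degrees to flat microstate indices.

    Assumes bins start at -180 degrees and row-major (phi, psi) ordering;
    the ordering used in variational may be different.
    """
    angles = np.asarray(phi_psi_deg, dtype=float)
    # Wrap into [-180, 180) so that 180 and -180 fall into the same bin.
    wrapped = (angles + 180.0) % 360.0 - 180.0
    bins = np.floor((wrapped + 180.0) / BIN_WIDTH).astype(int)
    return bins[:, 0] * N_BINS + bins[:, 1]

# Hypothetical lookup table: one row per microstate, one column per basis function.
# Rows of unvisited microstates stay zero, mimicking the behaviour discussed here.
table = np.zeros((N_BINS * N_BINS, 3))
visited = grid_index([[-120, 60]])
table[visited] = [1.0, 0.2007158, -0.79413052]

atraj = np.array([[-120, 60], [120, 120]])
print(table[grid_index(atraj)])
```

With such a table, any frame that lands in an unvisited microstate returns an all-zero row, including the column that is supposed to hold the constant function.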

fnueske commented 9 years ago

This is a non-trivial point, isn't it? The single amino-acid eigenvectors are undefined for unpopulated states, but in principle these states might show up in simulations of more complicated systems. Francesca, have you encountered this before? We can at least modify the first eigenvector to be equal to one everywhere.
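The minimal fix suggested here can be sketched as follows, assuming the basis function values are stored as a hypothetical `(n_microstates, n_functions)` array in which unpopulated microstates are all-zero rows. The function name is illustrative and not part of variational.

```python
import numpy as np

def force_constant_first_function(table):
    """Return a copy of `table` whose first column equals 1 for every microstate.

    `table` is a hypothetical (n_microstates, n_functions) array of basis
    function values; unpopulated microstates are stored as all-zero rows.
    """
    fixed = np.array(table, dtype=float, copy=True)
    fixed[:, 0] = 1.0
    return fixed

table = np.array([[1.0, 0.2007158, -0.79413052],
                  [0.0, 0.0, 0.0]])   # second microstate never visited
fixed = force_constant_first_function(table)
print(fixed)
```

This only repairs the constant function; the remaining columns of unpopulated microstates are left at zero.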

franknoe commented 9 years ago

I agree


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

franknoe commented 9 years ago

Please still provide an example (a trajectory chunk) and demonstrate the use of

Each of those should use code from variational exclusively, and each should be just a few lines of code.

I guess that's addressed to both Francesca and Feliks.

fnueske commented 9 years ago

Ok, but today I don't have the time. I'll try tomorrow, ok?

fvitalini commented 9 years ago

Hi,

The microstates where the first eigenvector is zero are states that are not part of the largest connected set in the MSM of the amino acid. In theory, the same amino acid in a different sequence might have a "slightly" different distribution. However, the hypothesis underlying this basis set definition is that the differences in the dynamics of X between Ac-X-NHMe and Y-X-Z should be irrelevant. The basis functions I have used for the paper have zeros for those microstates that are not visited by the trajectory.

I have already encountered a case where there was an obvious difference between the capped amino acid and the amino acid in the sequence. For example, alanine's distribution in Ac-AP-NHMe is very different from that in Ac-A-NHMe. We ended up defining a new basis function in that case. I haven't checked whether any of the other amino acids populate states that are not populated in the corresponding residue-based functions, but this has not been an issue for me so far.

I will provide an example of how to use the functions "Single Ramachandran Basis" and "Product Basis". Is it ok if I add a folder, e.g. EXAMPLE, and provide scripts and files inside it to try the functions?

Francesca
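Why eigenvectors end up zero outside the largest connected set can be illustrated with a toy example. The sketch assumes a small hypothetical transition count matrix `counts` and computes mutual reachability with plain NumPy (Warshall's algorithm, cubic in the number of states); it is not the code used in variational or PyEMMA.

```python
import numpy as np

def largest_connected_set(counts):
    """Indices of the largest strongly connected set of a transition count matrix."""
    adj = np.asarray(counts) > 0
    n = adj.shape[0]
    reach = adj | np.eye(n, dtype=bool)
    # Warshall's algorithm: allow paths through each intermediate state k in turn.
    for k in range(n):
        reach |= reach[:, k, None] & reach[None, k, :]
    mutual = reach & reach.T            # i and j can reach each other
    sizes = mutual.sum(axis=1)
    return np.flatnonzero(mutual[np.argmax(sizes)])

# Toy count matrix: states 0-2 interconvert, state 3 is never visited.
counts = np.array([[5, 2, 0, 0],
                   [1, 4, 1, 0],
                   [0, 2, 6, 0],
                   [0, 0, 0, 0]])
active = largest_connected_set(counts)

# An eigenvector estimated on the active set only, embedded in the full state
# space, is zero on every state outside that set (here: the constant function).
full = np.zeros(counts.shape[0])
full[active] = 1.0
print(active, full)
```

State 3 is outside the active set, so even the constant eigenvector is stored as zero there.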

franknoe commented 9 years ago

sure

franknoe commented 9 years ago

On 30/07/15 at 17:01, fvitalini wrote:

> The microstates where the first eigenvector is zero are states that are not part of the largest connected set in the MSM of the amino acid. Theoretically it is true that the same amino acid in a different sequence might have a "slightly" different distribution. However, the hypothesis at the basis of such basis set definition is that the differences in the dynamics of X between Ac-X-NHMe and Y-X-Z should be irrelevant.

If you encounter a new system that visits points that were not visited in your parametrization, you still need to do something reasonable with them. At the very least, the first column must be 1; otherwise subsequent algorithms, such as Feliks', will simply break down. For the other columns, too, I think we have to do some reasonable interpolation.

I'm sure that in large peptides or proteins you will not only have slight differences; amino acids can also be locked in phi/psi values that are practically forbidden for separate amino acids. So this is an issue.

> I will provide an example on how to use the functions "Single Ramachandran Basis" and "Product Basis". Is it ok if I add a folder, e.g. EXAMPLE, and inside provide scripts and files to try the functions?

OK, add such a folder, examples, at the top level of the repository. If you add data, again make sure to use binary data, ideally compressed.
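One possible reading of "reasonable interpolation" is a nearest-neighbour fill on the periodic phi/psi grid, sketched below. This is only an assumption about what could be done, not something agreed on in this thread. The array layout `(n, n, n_functions)`, the `populated` mask and the function name are all hypothetical, and at least one populated cell is required.

```python
import numpy as np

def fill_unpopulated(grid_values, populated):
    """Fill unpopulated cells of a periodic phi/psi grid from the nearest populated cell.

    grid_values: (n, n, n_functions) basis function values on the grid.
    populated:   (n, n) boolean mask of cells visited in the parametrization.
    """
    n = populated.shape[0]
    filled = grid_values.copy()
    pop_idx = np.argwhere(populated)              # (n_pop, 2) populated cells
    for i, j in np.argwhere(~populated):
        d = np.abs(pop_idx - (i, j))
        d = np.minimum(d, n - d)                  # periodic distance per angle
        nearest = pop_idx[np.argmin((d ** 2).sum(axis=1))]
        filled[i, j] = grid_values[nearest[0], nearest[1]]
    filled[..., 0] = 1.0                          # constant function is 1 everywhere
    return filled

# Toy 4 x 4 grid with two populated cells and two basis functions.
n = 4
values = np.zeros((n, n, 2))
populated = np.zeros((n, n), dtype=bool)
values[0, 0] = [1.0, 0.5]
populated[0, 0] = True
values[2, 2] = [1.0, -0.5]
populated[2, 2] = True

filled = fill_unpopulated(values, populated)
print(filled[3, 0, 1])  # copied from cell (0, 0) across the periodic boundary
```

Ties between equally distant populated cells are resolved by `argmin`, i.e. by index order, which is arbitrary; a smoother scheme would need a separate decision.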
