The point of reducing dimensionality in the context of MSM estimation is to map down to a dimensionality that you can discretize without too much discretization error. The higher the dimensionality, the more difficult it is to discretize. How many dimensions are feasible depends a lot on the dataset. It is certainly > 2, but I'd also stay below, say, 100. You can have a look at the marginal distributions using pyemma.plots.plot_feature_histograms(); often the higher TICA dimensions become a bit noisy. But the question of how many dimensions to keep is not at all trivial: you need to make sure you are not discarding important / interesting processes, which requires some understanding of what the TICs mean in terms of your data. Maybe our tutorials are also interesting for you (in notebook 2 we explain dimension reduction).
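To make the trade-off a bit more concrete, here is a minimal sketch of how a cumulative-variance cutoff translates into a number of retained TICs. It assumes the kinetic-map convention, where each TIC contributes its squared eigenvalue to the total kinetic variance; as far as I know this is also what the `var_cutoff` argument of `pyemma.coordinates.tica` is based on, but please check that against the PyEMMA documentation. The helper name `dims_for_kinetic_variance` and the eigenvalues are made up for illustration; with real data you would pass in the eigenvalues of your own TICA object.

```python
from itertools import accumulate


def dims_for_kinetic_variance(eigenvalues, cutoff=0.95):
    """Smallest number of leading TICs whose cumulative kinetic variance
    (sum of squared eigenvalues, normalized to 1) reaches `cutoff`.

    Hypothetical helper, not part of PyEMMA. Assumes `eigenvalues` are
    sorted in descending order, as TICA eigenvalues usually are.
    """
    if not 0.0 < cutoff <= 1.0:
        raise ValueError("cutoff must be in (0, 1]")
    squared = [ev ** 2 for ev in eigenvalues]
    total = sum(squared)
    if total == 0.0:
        raise ValueError("all eigenvalues are zero")
    for n_dims, partial in enumerate(accumulate(squared), start=1):
        if partial / total >= cutoff:
            return n_dims
    # Guard against floating-point round-off when cutoff is 1.0.
    return len(squared)


# Made-up eigenvalues, purely for illustration (not from any real system).
example_eigenvalues = [0.9, 0.8, 0.5, 0.3, 0.1, 0.05]

for cutoff in (0.5, 0.8, 0.95):
    n = dims_for_kinetic_variance(example_eigenvalues, cutoff)
    print(f"cutoff={cutoff}: keep {n} TIC(s)")
```

Lowering the cutoff is the simplest knob for getting fewer dimensions, but the number it gives is only a starting point. You still need to check that the TICs you drop do not carry a process you care about.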
thank you
Hello,
I am trying to build an MSM for a 17-residue RNA system, using eRMSD and G-vectors as my features. After loading the features, I get 1157 dimensions, and when I transform my data using TICA, I get 116 dimensions. I have considered moving on with only the first two ICs, but I am worried because they do not account for much of the cumulative variance.
Is it practical to move on with all the TICA dimensions I got? If not, what can I do to further reduce the dimensionality?
Thank you,
Tia