hatzakislab / DeepFRET-GUI

GNU General Public License v3.0

HMM fails to detect transitions #11

Closed simonbonano closed 4 years ago

simonbonano commented 4 years ago

The HMM fit function does not detect transitions in dynamic traces. I've attached two dynamic POR traces as well as an old math.py file with an hmm fit function that is able to detect transitions where the latest DeepFRET version fails.

Traces (txt): Trace_Process_001_20190510_tif_pair3.txt Trace_Process_006_20190510_tif_pair9.txt

Plotted traces fitted with "old" hmm function: Trace_Process_001_20190510_tif_pair3.pdf Trace_Process_006_20190510_tif_pair9.pdf

HMM fit function from the old math.py:

import hmmlearn.hmm
import numpy as np
from typing import List

# fit_gaussian_mixture is defined elsewhere in the same math.py


def fit_hmm(
    X: np.ndarray,
    fret: np.ndarray,
    lengths: List[int],
    covar_type: str,
    n_components: int,
):
    """
    Fits a Hidden Markov Model to traces. The traces are row-stacked to provide
    a (t, c) matrix, where t is the total number of frames and c is the number
    of channels.
    """
    # Standardize the stacked signal (zero mean, unit variance)
    X = X - np.mean(X)
    X = X / np.std(X)

    hmm_model = hmmlearn.hmm.GaussianHMM(
        n_components=n_components,
        covariance_type="full",  # note: the covar_type argument is not used
        min_covar=100,
        init_params="stmc",  # auto init all params
        algorithm="viterbi",
    )
    hmm_model.fit(X, lengths)
    print("covariances are: ", hmm_model.covars_)

    # Most likely state sequence and the fitted transition matrix
    states = hmm_model.predict(X, lengths)
    transmat = hmm_model.transmat_

    # Fit a Gaussian mixture to the FRET values assigned to each state;
    # the first two entries of each params tuple are appended as mean and sigma
    state_means, state_sigs = [], []
    for si in sorted(np.unique(states)):
        _, params = fit_gaussian_mixture(fret[states == si])
        for (m, s, _) in params:
            state_means.append(m)
            state_sigs.append(s)

    return states, transmat, state_means, state_sigs

eembees commented 4 years ago

Eyeballing them, at least one of the traces is dubious as to whether it shows a transition at all. Can you see how many states the final HMM fit outputs in total? And are you fitting these individually with an HMM, or are they part of a larger dataset?

simonbonano commented 4 years ago

Both traces display two states. In previous DeepFRET versions, I believe each trace was fitted individually? I agree that the first trace might be dubious, but this is as good as the data gets, and ideally we should be able to detect these kinds of transitions.

simonbonano commented 4 years ago

I just simulated 10 clean, dynamic traces and still it fails to detect any transitions. Thus, the issue is not a matter of noise / unclear data.

Representative plot: trace_6.pdf

Traces: trace_0.txt trace_1.txt trace_2.txt trace_3.txt trace_4.txt trace_5.txt trace_6.txt trace_7.txt trace_8.txt trace_9.txt
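
For readers without the attachments, a clean dynamic trace of this kind can be approximated with a two-state Markov chain plus Gaussian noise. This is only a minimal sketch and not the simulator used for the files above; the function name, the state means (0.3 and 0.7), the switching probability, and the noise level are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_two_state_trace(n_frames=500, p_switch=0.02,
                             means=(0.3, 0.7), noise=0.03, seed=0):
    """Simulate a hypothetical two-state FRET trace (illustration only)."""
    rng = np.random.default_rng(seed)
    states = np.zeros(n_frames, dtype=int)
    for t in range(1, n_frames):
        # Switch state with probability p_switch, otherwise stay put
        if rng.random() < p_switch:
            states[t] = 1 - states[t - 1]
        else:
            states[t] = states[t - 1]
    # Each frame is the mean FRET of its state plus Gaussian noise
    fret = np.asarray(means)[states] + rng.normal(0.0, noise, n_frames)
    return states, fret


states, fret = simulate_two_state_trace()
n_transitions = int(np.count_nonzero(np.diff(states)))
print(f"{n_transitions} true transitions in {len(fret)} frames")
```

Because the true state sequence is known, the number of transitions an HMM fit recovers can be compared directly against `n_transitions`.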

eembees commented 4 years ago

I won't be able to look at it right now, but I'll try over the weekend. When you fit the simulated traces with the HMM, can you see in the terminal what the interface says? It should be giving some sort of trace from pomegranate and which states are being fit.

simonbonano commented 4 years ago

No worries. I have attached a screen dump of the terminal interface after running HMM on the selected traces.

[Screenshot 2020-08-07 at 13.48.48]

eembees commented 4 years ago

@komodovaran I see that the fit_gaussian_mixture function tries all covariance types. Is there a reason for that? Can't we just have the covariance be full or tied? There are some assumptions here that we don't want to leave unconsidered, although I'm not 100% sure what they are. https://github.com/hatzakislab/DeepFRET-GUI/blob/1358a5e2aa4ccbb68e98508daba6e15d8b0d0973/src/main/python/lib/math.py#L225

komodovaran commented 4 years ago

I think we left it that way "because why not".

Or maybe because "full" would sometimes result in a BIC best fit, but would find extremely narrow distributions.

But you may change it to full if it works and resolves our problems.

eembees commented 4 years ago

I think I found the culprit: stricter BICs push the number of parameters to a minimum. To fix this, I will change the default behavior to get the BIC from the GaussianMixtureModel.bic() method. I'll also change fit_gaussian_mixture to use only covariance type full, which makes this step 4 times faster. I don't think we gain anything from having statistically independent covariances, especially since we use this step only to find the number of components.
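
The idea of picking the number of components from the BIC with a single covariance type can be sketched as follows. This is not the DeepFRET implementation: it assumes scikit-learn's `GaussianMixture` (whose `bic()` method is a real API), and the helper name `best_n_components_by_bic`, the `max_components` limit, and the example data are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def best_n_components_by_bic(values, max_components=4, seed=0):
    """Return the component count with the lowest BIC, plus all BIC values."""
    X = np.asarray(values, dtype=float).reshape(-1, 1)
    bics = []
    for k in range(1, max_components + 1):
        # Only covariance_type="full" is tried, instead of all four types
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              random_state=seed)
        gmm.fit(X)
        bics.append(gmm.bic(X))  # lower BIC means a better trade-off
    return int(np.argmin(bics)) + 1, bics


# Hypothetical example data: two well-separated FRET populations
rng = np.random.default_rng(0)
fret = np.concatenate([rng.normal(0.3, 0.03, 500),
                       rng.normal(0.7, 0.03, 500)])
n_states, bics = best_n_components_by_bic(fret)
print(n_states, [round(b, 1) for b in bics])
```

Any extra penalty applied on top of the plain BIC pushes the selection toward fewer components, which is how a strict setting can end up reporting a single state.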

eembees commented 4 years ago

OK, this is fixed in c7bc250, where we've set gvars.hmmBICStrictness to False by default, with the option to change or raise it. Closing.

simonbonano commented 4 years ago

Sorry, but the issue doesn't seem to be fixed yet. The HMM is still unable to detect transitions on clean, dynamic traces:

Traces: trace_0.txt trace_1.txt trace_2.txt trace_3.txt trace_4.txt trace_5.txt trace_6.txt trace_7.txt trace_8.txt trace_9.txt

Error message: [Screenshot 2020-08-17 at 10.48.37]

Any help would be appreciated (not super urgent though).

eembees commented 4 years ago

I see that the trace says number of components 1, so the problem is not in the HMM but in the Gaussian mixture fit. Did you:

  1. Pull the latest changes (commit c7bc250)? and
  2. Double-check that gvars.hmmBICStrictness is set to False?

simonbonano commented 4 years ago

Yes, I did both of these things. The histogram window and the Gaussian Mixture fit work fine (BIC = 5 states on these traces). It's only the HMM fit and the Transition Density window that don't work. For some reason it says number of components 1. In earlier versions it used to work. I don't know if it's the change to pomegranate that caused the issue? Or the k-means clustering from sklearn (see error message in previous comment)?

The traces are clearly dynamic: [Screenshot 2020-08-17 at 13.46.13]

eembees commented 4 years ago

> The histogram window and Gaussian Mixture fit works fine (BIC = 5 states on these traces). It's only the HMM fit and Transition Density window that doesn't work.

If you look at the very top of the screenshot you posted (https://github.com/hatzakislab/DeepFRET-GUI/issues/11#issuecomment-674751158), you can see that the GMM is telling the HMM to do a 1-state fit. I assume it could be an issue with the DA being fit in a strange way, but I can't tell you for sure. On my machine it worked last night once I changed the parameters.

> For some reason it says number of components 1. In earlier versions it used to work. I don't know if it's the change to pomegranate that caused the issue?

This is returned from the GMM fitting script, so it must be in that step (see https://github.com/hatzakislab/DeepFRET-GUI/issues/11#issuecomment-674502095). Pomegranate is just fitting the number of states that the GMM specifies.

> Or the k-means clustering from sklearn (see error message in previous comment)?

The error message here comes from the fact that you only have one state, so there are no transitions between states. So when it looks for transitions it gets a matrix of length 0.
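
The failure mode described here can be illustrated with a small sketch: with a single fitted state, the list of (before, after) transitions is empty, so any clustering step needs a guard. The function names `find_transitions` and `can_cluster_transitions` are hypothetical and do not come from the DeepFRET code base; the input is assumed to be the per-frame mean FRET of the fitted state.

```python
import numpy as np


def find_transitions(state_means):
    """Return an (n, 2) array of (before, after) values at each state change."""
    s = np.asarray(state_means, dtype=float)
    # Indices where the idealized trace changes value between frames
    idx = np.flatnonzero(np.diff(s) != 0)
    return np.column_stack([s[idx], s[idx + 1]])


def can_cluster_transitions(transitions, n_clusters):
    """Clustering needs at least as many samples as clusters."""
    return len(transitions) >= n_clusters


one_state = find_transitions([0.3, 0.3, 0.3, 0.3])
two_state = find_transitions([0.3, 0.3, 0.7, 0.7, 0.3])

print(one_state.shape)  # (0, 2): a one-state fit has no transitions
print(two_state.tolist())  # [[0.3, 0.7], [0.7, 0.3]]
print(can_cluster_transitions(one_state, n_clusters=2))  # False
```

Checking `can_cluster_transitions` before calling a clustering routine turns the length-0 error into an explicit "no transitions found" branch.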

eembees commented 4 years ago

Feel free to try changing the code so that it works again. It could be a question of pulling the number of states out of an E-FRET fit before fitting DA/DD. Have you tried fitting just the E-FRET values?

simonbonano commented 4 years ago

Okay okay, I found the problem. It is related to the BIC strictness parameter, which is set too high by default (setting hmmBICStrictness = False in config.ini apparently didn't override it). I will experiment with the settings on both simulated and experimental data tomorrow to find the optimum and fix the problem. I didn't realise how much had actually changed in the HMM fit structure since the earlier versions of DeepFRET that I'm used to. Closing 👍🏻