maxmouchet / HMMBase.jl

Hidden Markov Models for Julia.
MIT License

Multiple sequence with different length #16

Open sosuts opened 4 years ago

sosuts commented 4 years ago

Hi! I'm planning to use HMMBase.jl for a biological application. Specifically, I want to estimate parameters (e.g. speed) of an object that has multiple states. My data consist of multiple coordinate sequences of different lengths.

I am new to Julia, so I'm not sure if HMMBase.jl or MS_HMMBase.jl supports this kind of analysis. If not, are there any future plans?

Thank you in advance! I'm sorry for a very silly question.

maxmouchet commented 4 years ago

Hi,

Parameter estimation with multiple sequences (of potentially different lengths) is not currently supported in HMMBase. It's something that I would like to implement (in mle.jl), but I don't know when.

I'm not sure if variable length sequences are supported by MS_HMMBase.

I'll keep this issue open as a reminder to try to implement this :-)

sosuts commented 4 years ago

Thank you for your reply :)

At least I can implement the algorithm with a specific distribution, but I don't know how to make it work with arbitrary distributions...

maxmouchet commented 4 years ago

In HMMBase I make use of the Distributions.jl package for the pdf (likelihood) and fit_mle (parameter estimation) methods. This way I can handle any observation distribution that implements these methods.
I only need to compute the "responsibilities" (assignment probabilities) for each observation and each state, and re-estimate the transition matrix.
See https://github.com/maxmouchet/HMMBase.jl/blob/master/src/mle.jl#L58-L68 where I delegate to fit_mle.
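
To make the re-estimation step above concrete: once the responsibilities are known, each state's distribution can be refit by weighted maximum likelihood, using that state's column of responsibilities as weights. The following is a minimal sketch for a univariate Gaussian written in plain Julia; it is not the code in mle.jl. The helper name `weighted_normal_mle` and the toy data are assumptions made for illustration. Distributions.jl provides weighted `fit_mle` methods that play this role for the distributions that support them.

```julia
# Hypothetical helper: weighted maximum-likelihood estimate of a Gaussian.
# `y` holds the observations, `w` the responsibilities of one state.
function weighted_normal_mle(y::AbstractVector{<:Real}, w::AbstractVector{<:Real})
    length(y) == length(w) || throw(ArgumentError("y and w must have the same length"))
    sw = sum(w)
    μ = sum(w .* y) / sw
    σ = sqrt(sum(w .* (y .- μ) .^ 2) / sw)
    return μ, σ
end

# Toy data (made up): 4 observations, 2 states; each row of γ sums to 1.
y = [0.1, -0.2, 5.0, 5.3]
γ = [0.9 0.1;
     0.8 0.2;
     0.1 0.9;
     0.0 1.0]

# One (μ, σ) pair per state, each fitted with that state's column of γ.
params = [weighted_normal_mle(y, γ[:, i]) for i in 1:size(γ, 2)]
```

Because only the weights change from state to state, the same loop works for any distribution that has a weighted fit.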

sosuts commented 4 years ago

Hi again, I'm now trying to integrate my code into this package.

BTW, I'm not sure what you mean by the responsibilities of each observation.

maxmouchet commented 4 years ago

By "responsibilities" I mean P(Z_t = i | Y, θ), where Z_t is the hidden state at time t, Y the observations, and θ the model parameters.
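
In terms of the forward and backward messages α and β that come up later in this thread, the textbook expression for this quantity is the following. K, the number of hidden states, and T, the sequence length, are symbols introduced here for illustration.

```latex
\gamma_t(i) = P(Z_t = i \mid Y, \theta)
            = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{K} \alpha_t(j)\,\beta_t(j)},
\qquad
\alpha_t(i) = P(Y_1, \dots, Y_t, Z_t = i \mid \theta),
\quad
\beta_t(i) = P(Y_{t+1}, \dots, Y_T \mid Z_t = i, \theta)
```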

sosuts commented 4 years ago

Thank you. I have another question. I was thinking that we scale β with the c calculated from α:

https://github.com/maxmouchet/HMMBase.jl/blob/f9928525d55e06321c8b22ffcb6c179c09fc52d1/src/messages.jl#L56-L67

Do I need to calculate another c for β?

maxmouchet commented 4 years ago

You're right, it's possible to use the same scaling vector c for α and β. It's not done, for now, to keep the code simple :-)
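
For reference, here is a minimal sketch of the shared-scaling variant being discussed. It is not the code in messages.jl. It works on plain likelihoods rather than log-likelihoods to stay short, and the function name `forward_backward_shared` and the toy inputs are assumptions made for illustration.

```julia
# Scaled forward-backward pass that reuses the forward scaling vector c
# for the backward messages. L[t, i] is the likelihood of observation t
# under state i, a the initial distribution, A the transition matrix.
function forward_backward_shared(a::Vector{Float64}, A::Matrix{Float64}, L::Matrix{Float64})
    T, K = size(L)
    α = zeros(T, K)
    β = zeros(T, K)
    c = zeros(T)

    # Forward pass: normalise each row of α and remember the constants.
    α[1, :] = a .* L[1, :]
    c[1] = sum(α[1, :])
    α[1, :] ./= c[1]
    for t in 2:T
        α[t, :] = (A' * α[t-1, :]) .* L[t, :]
        c[t] = sum(α[t, :])
        α[t, :] ./= c[t]
    end

    # Backward pass: divide by c[t+1] from the forward pass, no new constants.
    β[T, :] .= 1.0
    for t in T-1:-1:1
        β[t, :] = A * (L[t+1, :] .* β[t+1, :]) ./ c[t+1]
    end

    γ = α .* β              # rows already sum to 1 with this scaling
    return γ, sum(log.(c))  # responsibilities and log-likelihood
end

# Toy model (made up): 2 states, 3 observations.
a = [0.5, 0.5]
A = [0.9 0.1; 0.2 0.8]
L = [0.8 0.1; 0.7 0.2; 0.1 0.9]
γ, ll = forward_backward_shared(a, A, L)
```

With this scaling, the product of the two scaled messages needs no further normalisation. That is the practical benefit of sharing c, at the cost of coupling the backward pass to the forward one.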

sosuts commented 4 years ago

Sorry for asking many questions! Why do we have to subtract m from loglikelihood?

https://github.com/maxmouchet/HMMBase.jl/blob/2ebf644152de2f2af6d23995601c6faf7b3bce8d/src/mle.jl#L39-L43

https://github.com/maxmouchet/HMMBase.jl/blob/2ebf644152de2f2af6d23995601c6faf7b3bce8d/src/messages.jl#L103-L123

maxmouchet commented 4 years ago

No worries!

This is the log-sum-exp trick: https://en.wikipedia.org/wiki/LogSumExp
It prevents exp(LL[t,j]) from overflowing.
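
The trick in isolation, as a minimal sketch (the helper name `logsumexp_stable` and the example values are assumptions, not taken from the linked code): subtracting the maximum m before exponentiating keeps every exponent at or below zero, and m is added back afterwards. Besides overflow, this also guards against the opposite failure, where very negative log-likelihoods all underflow to zero.

```julia
# Numerically stable log(sum(exp.(x))): subtract the maximum before exponentiating.
function logsumexp_stable(x::AbstractVector{<:Real})
    m = maximum(x)
    isfinite(m) || return float(m)   # maximum is ±Inf or NaN: return it as-is
    return m + log(sum(exp.(x .- m)))
end

ll = [-1000.0, -1001.0, -1002.0]
naive = log(sum(exp.(ll)))      # every exp underflows to 0.0, so this is -Inf
stable = logsumexp_stable(ll)   # finite, slightly above -1000
```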

sosuts commented 4 years ago

I've started writing the code for multiple observations again. I think I can send a PR soon.

maxmouchet commented 4 years ago

Nice!

I recently cleaned up the code by removing the logl keyword and the methods that do not use the log-likelihoods. Basically everything is done using the log-likelihoods now.

Feel free to open a PR, and I'll help you if there are merge conflicts.

sosuts commented 4 years ago

Hi. I changed my code to adapt to your new API. I implemented some new functions for two situations:

  1. multiple observations of the same length
  2. multiple observations of different (random) lengths

I haven't finished writing the code for the multivariate model in situation 2. This notebook is an example.

Is it OK to open a PR? To be honest, this is my first time using GitHub, so I'm not sure when to open it...
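
For readers following along, one common way to handle both situations in EM is to run the forward-backward pass on each sequence separately and then pool the resulting statistics before the update step. The sketch below illustrates only that pooling and is not taken from the notebook or the PR. The data are made up, and the responsibilities are placeholders standing in for the output of a real forward-backward pass.

```julia
# Hypothetical sketch: pool per-sequence responsibilities so that a single
# weighted fit sees all sequences, whatever their lengths.
sequences = [[0.1, -0.2, 5.0], [5.3, 4.9], [0.0, 0.2, -0.1, 5.1]]   # lengths 3, 2, 4

# Placeholder responsibilities (one T_k × 2 matrix per sequence); in a real
# EM iteration these would come from a forward-backward pass on each sequence.
γs = [hcat(abs.(y) .< 1, abs.(y) .>= 1) .* 1.0 for y in sequences]

y_all = reduce(vcat, sequences)     # 9 observations in total
γ_all = reduce(vcat, γs)            # 9 × 2 matrix of weights

# Weighted mean of each state over all sequences at once.
μ = [sum(γ_all[:, i] .* y_all) / sum(γ_all[:, i]) for i in 1:2]

# Initial distribution: average of the first-step responsibilities.
a = sum(γ[1, :] for γ in γs) ./ length(γs)
```

The transition matrix is handled in the same spirit: expected transition counts are summed over sequences before the rows are normalised, and no transition is counted between the end of one sequence and the start of the next.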

maxmouchet commented 4 years ago

I think the correct URL is https://nbviewer.jupyter.org/github/SosUts/HMMBase.jl/blob/multiple_sequences/notebooks/multiple%20sequences.ipynb :)

This looks very nice! Thank you for your work :)

You can open a PR now, and I'll review the code.
It is still possible to push new commits to your branch after the PR is opened, so making further modifications is no problem.

sosuts commented 4 years ago

I opened it. Thank you as always!

Edit: I forgot to think about the tests. Should I change the tests, or should I close the PR and change the code?

maxmouchet commented 4 years ago

No worries, you can keep the PR open!
Every commit that you add to your branch will be added to the PR automatically.

I'm a bit busy this week, so I'll try to have a look at the PR this weekend, or the next one.
In any case your code looks clean :)

sosuts commented 4 years ago

Thanks to your original code! I'll try fixing things step by step.

cossio commented 1 year ago

Hello! I need this feature for some data I have, where I also have multiple time series of different lengths.

What's the current status? I saw the linked PR got closed, but I'm not sure why.

Thanks!