I'm unsure what you mean by 'soft labels'. HMMs are usually trained using the Baum-Welch algorithm, which can be thought of as a temporal/structured version of EM; 'baum-welch' is the default and does not require labels at all. Could you define exactly what a soft label means in this context?
By 'soft labels' I mean that, for each given time point, the label is provided as a vector of probabilities over the states, instead of a definite state index (which I suppose would be called a 'hard label'), as sketched below.
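For concreteness, a minimal sketch of the distinction, using a hypothetical 3-state example (the array names are illustrative only):

```python
import numpy as np

# Hard labels: one definite state index per time point.
hard_labels = np.array([0, 0, 1, 1, 2])

# Soft labels: one probability vector over the 3 states per time point.
# Each row sums to 1; a hard label is the special case of a one-hot row.
soft_labels = np.array([
    [0.9, 0.1, 0.0],
    [0.7, 0.3, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.8, 0.2],
    [0.0, 0.1, 0.9],
])
assert np.allclose(soft_labels.sum(axis=1), 1.0)
```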
This is called a Partial HMM. I am interested in such a model too. See my question here: https://github.com/hmmlearn/hmmlearn/issues/173 The answer pointed to this paper on partially observed HMMs.
I think it would be a nice feature to add, and it doesn't seem that difficult either: simply provide a matrix W of size observations × states, where each entry is the prior probability of an observation belonging to that state, and multiply by it in the forward-backward algorithm (see the sketch below). Is there a more intuitive way to specify the prior weights than a giant matrix, though?
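Here is a minimal sketch of that idea, assuming discrete emissions and a plain NumPy forward pass (the function and variable names are illustrative, not pomegranate's internals):

```python
import numpy as np

def forward_weighted(pi, A, B, obs, W):
    """Forward algorithm with per-timestep state priors W (T x K).

    pi  : (K,) initial state distribution
    A   : (K, K) transition matrix, A[i, j] = P(state j | state i)
    B   : (K, M) emission matrix, B[k, o] = P(obs o | state k)
    obs : (T,) observed symbol indices
    W   : (T, K) soft-label weights; W[t, k] scales the evidence for
          state k at time t. All-ones rows recover the standard forward pass.
    """
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    # The only change from the textbook recursion: multiply the emission
    # likelihood by the soft-label weight at each step.
    alpha[0] = pi * B[:, obs[0]] * W[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]] * W[t]
    return alpha  # summing the last row gives the (weighted) likelihood
```

The backward pass would weight its recursion by the same W, so the posteriors used in Baum-Welch respect the priors.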
Thank you for opening an issue. pomegranate has recently been rewritten from the ground up to use PyTorch instead of Cython (v1.0.0), and so all issues are being closed as they are likely out of date. Please re-open or start a new issue if a related issue is still present in the new codebase.
What I have in mind is to run the same forward/backward algorithm, but with different observation values (they might be called 'futuristic' values, since they are yet to be observed at their respective time positions) and predefined emission distributions, so that the trained HMM can be used for a sort of prediction.
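As a rough sketch of what I mean, assuming a trained transition matrix and a separate, predefined emission matrix for the yet-to-be-observed values (all names here are illustrative, not any library's API):

```python
import numpy as np

def predict_next_obs_dist(belief, A, B_future):
    """Push a filtered state belief one step forward and score
    hypothetical future observations under a predefined emission model.

    belief   : (K,) filtered state distribution at the current time
    A        : (K, K) trained transition matrix
    B_future : (K, M) predefined emission matrix for future values
    returns  : (M,) distribution over the next observation
    """
    next_state = belief @ A        # one-step state prediction
    return next_state @ B_future   # marginalize states over emissions
```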
If I understand it correctly, pomegranate implemented `hmm.summarize(algorithm='labeled')` partly with the aim of supporting `hmm.summarize(algorithm='viterbi')`. However, I haven't seen Viterbi training commonly supported in other HMM packages, and labeled training seemingly even less so, so I think pomegranate has simply advanced further in this direction. The current implementation of Viterbi/labeled training assumes hard labels, though. I think soft labels are essential to achieve my stated goals, unless there were a 'shadow' training algorithm that takes parallel observation data together with another HMM sharing all states with the HMM being trained, but with different (maybe frozen) emission distributions.
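To illustrate how soft labels could slot into labeled training, here is a rough sketch of an M-step update where the usual one-hot responsibilities are replaced by the provided soft-label vectors (an illustration under that assumption, not pomegranate's actual `summarize()` code):

```python
import numpy as np

def mstep_initial_and_emissions(obs, gamma, n_symbols):
    """M-step statistics from soft labels.

    obs       : (T,) observed symbol indices
    gamma     : (T, K) per-timestep state responsibilities; with hard
                labels these rows are one-hot, with soft labels they are
                the given probability vectors used directly.
    n_symbols : number of distinct observation symbols (M)
    """
    T, K = gamma.shape
    pi = gamma[0] / gamma[0].sum()       # initial state distribution
    B = np.zeros((K, n_symbols))
    for t in range(T):
        B[:, obs[t]] += gamma[t]         # weighted emission counts
    B /= B.sum(axis=1, keepdims=True)    # normalize each state's row
    return pi, B
```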
I'd like to poll your thoughts and experiences (which will surely be greater than mine) regarding this idea.