Closed B4marc closed 6 years ago
Howdy
It's hard for me to give good feedback as to how you could improve your model when I know nothing about your data set.
I'm not sure where you got the values for your transition matrix but the probabilities in all the transitions leaving a state should sum to 1. pomegranate will auto-normalize these values if that's not the case, but you should be aware of the issue in the future. For example:
model_test.add_transition(s1, s1, 0.004)
model_test.add_transition(s1, s2, 0.66500000000000023)
the sum of the out-edges here is ~0.669 when it needs to be 1.0.
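As an illustration (a minimal sketch, not pomegranate API code; the dictionary and helper function are hypothetical), you can sanity-check the row sums yourself before building the model instead of relying on auto-normalization:

```python
# Hypothetical sketch (names s1/s2 mirror the snippet above): check that
# the outgoing transition probabilities of each state sum to 1.
transitions = {
    ("s1", "s1"): 0.335,
    ("s1", "s2"): 0.665,  # 0.335 + 0.665 = 1.0, unlike 0.004 + 0.665 above
}

def out_edge_sums(edges):
    """Sum the outgoing transition probability per source state."""
    sums = {}
    for (src, _dst), p in edges.items():
        sums[src] = sums.get(src, 0.0) + p
    return sums

assert abs(out_edge_sums(transitions)["s1"] - 1.0) < 1e-9
```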
A reason it may not be "optimal" is that it is fitting well to the training set, but this may be overfitting and so inference may not generalize well to other data sets. This is where the concept of regularization comes in, either in the form of smoothing parameters, or in graphical models, totally eliminating edges. In the model you hand-wrote, you've eliminated the edges from s0 to s1 and from s1 to s0. This might help the model generalize better.
It can reduce accuracy when your labels don't correspond to clusters in the training set. You might consider projecting your data down into two dimensions and seeing if the labels correspond with clusters or if they are interspersed.
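A minimal sketch of that projection (assumed shapes and random placeholder data standing in for yours; PCA is done directly with NumPy's SVD so nothing beyond numpy is needed):

```python
import numpy as np

# Placeholder for your data: 500 observations, 4 features, labels 0-2.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
labels = rng.integers(0, 3, size=500)

# PCA via SVD: center the features, then keep the top two components.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X2 = Xc @ Vt[:2].T  # shape (500, 2)

# Scatter-plot X2[:, 0] vs X2[:, 1] colored by `labels` and check whether
# the three labels form separate clusters or are interspersed.
```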
I really can't say without knowing more about the data. I think the best approach for you is to understand your data better and choose the appropriate distribution, rather than trying many things and hoping that something works out. If a naive Bayes classifier works well, then perhaps neither a transition matrix nor a covariance matrix across features is needed to model your data well. An easy way to understand your data is to plot histograms of each feature. If you're seeing a bell curve, then a uniform distribution is probably not appropriate. If your data are categorical, then neither a normal nor a uniform distribution is appropriate.
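A quick text-only version of those per-feature histograms (placeholder log-normal data in place of yours; `numpy.histogram` instead of a plotting library):

```python
import numpy as np

# Placeholder data: 500 observations, 4 features, right-skewed on purpose.
rng = np.random.default_rng(1)
X = rng.lognormal(size=(500, 4))

for j in range(X.shape[1]):
    counts, edges = np.histogram(X[:, j], bins=20)
    # A peak near the left edge with a long right tail suggests a
    # log-normal fit; a symmetric bell suggests a normal.
    print(f"feature {j}: peak bin starts at {edges[np.argmax(counts)]:.2f}")
```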
I don't understand what you mean. If you're saying that the labels always go in the sequence "A A B A A B" then you can force this in your model structure. This would be a form of regularization that can give you improved performance.
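One hedged sketch of what "force this in your model structure" could look like (a hypothetical encoding, not code from this thread): if the labels always cycle "A A B", split A into two states and zero out every transition the cycle never uses, so fitting can only re-estimate the allowed edges:

```python
import numpy as np

# States: 0 = "first A", 1 = "second A", 2 = "B". The cycle A A B A A B ...
# then corresponds to a deterministic ring; all other edges are removed.
trans = np.array([
    [0.0, 1.0, 0.0],  # first A  -> second A
    [0.0, 0.0, 1.0],  # second A -> B
    [1.0, 0.0, 0.0],  # B        -> first A
])

# Every row still sums to 1, so this is a valid transition matrix.
assert np.allclose(trans.sum(axis=1), 1.0)
```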
Hi jmschrei, thank you very much for your quick and detailed answer! I am still on it, but I will not be able to work on my problem before Friday.
Hi jmschrei,
I thought it might be better to see if I can handle the problem in a different way, but I have not been successful so far. Coming back to the questions:
1) It was actually my intention to overfit my model, because I thought that way I could find the mistakes I made more easily, knowing that I should be able to reach nearly 100% accuracy (independently of the data), as the Naïve Bayes classifier was approaching. I haven't been able to reach that until now. I tested the model with different data as well, and the accuracy stays at nearly 93%, which is fine and which I take as indicating a sufficient transition matrix. (I generated a transition matrix from the labels and got the same transition matrix, since the label sequence is the same in each sample. Regarding the out-edges not summing to 1: I trusted the auto-normalization to handle this and was a bit lazy.)
2) This is a good hint, since the Naïve Bayes classifier results were good; I hadn't thought about that. I am going to look into it.
3) Concerning the data: even though LogNormalDistribution would match the histograms better, the hmm with MultivariateGaussianDistribution gives the best results, rather than an IndependentComponentsDistribution of LogNormalDistributions. That is kind of strange to me.
4) Yes, that's what I meant. How can I force this into the structure? Is there another way besides the transition matrix, or a transition matrix of a higher order?
However, my feature dimensions were already a reduced form. If I use the unreduced features with a dimensionality of nearly 100, the model's behaviour gets very interesting.
5) Clustering the emissions of the samples according to their labeled state and fitting a MultivariateGaussianDistribution to each cluster returns 3 MultivariateGaussianDistributions. I plotted the results via MultivariateGaussianDistribution[i].log_probability(clustered_samples) together with the spectrum (from the clustered emissions). Each activity (shown in the spectrum) is perfectly recognized by the state distribution belonging to that activity, which returns the highest log_probability during that activity (in this case log_probability returns the likelihood, as in #491, right?).
But if I use these fitted MultivariateGaussianDistributions in the hmm, the hmm jumps randomly between mostly two states. How is this possible, if I haven't changed the model except the MultivariateGaussianDistributions (because of the changed feature dimension)? The MultivariateGaussianDistributions describe the states' properties perfectly, judging from the plots described above.
6) In my understanding, the only thing I need to change for the new dimensionality is the distributions, which should generate accurate likelihoods for each emission so that the most likely state can be chosen. It shouldn't be necessary to change the transition matrix or the structure of the model.
But using MultivariateGaussianDistribution I get the best prediction with a different transition matrix. Can this be correct?
To my 5th question: could this be the reason?
For continuous distributions like the normal, the forward algorithm computes the joint probability of each state j with the history of evidence, and then evaluates that joint distribution at the observed point. Since these are probability density functions, evaluating them at a single point can yield "probabilities" above 1.
High values in the forward matrix mean either that the observation is really likely given the model, or that the normal is tightly constrained by the data (which applies here). So surprisingly high log probabilities don't necessarily indicate a good model--just a constrained space of observations.
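A quick way to see the density point (a plain-Python sketch, nothing pomegranate-specific): a narrow normal evaluated at its mean already exceeds 1.

```python
import math

# Density of N(mu=0, sigma=0.1) at its mean: 1 / (sigma * sqrt(2 * pi)).
# This is a density, not a probability, so values above 1 are fine.
sigma = 0.1
density = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
print(density)  # ~3.99, so its log is already positive
assert density > 1.0
```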
For two months now I have been testing hmms on my data, but I am running out of ideas. Thus, I hope one of you can help me.
My continuous data have 4 dimensions, and I have 4 samples of 500 observations each, with labels between 0-2 (so 3 in total -> 3 states). I defined my states with 3 MultivariateGaussianDistributions, even though it is not the best distribution for these data, as I figured out by testing and visualising (#190).
I tried out the „from_sample" method, first as unsupervised learning, which (according to other issues) results in better fitting than supervised learning in most cases. After that I applied supervised learning and then built up the structure manually, as you can see in the following code:
Don't misunderstand me, I am totally OK with this last result, but with more states this procedure is not applicable, or only with a lot of effort. Thus, the following questions come up: 1) Why is the transition matrix learned by the „from_sample" method not "optimal"? Is there a "simple/general" reason? 2) How can the „baum-welch" fitting reduce the accuracy?
3) As described in #190.
I am truly grateful for any hint! :)