shaunpwilkinson / aphid

Analysis with Profile Hidden Markov Models

Training with BaumWelch algorithm #2

Open · maforinw opened this issue 5 years ago

maforinw commented 5 years ago

Hi Shaun,

I would like to tune the model parameters with the Baum-Welch algorithm on a list of different sequences (e.g. obtained from direct observations), and then use the Viterbi algorithm in a routine process.

Printing my initial model, hmm.init, gives:

Hidden Markov model (object class: 'HMM') with 6 hidden states (sAlimente, inactif, explore, autres, seDeplace, seToilette) emitting 6 unique residues (sAlimenteE, inactifE, exploreE, autresE, seDeplaceE, seToiletteE).
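For anyone trying to reproduce a setup like this, here is a minimal sketch that builds a comparable 6-state model by hand. It assumes aphid's convention, as used in the package vignette, that an `HMM` object is a list holding a transition matrix `A` (with an extra "Begin" state as its first row and column) and an emission matrix `E`. The starting probabilities are hypothetical. They are deliberately not all equal: if every state starts with identical parameters, Baum-Welch updates them identically and the states never separate.

```r
# Hidden states and emitted residues, as printed in the issue
states <- c("sAlimente", "inactif", "explore", "autres", "seDeplace", "seToilette")
residues <- paste0(states, "E")
all_states <- c("Begin", states)

# Transition matrix (assumed aphid layout: "Begin" is row/column 1).
# Hypothetical starting values: each state favours staying where it is.
A <- matrix(0.1, nrow = 7, ncol = 7,
            dimnames = list(from = all_states, to = all_states))
diag(A) <- 0.5
A["Begin", ] <- c(0, rep(1/6, 6))  # Begin -> any state, equally likely
A[, "Begin"] <- 0                  # no transitions back into Begin

# Emission matrix: each state favours its matching residue (hypothetical)
E <- matrix(0.1, nrow = 6, ncol = 6,
            dimnames = list(states = states, residues = residues))
diag(E) <- 0.5

hmm.init <- structure(list(A = A, E = E), class = "HMM")
```

Each row of `A` and `E` sums to 1. With aphid attached, an object like this can be passed to `train()` in the same way as the model in the question.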

When I tried to optimize it using a list of 1215 sequences of different lengths, this error occurred:

opt_hmm <- train(hmm.init, listSeqE, method = "BaumWelch", maxiter = 500)

Iteration 1 log likelihood = -109417.7
Iteration 2 log likelihood = NaN
Error in if (abs(LL - logPx) < deltaLL) { :
  missing value where TRUE/FALSE needed
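The error message itself comes from base R rather than from the model: once the log likelihood is `NaN`, the convergence test `abs(LL - logPx) < deltaLL` evaluates to `NA`, and `if` refuses an `NA` condition. The sketch below reproduces this with made-up values mirroring the output above (which variable holds the new log likelihood is an assumption) and adds a hypothetical guard, `check_convergence()`, which is not part of aphid, that stops with a clearer message.

```r
# Values mirroring the output above (assignment to LL/logPx is assumed)
LL <- -109417.7
logPx <- NaN
deltaLL <- 1e-07

abs(LL - logPx) < deltaLL  # NA, not TRUE/FALSE

# Reproduce the error and capture its message
msg <- tryCatch(
  if (abs(LL - logPx) < deltaLL) "converged" else "keep iterating",
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical guard: fail early and explicitly when the log likelihood
# is no longer a finite number
check_convergence <- function(LL, logPx, deltaLL) {
  if (!is.finite(LL) || !is.finite(logPx)) {
    stop("log likelihood is not finite (", logPx, "); ",
         "check for residues that never occur in the training data")
  }
  abs(LL - logPx) < deltaLL
}
```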

shaunpwilkinson commented 5 years ago

Hi Marie-Amélie,

Thanks very much for picking this up. It turns out there was a bug on line 672 of train.HMM: when not all symbols were present in the training sequences, a logsum operation on an empty vector caused very high expected counts in the emissions matrix. This in turn sent the log likelihood out of range (NaN), which triggered the error. I've now fixed it in the development version and will push the bug fix to CRAN later today. If you install the development version by running devtools::install_github("shaunpwilkinson/aphid"), it should work now.
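The fix lives inside aphid, but the two ingredients of the bug are easy to illustrate in base R. The sketch below defines a hypothetical `logsum()` helper (not aphid's internal code) that returns `-Inf`, the log of an empty sum, for empty input, and that avoids the `NaN` a naive `max(x) + log(sum(exp(x - max(x))))` produces when every term is `-Inf`. It then shows a quick pre-training check for residues that never occur in a toy list of training sequences, which is the situation that triggered the bug.

```r
# Naive log-sum-exp: computes log(sum(exp(x))) by factoring out max(x)
naive_logsum <- function(x) max(x) + log(sum(exp(x - max(x))))

# Guarded version (hypothetical helper, not aphid's implementation)
logsum <- function(x) {
  if (length(x) == 0) return(-Inf)  # log of an empty sum, i.e. log(0)
  m <- max(x)
  if (!is.finite(m)) return(m)      # all terms -Inf (or an Inf term)
  m + log(sum(exp(x - m)))
}

naive_logsum(c(-Inf, -Inf))  # NaN, because -Inf - (-Inf) is NaN
logsum(c(-Inf, -Inf))        # -Inf
logsum(numeric(0))           # -Inf
logsum(log(c(0.2, 0.3)))     # log(0.5)

# Which residues never appear in the training sequences?
# (toy data; in the issue these would be listSeqE and the model's residues)
train_seqs <- list(c("inactifE", "exploreE", "inactifE"),
                   c("seDeplaceE", "inactifE"))
model_residues <- c("sAlimenteE", "inactifE", "exploreE",
                    "autresE", "seDeplaceE", "seToiletteE")
setdiff(model_residues, unique(unlist(train_seqs)))
```

Here the last line reports `sAlimenteE`, `autresE` and `seToiletteE` as absent from the toy training set.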

Apologies for the inconvenience, and thanks again for picking this up and sending through your reproducible example!

Cheers,
Shaun