jmschrei / pomegranate

Fast, flexible and easy to use probabilistic modelling in Python.
http://pomegranate.readthedocs.org/en/latest/
MIT License

[BUG] Speed issue #1075

Closed tanyasarkjain closed 2 months ago

tanyasarkjain commented 8 months ago

Describe the bug

I am using the latest version of pomegranate. My code is taking an incredibly long time to run, even though all I am doing is creating an uninitialized HMM and fitting it to my data. The model has 2 states, and the emissions are multivariate (2 features), each taking on a range of about 20 values. Furthermore, when I print out the predictions, I only get a state assignment of 1. I tried it with just 2 iterations and it took about 6 minutes.

To Reproduce

import pomegranate
import seaborn; seaborn.set_style('whitegrid')
import torch

https://pomegranate.readthedocs.io/en/latest/tutorials/B_Model_Tutorial_4_Hidden_Markov_Models.html#Initializing-Hidden-Markov-Models

print(pomegranate.__version__)

from pomegranate.hmm import DenseHMM

Here is a snippet of what mv_emissions looks like: [[[16, 11], [16, 12], [13, 12], [15, 12], [15, 9], [14, 6], [15, 3], [9, 6],]]

jmschrei commented 8 months ago

That doesn't sound right. Unfortunately, without code to check what's going on, it'll be difficult for me to provide feedback. Are you using a GPU? What happens if you set max_iter to be a small number? What is the shape of the data you're training on?
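To answer the shape question for data held as nested Python lists, a minimal sketch follows. It assumes the data is regularly nested (every sequence has the same length and feature count), and `nested_shape` is a hypothetical helper name, not part of pomegranate. The sample is the mv_emissions snippet from the report above.

```python
def nested_shape(data):
    """Return the shape of a regularly nested list by walking down the first elements."""
    shape = []
    while isinstance(data, (list, tuple)):
        shape.append(len(data))
        if len(data) == 0:
            break
        data = data[0]
    return tuple(shape)


# Snippet of mv_emissions as posted in the report.
mv_emissions = [[[16, 11], [16, 12], [13, 12], [15, 12], [15, 9], [14, 6], [15, 3], [9, 6]]]

shape = nested_shape(mv_emissions)
print(shape)  # (number of sequences, sequence length, number of features)
```

For a 2-feature model, the last dimension should be 2, and the first two should be the number of sequences and the length of each sequence.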

tanyasarkjain commented 8 months ago

Okay, it is no longer taking as long now that I have fixed an issue with the dimensions of my sequences. However, I am now getting 'nan' improvement:

`
import numpy as np
from pomegranate.hmm import DenseHMM
from pomegranate.distributions import Categorical

starts = [0.5, 0.5]
d = Categorical().fit(all_seq_100_equal[1])
print('d', d.probs)

model = DenseHMM([d, d], starts=starts, max_iter=10, verbose=True)
model.fit(all_seq_100_equal)
print(np.array(all_seq_100_equal).shape)
`

The output is:

d Parameter containing:
tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0100, 0.0000, 0.0300, 0.0400, 0.0300, 0.1500, 0.1300, 0.2000, 0.2300, 0.1700, 0.0100],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0200, 0.0200, 0.4700, 0.0300, 0.0000, 0.0300, 0.2500, 0.0900, 0.0100, 0.0000, 0.0800, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]])
[1] Improvement: nan, Time: 0.3021s
[2] Improvement: nan, Time: 0.336s
[3] Improvement: nan, Time: 0.3485s
[4] Improvement: nan, Time: 0.3055s
[5] Improvement: nan, Time: 0.299s
[6] Improvement: nan, Time: 0.296s
[7] Improvement: nan, Time: 0.3206s
[8] Improvement: nan, Time: 0.32s
[9] Improvement: nan, Time: 0.321s
[10] Improvement: nan, Time: 0.629s
(4550, 100, 2)

Furthermore, all the predictions have a value of 0, which is perhaps why the improvement is nan. For context on the type of data I'm working with, the first element of all_seq_100_equal is: [[16, 1], [18, 10], [17, 7], [13, 9], [14, 1], [17, 14], [13, 10], [18, 10], [18, 9], [15, 9], [13, 15], [16, 1], [18, 9], [17, 1], [16, 14], [16, 14], [12, 1], [14, 4], [15, 5], [18, 6], [17, 4], [18, 14], [12, 9], [12, 11], [14, 11], [18, 9], [18, 15], [7, 14], [11, 15], [11, 1], [18, 7], [16, 6], [14, 6], [17, 3], [14, 1], [14, 10], [17, 11], [15, 14], [12, 11], [15, 11], [14, 10], [14, 11], [17, 6], [18, 11], [15, 1], [13, 3], [14, 2], [15, 3], [17, 11], [16, 11], [12, 6], [13, 2], [14, 3], [16, 2], [12, 11], [14, 6], [14, 1], [13, 11], [15, 2], [16, 1], [10, 15], [15, 12], [15, 6], [17, 7], [17, 6], [13, 1], [14, 2], [12, 6], [16, 2], [15, 2], [16, 11], [15, 1], [13, 10], [13, 7], [16, 10], [14, 7], [12, 10], [17, 10], [13, 2], [17, 11], [15, 11], [15, 7], [17, 9], [17, 7], [16, 10], [15, 11], [17, 1], [16, 14], [16, 12], [18, 10], [16, 8], [18, 11], [19, 10], [17, 11], [14, 10], [14, 7], [18, 10], [15, 7], [14, 5], [18, 11]]
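One thing the posted numbers suggest, though nobody in this thread confirms it: the printed d.probs contains exact zeros, and the first sequence contains values that fall on those zeros. The sketch below checks this in plain Python using a few observations copied from the post. The last step assumes the training loop reports improvement as a difference of log-probabilities, which is an assumption about the implementation; the helper names are hypothetical.

```python
import math

# Probabilities copied from the printed d.probs in this thread (one row per feature).
probs = [
    [0.0] * 10 + [0.01, 0.0, 0.03, 0.04, 0.03, 0.15, 0.13, 0.20, 0.23, 0.17, 0.01],
    [0.0] * 4 + [0.02, 0.02, 0.47, 0.03, 0.0, 0.03, 0.25, 0.09, 0.01, 0.0, 0.08] + [0.0] * 6,
]

# A few observations taken from the first element of all_seq_100_equal.
observations = [[16, 1], [18, 10], [7, 14], [13, 15]]


def zero_probability_hits(probs, observations):
    """List (position, feature, value) for every value that probs assigns probability 0."""
    hits = []
    for t, obs in enumerate(observations):
        for feature, value in enumerate(obs):
            if probs[feature][value] == 0.0:
                hits.append((t, feature, value))
    return hits


def log_prob(p):
    """Log of a probability, mapping 0 to -inf instead of raising ValueError."""
    return math.log(p) if p > 0 else float("-inf")


hits = zero_probability_hits(probs, observations)
print(hits)

# Assumption: improvement = current total log-probability - previous one.
# If both totals are -inf, the difference is nan.
improvement = log_prob(0.0) - log_prob(0.0)
print(math.isnan(improvement))
```

If this is what is happening, fitting the initial distribution on more than one sequence, so that every observed category gets nonzero probability, would be one thing to try.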

jmschrei commented 6 months ago

When you pass in DenseHMM([d, d]) I'm pretty sure you're passing in the same object to each state and so both will always be identical. Try making two copies of the object?
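The aliasing described here can be illustrated in plain Python. This is a minimal sketch using a hypothetical stand-in class, `ToyCategorical`, rather than pomegranate's Categorical. It only shows that `[d, d]` holds two references to one object, while deep copies are independent.

```python
import copy


class ToyCategorical:
    """Hypothetical stand-in for a distribution object with learnable parameters."""

    def __init__(self, probs):
        self.probs = list(probs)


d = ToyCategorical([0.5, 0.5])

shared = [d, d]                                     # two references to one object
independent = [copy.deepcopy(d), copy.deepcopy(d)]  # two separate objects

# Simulate a parameter update on the first "state" only.
shared[0].probs[0] = 0.9
independent[0].probs[0] = 0.9

print(shared[0] is shared[1])            # the same object appears twice
print(shared[1].probs[0])                # the second entry changed as well
print(independent[0] is independent[1])  # distinct objects
print(independent[1].probs[0])           # the second copy is untouched
```

In the code from this thread, the analogous change would be to give each state its own distribution object, for example by constructing and fitting a second Categorical instead of reusing d.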