I noticed that when training a `HiddenMarkovModel` with emissions composed of `ExponentialDistribution` objects wrapped in `IndependentComponentsDistribution`, training was orders of magnitude slower than when using `ExponentialDistribution` directly. I was able to replicate the same behaviour with `NormalDistribution` as well, so I narrowed it down to `IndependentComponentsDistribution`.
The slowdown is noticeable only when using multithreading in `fit` (`n_jobs > 1`). I was able to narrow down the cause to the `_log_probability` function of `IndependentComponentsDistribution`.
The following version of the function leads to slow behaviour:
```cython
cdef void _log_probability(self, double* X, double* log_probability, int n) nogil:
	cdef int i, j
	cdef double logp
	memset(log_probability, 0, n*sizeof(double))
	for i in range(n):
		for j in range(self.d):
			if self.cython == 1:
				(<Model> self.distributions_ptr[j])._log_probability(X+i*self.d+j, &logp, 1)
			else:
				with gil:
					python_log_probability(self.distributions[j], X+i*self.d+j, &logp, 1)
			log_probability[i] += logp * self.weights_ptr[j]
```
But the following version is fast:
```cython
cdef void _log_probability(self, double* X, double* log_probability, int n) nogil:
	cdef int i, j
	cdef double logp
	memset(log_probability, 0, n*sizeof(double))
	for i in range(n):
		for j in range(self.d):
			if True or self.cython == 1:  # notice the True here
				(<Model> self.distributions_ptr[j])._log_probability(X+i*self.d+j, &logp, 1)
			else:
				with gil:
					python_log_probability(self.distributions[j], X+i*self.d+j, &logp, 1)
			log_probability[i] += logp * self.weights_ptr[j]
```
This happens even though `self.cython` is always 1, so the `else` block is never executed.
Moreover, this is also slow:
```cython
cdef void _log_probability(self, double* X, double* log_probability, int n) nogil:
	cdef int i, j
	cdef double logp
	memset(log_probability, 0, n*sizeof(double))
	for i in range(n):
		for j in range(self.d):
			if self.cython == 1 or True:  # notice the True here, with reverse order
				(<Model> self.distributions_ptr[j])._log_probability(X+i*self.d+j, &logp, 1)
			else:
				with gil:
					python_log_probability(self.distributions[j], X+i*self.d+j, &logp, 1)
			log_probability[i] += logp * self.weights_ptr[j]
```
I think the branch inside the for loop somehow makes it harder for Cython to generate efficient code.
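One explanation consistent with the measurements is left-to-right short-circuit evaluation: with a leading `True`, the compiler can plausibly fold the whole condition at compile time and discard the dead `else` branch (including the `with gil` machinery it contains), whereas a trailing `True` still requires the first operand to be evaluated, so the branch survives into the generated C. The same left-to-right evaluation order is observable in plain Python:

```python
calls = []

def probe():
    """Record that this operand was actually evaluated."""
    calls.append("evaluated")
    return False

# Leading constant: True short-circuits, probe() is never evaluated.
if True or probe():
    pass
assert calls == []

# Trailing constant: probe() must be evaluated before True is reached.
if probe() or True:
    pass
assert calls == ["evaluated"]
```

This only demonstrates runtime evaluation order, not Cython's code generation, but it matches the pattern in which variant ends up fast.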
Here is a complete reproducible example that elicits the slow behaviour:
The first `fit` call is slow with the original pomegranate code:

```
[1] Improvement: 8114.942703745794 Time (s): 6.926
[2] Improvement: 0.0011873976327478886 Time (s): 8.598
[3] Improvement: 1.3969838619232178e-09 Time (s): 8.284
[4] Improvement: -6.426125764846802e-08 Time (s): 8.033
[5] Improvement: 6.472691893577576e-08 Time (s): 8.972
[6] Improvement: -6.472691893577576e-08 Time (s): 8.011
[7] Improvement: 6.472691893577576e-08 Time (s): 7.855
[8] Improvement: -6.472691893577576e-08 Time (s): 8.366
[9] Improvement: 6.472691893577576e-08 Time (s): 8.269
[10] Improvement: -6.472691893577576e-08 Time (s): 8.243
Total Training Improvement: 8114.943891080562
Total Training Time (s): 88.2151
```
The second `fit` call, without `IndependentComponentsDistribution`, is fast:

```
[1] Improvement: 8111.520942713367 Time (s): 0.1579
[2] Improvement: 1.6850003311410546 Time (s): 0.1475
[3] Improvement: 0.00018647708930075169 Time (s): 0.1554
[4] Improvement: 1.3271346688270569e-08 Time (s): 0.1509
[5] Improvement: -1.234002411365509e-08 Time (s): 0.1648
[6] Improvement: 2.3283064365386963e-10 Time (s): 0.1637
[7] Improvement: 0.0 Time (s): 0.1708
[8] Improvement: 0.0 Time (s): 0.1509
[9] Improvement: 0.0 Time (s): 0.1576
[10] Improvement: 0.0 Time (s): 0.1464
Total Training Improvement: 8113.206129522761
Total Training Time (s): 1.7545
```
With the modifications in this commit, the previously slow `fit` call becomes fast:

```
[1] Improvement: 8114.942703745794 Time (s): 0.1773
[2] Improvement: 0.0011873976327478886 Time (s): 0.1736
[3] Improvement: 1.3969838619232178e-09 Time (s): 0.1783
[4] Improvement: -6.426125764846802e-08 Time (s): 0.1667
[5] Improvement: 6.472691893577576e-08 Time (s): 0.1846
[6] Improvement: -6.472691893577576e-08 Time (s): 0.1884
[7] Improvement: 6.472691893577576e-08 Time (s): 0.1826
[8] Improvement: -6.472691893577576e-08 Time (s): 0.1638
[9] Improvement: 6.472691893577576e-08 Time (s): 0.1868
[10] Improvement: -6.472691893577576e-08 Time (s): 0.1695
Total Training Improvement: 8114.943891080562
Total Training Time (s): 1.9820
```
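The commit's diff is not reproduced here, but the general remedy for an invariant branch inside a hot loop is manual loop unswitching: test the flag once, outside the loop, and run a branch-free loop body for each case. A pure-Python sketch of the technique (hypothetical names; this illustrates the pattern, not pomegranate's actual code):

```python
import math

def logp_branch_inside(rows, logps, weights, fast):
    """Mirrors the slow variant: the loop-invariant `fast` flag is
    re-tested on every inner iteration."""
    out = [0.0] * len(rows)
    for i, row in enumerate(rows):
        for j, x in enumerate(row):
            if fast:
                lp = logps[j](x)
            else:
                lp = logps[j](x)  # stand-in for the GIL-acquiring fallback
            out[i] += lp * weights[j]
    return out

def logp_unswitched(rows, logps, weights, fast):
    """Unswitched version: the flag is tested once and each loop
    body contains no branch."""
    out = [0.0] * len(rows)
    if fast:
        for i, row in enumerate(rows):
            for j, x in enumerate(row):
                out[i] += logps[j](x) * weights[j]
    else:
        for i, row in enumerate(rows):
            for j, x in enumerate(row):
                out[i] += logps[j](x) * weights[j]  # fallback stand-in
    return out

# Exponential(rate) log-density: log(rate) - rate * x
logps = [lambda x, r=r: math.log(r) - r * x for r in (0.5, 2.0)]
rows = [[1.0, 0.25], [3.0, 0.5]]
weights = [1.0, 1.0]
assert logp_branch_inside(rows, logps, weights, True) == \
       logp_unswitched(rows, logps, weights, True)
```

In pure Python the transformation only removes a redundant test; the payoff in C or Cython comes from the compiler being able to optimise each branch-free loop independently, which is exactly what the presence of the `with gil` branch appears to prevent here.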
Thank you for your contribution. However, pomegranate has recently been rewritten from the ground up to use PyTorch instead of Cython, and so this PR is unfortunately out of date and will be closed.