Closed Jieyu-Wang closed 6 years ago
Hi Jieyu,
I reproduced your problem. The root cause is in the first line of the traceback: it tells us that there's zero probability for some sample. This should be prevented by only evaluating the probability of each sample over the neighborhood `H`. In your case (the default configuration), this neighborhood comprises all samples, including some that may have very low probability under any of the GMM components. So, you've got a numerical underflow. I suspect that this happens because some components shrink very strongly.
Use the argument `cutoff=5` in `fit`, which limits the evaluation to 5-sigma neighborhoods of each component (where each sigma refers to the covariance of the component). That prevents the underflow by ignoring samples that are too far away for any component and speeds up the process considerably. I just ran the full simu with `K=10` in 5 seconds.
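To illustrate the mechanism, here is a minimal sketch in plain Python, not the library's actual code. The 1D mixture, the sample values, and the helper names (`gauss_pdf`, `mixture_pdf`, `within_cutoff`) are all hypothetical. It shows how a sample far from every strongly shrunken component gets a probability of exactly zero in double precision, and how a 5-sigma cutoff keeps such samples out of the evaluation.

```python
import math

# Hypothetical 1D mixture: two strongly shrunken components (mean, sigma).
components = [(0.0, 0.01), (1.0, 0.01)]
weights = [0.5, 0.5]

# Hypothetical samples; 0.5 lies 50 sigma away from both components.
samples = [0.0, 0.005, 1.01, 0.5]


def gauss_pdf(x, mean, sigma):
    """Density of a 1D Gaussian at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x):
    """Density of the mixture at x, summed over all components."""
    return sum(w * gauss_pdf(x, m, s) for w, (m, s) in zip(weights, components))


def within_cutoff(x, cutoff=5.0):
    """True if x lies within `cutoff` sigma of at least one component."""
    return any(abs(x - m) / s <= cutoff for m, s in components)


# exp(-0.5 * 50**2) underflows to 0.0, so log(p) would be undefined.
p_far = mixture_pdf(0.5)

# Restricting the evaluation to 5-sigma neighborhoods drops the far sample.
kept = [x for x in samples if within_cutoff(x)]
log_likelihood = sum(math.log(mixture_pdf(x)) for x in kept)
```

In this toy setup, the cutoff removes the one sample whose probability underflows, so the log-likelihood over the remaining samples stays finite.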
Another option is increasing the minimum component covariance `w` to something larger. `1e-6` doesn't help for the range your data cover.
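As a rough illustration of why a small floor may not be enough, the sketch below models the minimum covariance as a simple lower bound on a 1D variance. That is an assumption made for illustration; the library may apply `w` differently. The numbers (a sample at distance 0.5 from a component whose variance has shrunk to 1e-8) and the helper name `floored_pdf` are hypothetical.

```python
import math


def floored_pdf(x, mean, var, w):
    """1D Gaussian density with the variance bounded below by w (assumed model)."""
    sigma = math.sqrt(max(var, w))
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical shrunken component: variance 1e-8, sample at distance 0.5.
# With w = 1e-6 the sample is still 500 sigma away, so the density underflows.
p_small_floor = floored_pdf(0.5, 0.0, 1e-8, 1e-6)

# With w = 1e-2 the sample is 5 sigma away and the density stays positive.
p_large_floor = floored_pdf(0.5, 0.0, 1e-8, 1e-2)
```

Whether a given floor is large enough depends on how far the samples are spread relative to the resulting sigma.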
Thanks for helping me out and for a quick reply!! Now it works fine for my model.
Hi Peter,
Thank you for developing this powerful tool. Unfortunately, I have run into an issue while applying it to my data. Could you please take a look at it? My Python version is 3.6.2 and all my site packages are up to date.
gmm_logL(simu,20)
The function gmm_logL is:
and the file for simu is below: simulation100000.txt
You can simply use
`simu = np.loadtxt('simulation100000.txt')`
to load it. I ran the code above in the IPython console, and the traceback is below:
The above are the runtime warnings I got while running `gmm_logL(simu,20)`. It stopped there and seemed to be stuck in an infinite loop for more than 2 hours, so I pressed Ctrl+C to stop it. It did not stop immediately, but only after printing the following:
Thank you for your time and patience!