pmelchior / pygmmis

Gaussian mixture model for incomplete (missing or truncated) and noisy data
MIT License

[Errno 12] Cannot allocate memory #7

Closed ravi0912 closed 5 years ago

ravi0912 commented 5 years ago

Hi, this library works fine when I run it on a single group of data for curve fitting. But when I iterate through multiple groups of data and fit a curve for each group, it works fine until ~300 iterations, after which it raises a memory error. Any fixes for this? I know the problem is happening somewhere in multiprocessing.

ravi0912 commented 5 years ago

And I'm working with 32 GB of RAM, so memory should not be an issue.

pmelchior commented 5 years ago

Ravi, what do you mean by "iterate through multiple groups"? Are they used to constrain the same GMM? My suspicion is that you implicitly create a multiprocessing pool for each iteration for every group and that there's a memory leak somewhere.
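For illustration, the difference between the suspected anti-pattern (a fresh multiprocessing pool per iteration) and reusing a single pool can be sketched like this; `square` and the group data are placeholders, not part of pygmmis:

```python
import multiprocessing as mp

def square(x):
    return x * x

def process_groups_leaky(groups):
    # anti-pattern: a fresh pool per group; worker processes and pipes
    # accumulate if a pool is never closed (e.g. when an exception escapes
    # before cleanup runs)
    results = []
    for g in groups:
        with mp.Pool(2) as pool:  # the with-block at least guarantees cleanup
            results.append(pool.map(square, g))
    return results

def process_groups_shared(groups):
    # preferred: create one pool and reuse it for every group
    results = []
    with mp.Pool(2) as pool:
        for g in groups:
            results.append(pool.map(square, g))
    return results
```

With one shared pool, the number of worker processes stays constant no matter how many groups are iterated over.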

ravi0912 commented 5 years ago

for key in df_desc_v1.unique_desc:
    data = data.iloc[data_grpby[key]]['diff']  # 1D data, we need to learn a GMM for this
    data = np.array([data])
    data = data.reshape(len(data[0]), 1)
    gmm = pygmmis_v1.GMM(K=4, D=1)  # K components, D dimensions
    gmm.amp = np.array([1/4, 1/4, 1/4, 1/4])
    gmm.mean = np.array([[3], [10], [20], [25]])
    gmm.covar = np.array([[[3]], [[4]], [[4]], [[4]]])
    logL, U = pygmmis_v1.fit(gmm, data, init_method='none',
                             frozen={"amp": [], "mean": [0, 1, 2, 3], "covar": []},
                             w=0.5, tol=1e-2)

Hi Peter, thanks for replying! Every group learns its own model. The code above is exactly what I'm trying to run. So let's say I have a key, and each key has ~10K 1D data points. After going through ~300 iterations it shows "Memory error". This is the only library I found that supports freezing the means.

pmelchior commented 5 years ago

I would do something like this:

gmm = pygmmis_v1.GMM(K=4, D=1) # K components, D dimensions
for key in df_desc_v1.unique_desc:
    # not sure what's going on in the next line, because this would overwrite data
    data = data.iloc[data_grpby[key]]['diff']
    data = np.array(data).reshape(-1,1)
    gmm.amp = np.array([1/4,1/4,1/4,1/4])
    gmm.mean = np.array([[3],[10],[20],[25]])
    gmm.covar = np.array([[[3]],[[4]],[[4]],[[4]]])
    logL, U = pygmmis_v1.fit(gmm, data, init_method='none', frozen={"amp":[],"mean":[0,1,2,3],"covar":[]}, w=0.5, tol = 1e-2)

Anyway, these are just minor modifications and don't explain what's causing trouble. How many keys are in df_desc_v1.unique_desc, and how large is each chunk of data? Also, what's going on in the first instruction in the for loop?

ravi0912 commented 5 years ago

Somehow, I figured out the problem. While training a GMM for a specific key, if a singular-matrix problem occurs, fit raises an exception but doesn't clear the memory. In my case there are lots of keys with singular-matrix problems, so the leaked resources accumulate and after some iterations it throws a "Cannot allocate memory" error. To fix it, I added a try/except inside "def fit" in your module, and in the exception handler I call pool.close() and pool.join(). Thanks for helping me out and for the quick reply!
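The workaround described above can be sketched as a generic guard around a fit call; `safe_fit` and `fit_fn` are illustrative stand-ins, not pygmmis API, and the assumption is that a failed fit raises `numpy.linalg.LinAlgError`:

```python
import multiprocessing as mp
import numpy as np

def safe_fit(fit_fn, data):
    # run one fit with a dedicated pool, releasing the workers even when
    # the fit raises (e.g. on a singular covariance matrix); otherwise
    # leaked pools pile up across iterations until memory runs out
    pool = mp.Pool(2)
    try:
        return fit_fn(data, pool)
    except np.linalg.LinAlgError:
        return None  # skip groups whose fit fails
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for workers to exit, freeing their memory
```

Using try/finally (rather than cleanup only in the except branch) releases the pool on both success and failure.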

pmelchior commented 5 years ago

This should indeed be properly fixed