ravi0912 closed this issue 5 years ago
And I'm working with 32 GB of RAM, so memory should not be an issue.
Ravi, what do you mean by "iterate through multiple groups"? Are they used to constrain the same GMM? My suspicion is that you implicitly create a multiprocessing pool for each iteration for every group and that there's a memory leak somewhere.
for key in df_desc_v1.unique_desc:
    data = data.iloc[data_grpby[key]]['diff']  # 1D data, we need to learn a GMM for this
    data = np.array([data])
    data = data.reshape(len(data[0]), 1)
    gmm = pygmmis_v1.GMM(K=4, D=1)  # K components, D dimensions
    gmm.amp = np.array([1/4, 1/4, 1/4, 1/4])
    gmm.mean = np.array([[3], [10], [20], [25]])
    gmm.covar = np.array([[[3]], [[4]], [[4]], [[4]]])
    logL, U = pygmmis_v1.fit(gmm, data, init_method='none', frozen={"amp": [], "mean": [0, 1, 2, 3], "covar": []}, w=0.5, tol=1e-2)
Hi Peter, thanks for replying!!! Every group learns its own model. The code above is exactly what I'm trying to run. So, say I have a key; each key has ~10K 1D data points. After going through ~300 iterations it shows a "Memory error". This is the only library I found that supports freezing the means.
I would do something like this:
gmm = pygmmis_v1.GMM(K=4, D=1) # K components, D dimensions
for key in df_desc_v1.unique_desc:
    # not sure what's going on in the next line, because this would overwrite data
    data = data.iloc[data_grpby[key]]['diff']
    data = np.array(data).reshape(-1, 1)
    gmm.amp = np.array([1/4, 1/4, 1/4, 1/4])
    gmm.mean = np.array([[3], [10], [20], [25]])
    gmm.covar = np.array([[[3]], [[4]], [[4]], [[4]]])
    logL, U = pygmmis_v1.fit(gmm, data, init_method='none', frozen={"amp": [], "mean": [0, 1, 2, 3], "covar": []}, w=0.5, tol=1e-2)
Anyway, these are just minor modifications; they don't explain what's causing the trouble. How many keys are in df_desc_v1.unique_desc, and how large is each chunk of data? Also, what's going on in the first instruction of the for loop?
Somehow, I figured out the problem. While training a GMM for a specific key, if a singular-matrix problem occurs, an exception is raised but the memory isn't cleared. In my case, lots of keys have the singular-matrix problem, so the leaks accumulate and after some iterations it throws a "Cannot allocate memory" error. To fix this, I added a try/except inside "def fit" in your module, and inside the except block I added pool.close() and pool.join(). Thanks for helping me out and for the quick reply!!!
This should indeed be properly fixed.
Hi, this library works fine when I run it on a single group of data for curve fitting. But when I iterate through multiple groups of data and fit each one, it works fine until ~300 iterations, after which it shows a memory error. Any fixes for this? I know the problem is happening somewhere in the multiprocessing.