rg314 / pytraction

Bayesian Traction Force Microscopy
BSD 3-Clause "New" or "Revised" License

MemoryError: Unable to allocate 91.6 GiB for an array with shape (78408, 78408) and data type complex128 #30

Open · a19s opened 3 years ago

a19s commented 3 years ago

Running the line `log2 = traction_obj.process_stack(img, ref, roi=roi, verbose=1)` in the usage.ipynb notebook on Windows 10 (after correcting for the shapely issue #29) gives this error.

rg314 commented 3 years ago

Also linked to #18

rg314 commented 3 years ago

I will likely use h5 files as a replacement. Any thoughts?

a19s commented 3 years ago

Can you please elaborate on where the issue is likely to come from? Why does the analysis go from 80 MB (initial size of the image) to 90 GB?

rg314 commented 3 years ago

> Running the line `log2 = traction_obj.process_stack(img, ref, roi=roi, verbose=1)` in the usage.ipynb notebook on Windows 10 (after correcting for the shapely issue #29) gives this error.

@andimi please could you post the full error message?

rg314 commented 3 years ago

> Can you please elaborate on where the issue is likely to come from? Why does the analysis go from 80 MB (initial size of the image) to 90 GB?

Because the stack is processed in a for loop and the results are appended to a list. For each time point in the stack a number of images are appended to this list (grayscale, tfm map, force field, roi). I'm still surprised it's getting up to 90 GB, but the for loop needs to be rewritten to write to an h5 file instead.

i.e. in pseudocode it would be along the lines of:

```python
# pytraction/core.py

class TractionForce(object):
    ...

    def _log_to_hdf5(self, target_path, log):
        """
        Converts a defaultdict or dictionary to an hdf5 file and returns the path.
        """
        pass

    def _process_stack(self, img_stack, ref_stack, bead_channel=0, roi=False, frame=[], crop=False):
        ...

        log['frame'].append(frame)
        log['traction_map'].append(traction_map)
        log['force_field'].append(f_n_m)
        log['stack_bead_roi'].append(stack)
        log['cell_roi'].append(cell_img_crop)
        log['mask_roi'].append(mask_crop)
        log['beta'].append(beta)
        log['L'].append(L_optimal)
        log['pos'].append(pos)
        log['vec'].append(vec)
        log['img_path'].append(self.img_path)
        log['ref_path'].append(self.ref_path)
        log['E'].append(self.E)
        log['s'].append(self.s)

        ...
        # code
        self._log_to_hdf5(target_path=path, log=log)
```
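
For concreteness, here is a minimal sketch of what that `_log_to_hdf5` could look like with h5py; the per-frame group layout, gzip compression, and attrs-for-scalars choices are illustrative assumptions, not a settled pytraction API:

```python
# Hypothetical sketch: dump the per-frame log to HDF5 instead of
# holding every array in a Python list in memory.
import h5py
import numpy as np

def _log_to_hdf5(target_path, log):
    """Write each frame of the log dict to its own HDF5 group and return the path."""
    with h5py.File(target_path, "w") as f:
        for i in range(len(log["frame"])):
            grp = f.create_group(f"frame_{i}")
            for key, values in log.items():
                value = values[i]
                if isinstance(value, np.ndarray):
                    # large image/field arrays: compress on disk
                    grp.create_dataset(key, data=value, compression="gzip")
                else:
                    # scalars and strings (E, s, paths, ...) go in attrs
                    grp.attrs[key] = value
    return target_path
```

Frames would then stream to disk one at a time, so peak memory stays at roughly one frame's worth of arrays rather than the whole stack.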
a19s commented 3 years ago

> Running the line `log2 = traction_obj.process_stack(img, ref, roi=roi, verbose=1)` in the usage.ipynb notebook on Windows 10 (after correcting for the shapely issue #29) gives this error.
>
> @andimi please could you post the full error message?


```
| STARTING |

('ITERATION # ', 0)
..[DONE]
(' --residual : ', 1.0000000335136763)
Starting validation..
('Validation, iteration number ', 0)
('Validation, iteration number ', 1)
('Validation, iteration number ', 2)
..[DONE]
//////////////////////////////////////////////////////////////////
end of iterative process..
Re-arranging vector fields..
...[DONE]
('[DONE] ..after ', -21.92171573638916, 'seconds ')

Optimizing Lambda

 Func-count      x           f(x)         Procedure
     1         381966       -235958       initial
     2         618034       -227397       golden
     3         236068       -245662       golden
     4         145898       -255687       golden
     5        90169.9       -265195       golden
     6        55728.1       -273609       golden
     7        34441.9       -280693       golden
     8        21286.2       -286438       golden
     9        13155.6       -290940       golden
    10        8130.62       -294308       golden
    11           5025       -296628       golden
    12        3105.62       -297940       golden
    13        1919.38       -298233       golden
    14        1632.54       -298094       parabolic
    15        2264.24       -298250       parabolic
    16        2127.11       -298258       parabolic
    17        2138.14       -298258       parabolic
    18        2131.25       -298258       parabolic
    19        2131.33       -298258       parabolic
    20        2131.32       -298258       parabolic
    21        2131.32       -298258       parabolic

MemoryError                               Traceback (most recent call last)
in
----> 1 log2 = traction_obj.process_stack(img, ref, roi=roi, verbose=1)

~\Miniconda3\envs\pytraction\lib\site-packages\pytraction\core.py in process_stack(self, img_stack, ref_stack, bead_channel, roi, frame, crop, verbose)
    285             output = self._process_stack(img_stack, ref_stack, bead_channel, roi, frame, crop)
    286         elif verbose == 1:
--> 287             output = self._process_stack(img_stack, ref_stack, bead_channel, roi, frame, crop)
    288         return output
    289

~\Miniconda3\envs\pytraction\lib\site-packages\pytraction\core.py in _process_stack(self, img_stack, ref_stack, bead_channel, roi, frame, crop)
    321
    322             # compute traction map
--> 323             traction_map, f_n_m, L_optimal = self.TFM_obj.calculate_traction_map(pos, vec, beta)
    324
    325             log['frame'].append(frame)

~\Miniconda3\envs\pytraction\lib\site-packages\pytraction\traction_force.py in calculate_traction_map(self, pos, vec, beta)
     19
     20         # get lambda from baysian bad boi
---> 21         L, evidencep, evidence_one = optimal_lambda(beta, fuu, Ftux, Ftuy, 1, self.s, self.meshsize, i_max, j_max, X, 1)
     22
     23         # do the TFM with bays lambda

~\Miniconda3\envs\pytraction\lib\site-packages\pytraction\optimal_lambda.py in optimal_lambda(beta, fuu, Ftux, Ftuy, E, s, cluster_size, i_max, j_max, X, sequence)
     56     target = partial(minus_logevidence, beta=beta, C_a=C_a, BX_a=BX_a, X=X, fuu=fuu, constant=constant, Ftux=Ftux, Ftuy=Ftuy, E=E, s=s, cluster_size=cluster_size, i_max=i_max, j_max=j_max)
     57     start = time.time()
---> 58     alpha_opt = optimize.fminbound(target, alpha1, alpha2, disp=3)
     59     end = time.time()
     60     print(f'Time taken {end-start} s')

~\Miniconda3\envs\pytraction\lib\site-packages\scipy\optimize\optimize.py in fminbound(func, x1, x2, args, xtol, maxfun, full_output, disp)
   1907                'disp': disp}
   1908
-> 1909     res = _minimize_scalar_bounded(func, (x1, x2), args, **options)
   1910     if full_output:
   1911         return res['x'], res['fun'], res['status'], res['nfev']

~\Miniconda3\envs\pytraction\lib\site-packages\scipy\optimize\optimize.py in _minimize_scalar_bounded(func, bounds, args, xatol, maxiter, disp, **unknown_options)
   2009         si = np.sign(rat) + (rat == 0)
   2010         x = xf + si * np.maximum(np.abs(rat), tol1)
-> 2011         fu = func(x, *args)
   2012         num += 1
   2013         fmin_data = (num, x, fu)

~\Miniconda3\envs\pytraction\lib\site-packages\pytraction\optimal_lambda.py in minus_logevidence(alpha, beta, C_a, BX_a, X, fuu, constant, Ftux, Ftuy, E, s, cluster_size, i_max, j_max)
     22     A = alpha*csr_matrix(C_a) + BX_a
     23
---> 24     L = sparse_cholesky(csr_matrix(A)).toarray()
     25     logdetA = 2*np.sum(np.log(np.diag(L)))
     26

~\Miniconda3\envs\pytraction\lib\site-packages\scipy\sparse\compressed.py in toarray(self, order, out)
   1029         if out is None and order is None:
   1030             order = self._swap('cf')[0]
-> 1031         out = self._process_toarray_args(order, out)
   1032         if not (out.flags.c_contiguous or out.flags.f_contiguous):
   1033             raise ValueError('Output array must be C or F contiguous')

~\Miniconda3\envs\pytraction\lib\site-packages\scipy\sparse\base.py in _process_toarray_args(self, order, out)
   1200             return out
   1201         else:
-> 1202             return np.zeros(self.shape, dtype=self.dtype, order=order)
   1203
   1204

MemoryError: Unable to allocate 91.6 GiB for an array with shape (78408, 78408) and data type complex128
```
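
For what it's worth, the reported figure is internally consistent: a dense complex128 array of shape (78408, 78408) needs 16 bytes per element.

```python
n = 78408
print(n * n * 16 / 2**30)  # ~91.6 GiB
```

So the failure is exactly the point where the sparse matrix `A` is densified via `.toarray()`.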
a19s commented 3 years ago

OK, very surprising: I ran the notebook 3 times and only got the error the 2nd time. It seems that it's not "reliably crashing", so it might be due to some issue other than the creation of the pandas DataFrame?

a19s commented 3 years ago

> Do you have an ROI for the file?

Yes. This error came up when I was running the base notebook to check that it works on Windows; I have the roi file in the example2 folder.

rg314 commented 3 years ago

Yes, this is another problem now that I've seen the error message.

Is this example 2 in the usage notebook? If so, please could you try running:

`log2 = traction_obj.process_stack(img, ref, roi=roi, verbose=1, crop=True)`

a19s commented 3 years ago

> Yes, this is another problem now that I've seen the error message.
>
> Is this example 2 in the usage notebook? If so, please could you try running:
>
> `log2 = traction_obj.process_stack(img, ref, roi=roi, verbose=1, crop=True)`

I ran that 3 times (with Restart Kernel and Run all cells) and it worked 100% of the time with no issues (and super fast!) :)

rg314 commented 3 years ago

Great, cool. We need to think about tests for this, what the default value of the crop parameter should be, and some restriction on large images. From the error message: we first do some maths on a gigantic matrix, which can be done efficiently because it's a sparse matrix, but when we convert it to a dense array it becomes gigantic. I wonder if it's possible to work out the maximum input image size, or to adjust the mesh for the problem accordingly?
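
One possible shape for that restriction, as a rough sketch only (the helper name and the 8 GiB threshold are made up), would be to estimate the dense size before the `.toarray()` call in `minus_logevidence` and fail early with an actionable message:

```python
def safe_toarray(sparse_mat, max_gib=8.0):
    """Densify a scipy sparse matrix only if the result stays under max_gib."""
    rows, cols = sparse_mat.shape
    # dense footprint in GiB: rows * cols * bytes-per-element
    gib = rows * cols * sparse_mat.dtype.itemsize / 2**30
    if gib > max_gib:
        raise MemoryError(
            f"Dense array would need {gib:.1f} GiB (limit {max_gib} GiB); "
            "try crop=True or a coarser mesh to shrink the problem."
        )
    return sparse_mat.toarray()
```

This would turn the raw allocation failure into a hint about the maximum workable image or mesh size.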