hammerlab / flowdec

TensorFlow Deconvolution for Microscopy Data
Apache License 2.0

How to use shared RAM+VRAM #39

Closed joaomamede closed 3 years ago

joaomamede commented 3 years ago

I managed to use system RAM to supplement the VRAM (it did also fill my GPU memory), but it is not worth it compared to splitting the images with dask and running map_overlap entirely in GPU/VRAM, since the host-to-device transfer is much slower.

I am under the impression that it only falls back to shared memory when needed: when I split the image in 4 with dask as usual, with the same ConfigProto shared-memory settings, I got normal processing speed.

If you really can't use map_overlap and you don't have enough VRAM, or just for reference:

I don't know how the speed compares with plain CPU computing, but when it uses less shared memory it is considerably faster than when it needs a big chunk of RAM. I tested by not splitting my 2048x2044 image (much, much slower), splitting it in 2 (considerably faster, using only about 2 GB of RAM for the processing), and splitting it in 4 (using no shared RAM and running at full speed).

from tensorflow.compat.v1 import ConfigProto
from flowdec import data as fd_data
from flowdec import restoration as fd_restoration

config = ConfigProto()
# A fraction above 1, together with allow_growth, lets TensorFlow
# spill over into shared system RAM once VRAM is exhausted
config.gpu_options.per_process_gpu_memory_fraction = 2
config.gpu_options.allow_growth = True

algo = fd_restoration.RichardsonLucyDeconvolver(
    n_dims=psfgfp.ndim,
    pad_mode='2357',
    # pad_mode='none'
    pad_min=(0, 1, 1),
).initialize()

tmp = algo.run(
    fd_data.Acquisition(data=chunk, kernel=psf),
    niter=20,
    session_config=config,
)
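For comparison, here is a minimal sketch of the dask map_overlap approach described above, which keeps each tile entirely in VRAM. The chunk shape, the overlap depth, and the variable names (img, psf) are assumptions to adapt to your data and PSF size, not part of flowdec itself:

import dask.array as da
from flowdec import data as fd_data
from flowdec import restoration as fd_restoration

algo = fd_restoration.RichardsonLucyDeconvolver(n_dims=3).initialize()

def deconv(chunk):
    # Each tile fits in VRAM, so no shared-memory session settings are needed
    return algo.run(fd_data.Acquisition(data=chunk, kernel=psf), niter=20).data

# Split the 2048x2044 frame into 4 tiles (full z, half y, half x);
# depth adds overlap in y/x to reduce seam artifacts between tiles
darr = da.from_array(img, chunks=(img.shape[0], 1024, 1022))
result = darr.map_overlap(deconv, depth=(0, 16, 16), boundary='reflect').compute()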