tomopy / tomopy

Tomographic Reconstruction in Python
http://tomopy.readthedocs.org

running tomopy on machines with limited RAM #31

Closed decarlof closed 9 years ago

decarlof commented 10 years ago

via e-mail from Matt (Matthew Giarra [matthew.giarra@gmail.com])

One issue that I haven't figured out how to solve myself is that our machine runs out of memory when trying to reconstruct all ~1000 slices from an image set. We could just buy more memory, but I think it would also be useful to be able to run the code on regular consumer machines with less memory (8-16 GB). I guess the software fix for this "issue" would be to read sections of the HDF file one at a time and then write slices out individually, rather than storing everything in memory and writing it all out at the end of processing. All of this reading/writing would probably be slower than just loading everything in at once, but I think it's ok to sacrifice some speed in exchange for compatibility with lower-memory machines.

Do you know if this is already implemented in the code somewhere, and I just haven't figured out how to implement it? If it's not already implemented, is this something that you guys are working on?

Thanks again for your continued help :)

xianghuix commented 10 years ago

I have a script that does exactly what you want.

Thanks,

Xianghui



dgursoy commented 10 years ago

This is actually doable by setting the slices_start and slices_end arguments when you load data with tomopy.xtomo_reader(). One important point is that you should pass the x_start argument to tomopy.xtomo_writer() so the saved files are indexed correctly. Something like this would do the job and avoid memory problems:

# -*- coding: utf-8 -*-
import tomopy

for m in range(16):
    # Read the HDF5 file in chunks of 128 slices.
    data, white, dark, theta = tomopy.xtomo_reader('demo/data.h5',
                                                   slices_start=m*128,
                                                   slices_end=(m+1)*128)

    # Xtomo object creation and pipeline of methods.
    d = tomopy.xtomo_dataset(log='debug')
    d.dataset(data, white, dark, theta)
    d.normalize()
    d.center = 661.5
    d.gridrec()

    # Write to a stack of TIFFs with correct global indexing.
    tomopy.xtomo_writer(d.data_recon, 'tmp/test_', axis=0,
                        x_start=m*128)
xianghuix commented 10 years ago

It is a little more complicated than that when phase retrieval is applied before the tomographic reconstruction. The slices at the beginning and end of each chunk have different gray-value levels from the slices in the middle, so this has to be taken into account when reconstructing chunk by chunk. The script I have on handyn handles these issues.

dgursoy commented 10 years ago

How about this then:

# -*- coding: utf-8 -*-
import tomopy

for m in range(16):
    # Read each 128-slice chunk plus 4 extra margin slices at the end.
    data, white, dark, theta = tomopy.xtomo_reader('demo/data.h5',
                                                   slices_start=m*128,
                                                   slices_end=(m+1)*128+4)

    # Xtomo object creation and pipeline of methods.
    d = tomopy.xtomo_dataset(log='debug')
    d.dataset(data, white, dark, theta)
    d.normalize()
    d.center = 661.5
    d.gridrec()

    # Write to a stack of TIFFs, dropping the 4 margin slices.
    tomopy.xtomo_writer(d.data_recon[:-4, :, :], 'tmp/test_', axis=0,
                        x_start=m*128)
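The overlap arithmetic in these chunked loops is easy to get wrong. As a neutral illustration (not part of tomopy; `chunk_ranges` is a hypothetical helper), the read range and the slab to keep can be computed in one place, with the margin taken on both sides where possible so edge artifacts fall in the discarded slices:

```python
def chunk_ranges(num_slices, chunk_size, margin):
    """Yield (lo, hi, (trim_lo, trim_hi)) tuples: read slices [lo, hi),
    reconstruct, then keep the slab after trimming trim_lo slices from
    the front and trim_hi slices from the back."""
    ranges = []
    start = 0
    while start < num_slices:
        # Read `margin` extra slices on each side when available, so
        # phase-retrieval edge artifacts land in the discarded margin.
        lo = max(start - margin, 0)
        hi = min(start + chunk_size + margin, num_slices)
        keep_hi = min(start + chunk_size, num_slices)
        ranges.append((lo, hi, (start - lo, hi - keep_hi)))
        start += chunk_size
    return ranges
```

The kept slabs tile the full slice range exactly once, so the written TIFF indices never collide or leave gaps.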
xianghuix commented 10 years ago

I use this:

# -*- coding: utf-8 -*-
import os
import numpy as np
import tomopy

# Chunking parameters.
num_slices = 2048
chunk_size = 400
margin_slices = 50
num_chunk = np.int(num_slices/chunk_size) + 1
if num_slices == chunk_size:
    num_chunk = 1

# Phase retrieval parameters.
z = 3.0
eng = 28.
pxl = 0.65e-4
rat = 1.5e-03

# ----------------- finding center
data, white, dark, theta = tomopy.xtomo_reader(file_name,
                                               slices_start=1000, slices_end=1020,
                                               white_start=3, white_end=9,
                                               dark_start=3, dark_end=9)
data[0,:,:] = data[1,:,:]

d = tomopy.xtomo_dataset(log='debug')
d.dataset(data, white, dark, theta)
data_size = d.data.shape
d.median_filter(10)
d.optimize_center()
d.diagnose_center(dir_path=data_dir+'/data_center/',
                  center_start=data_size[2]/2-100,
                  center_end=data_size[2]/2+100,
                  center_step=0.5)
center = d.center
center = 1300.5  # manual override after inspecting the diagnostics

fn = os.path.basename(file_name)
output_file = outputdir+'/recon_'+fn.split("_")[0]+'/recon_'+file_basename

# ----------------- chunked reconstruction with overlapping margins
for ii in xrange(num_chunk):
    if ii == 0:
        SliceStart = ii*chunk_size
        SliceEnd = (ii+1)*chunk_size
    else:
        SliceStart = ii*(chunk_size-margin_slices)
        SliceEnd = SliceStart + chunk_size
        if SliceEnd > num_slices:
            SliceEnd = num_slices

    data, white, dark, theta = tomopy.xtomo_reader(file_name,
                                                   slices_start=SliceStart, slices_end=SliceEnd,
                                                   white_start=3, white_end=9,
                                                   dark_start=3, dark_end=9)
    d.dataset(data, white, dark, theta)
    d.normalize()
    #d.correct_drift(10)
    d.stripe_removal(wname="sym16", level=8, sigma=1)
    d.phase_retrieval(dist=z, energy=eng, pixel_size=pxl, alpha=rat, padding=True)
    d.center = center
    d.gridrec()

    tomopy.xtomo_writer(d.data_recon[margin_slices:,:,:], output_file, axis=0,
                        x_start=SliceStart+np.int(margin_slices/2), overwrite=True)
    d.FLAG_THETA = False
decarlof commented 10 years ago

nice! what about adding them as demos in the next release?

dgursoy commented 10 years ago

Yes, it is actually on the task list. Rather than asking users to change the script, I want the chunking to be calculated in the background from the available memory and the data sizes, so that the user would not notice the difference.
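As a sketch of what such a background calculation might look like (the function name `pick_chunk_size` and the 3x safety factor for intermediate copies are assumptions, not tomopy code), the chunk size could be derived from the data dimensions and a memory budget:

```python
def pick_chunk_size(num_slices, num_projections, width, mem_bytes,
                    dtype_bytes=4, overhead=3.0):
    """Largest number of slices per chunk such that the working set
    (projection rows plus one reconstructed slice per slice, scaled by
    a safety factor for intermediate copies) fits in mem_bytes."""
    per_slice = dtype_bytes * (num_projections * width + width * width) * overhead
    return max(1, min(num_slices, int(mem_bytes // per_slice)))
```

For a 2048 x 1500 x 2048 dataset and an 8 GiB budget this yields chunks of roughly a hundred slices, which matches the order of magnitude people pick by hand above.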

resilver commented 10 years ago

During each iteration of the for loop, the solution proposed by xianghuix will delete all files in the output directory and then export a chunk of slices. To export slices to the output directory without deleting previously reconstructed slices, the overwrite argument of tomopy.xtomo_writer() should be changed from True to False so that the line reads:

tomopy.xtomo_writer(d.data_recon[margin_slices:,:,:], output_file, axis=0,x_start=SliceStart+np.int(margin_slices/2),overwrite = False)

xianghuix commented 10 years ago

Okay, I modified xtomo_writer in my local version, so my solution works there. In the standard release, overwrite=True deletes the entire directory and then recreates it. In my modification the routine only overwrites the files with the same names, which is what overwrite should really mean. Doga, you may want to adopt this change.

Thanks,

Xianghui



xianghuix commented 10 years ago

At least at the time I made the change, overwrite=False would rename output files sharing a basename by appending a version number at the end. I changed that in my modification.

Thanks,

Xianghui



dgursoy commented 10 years ago

overwrite=False is working as expected, but the overwrite=True option can be modified so that it overwrites only the conflicting files rather than deleting the whole directory.
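A minimal sketch of that per-file overwrite behavior (a standalone illustration under assumed names; `save_slices` is not a tomopy function, and real code would write TIFF images rather than raw bytes):

```python
import os

def save_slices(out_dir, slices, x_start=0, overwrite=True):
    """Write each payload to out_dir/recon_<index>.tif. With
    overwrite=True only files whose names collide are replaced;
    the rest of the directory is left untouched."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for i, payload in enumerate(slices):
        path = os.path.join(out_dir, 'recon_%05d.tif' % (x_start + i))
        if not overwrite and os.path.exists(path):
            continue  # keep the existing file untouched
        with open(path, 'wb') as f:
            f.write(payload)
        written.append(path)
    return written
```

With this semantics a chunked loop can call the writer once per chunk with a growing x_start, and earlier chunks survive on disk regardless of the overwrite flag.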

xianghuix commented 10 years ago

That is what I did. You can check the code on the beamline computer.

Thanks,

Xianghui
