Open · CatalysTim opened this issue 4 years ago
@CatalysTim thanks for reporting this, and sorry for such a late reply. I've just seen your message.
First, for some reason the data type of theta in this dataset is float64, while the algorithm needs float32. If you set theta = theta.astype('float32'), the result will be non-zero. Second, yes, the algorithm is memory-consuming, and it is typically used only for the time frames where motion artifacts appear in reconstructions by standard methods. For instance, if there are artifacts at time frames 30-33, it is enough to run the algorithm on 8 time frames (8 or 16 basis functions) around frame 32 or so. In your test I would consider larger chunk sizes (4-8), because the algorithm also performs 3D total variation, which involves the derivative in z. I would also recommend binning your data to half its size in order to play with the regularization parameters.
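As a rough sketch of the casting and binning I mean (the tomopy.downsample calls and the factor of 2 are just a suggestion, applied to the normalized data before the axis swap):

# sketch only: cast the angles and bin the normalized data by 2 in both
# detector dimensions before calling the solver
theta = theta.astype('float32')                    # rectv_gpu expects float32 angles
data = tomopy.downsample(data, level=1, axis=1)    # bin along z (sinogram rows)
data = tomopy.downsample(data, level=1, axis=2)    # bin along the detector columns
rot_center = rot_center / 2                        # rotation center scales with the binning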
The following code works for me on a similar GPU, although I didn't play with the regularization parameters. Let me know if you still have problems and I will look at this data in more detail.
import rectv_gpu
import numpy as np
import dxchange
import tomopy
import os
def takephi(ntheta, m):
    [x, y] = np.meshgrid(np.arange(-ntheta//2, ntheta//2), np.arange(-m//2, m//2))
    phi = np.zeros([m, 2*ntheta], dtype='float32')
    phi[:, ::2] = np.cos(2*np.pi*x*y/ntheta)/np.sqrt(ntheta)
    phi[:, 1::2] = np.sin(2*np.pi*x*y/ntheta)/np.sqrt(ntheta)
    return phi
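# Note: phi is an (m, 2*ntheta) float32 array holding the m basis functions,
# with cosine values in the even columns and sine values in the odd columns,
# sampled over the ntheta projection indices and normalized by sqrt(ntheta).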
h5fname = "fuelcell_i2.h5"
nproj = 301
ntimesteps = 8
rot_center = 702
chunksize = 8
frame = 30
data_size = [18060, 1100, 1440]
# Select sinogram range to reconstruct.
sino_start = 0
sino_end = data_size[1]
# Split sinograms into chunks to reconstruct to accommodate limited RAM
chunks = int(data_size[1] / chunksize)
nSino_per_chunk = (sino_end - sino_start) / chunks
strt = 0
for iChunk in range(0, chunks):
    sino_chunk_start = sino_start + nSino_per_chunk * iChunk
    sino_chunk_end = sino_start + nSino_per_chunk * (iChunk+1)
    if sino_chunk_end > sino_end:
        break
    sino = (int(sino_chunk_start), int(sino_chunk_end))
    proj, flat, dark, theta = dxchange.read_aps_32id(h5fname, sino=sino)
    if int(frame - ntimesteps) > 0:
        proj = proj[(frame - ntimesteps // 2) * nproj:(frame + ntimesteps // 2) * nproj, :, :]
    data = tomopy.normalize(proj, flat, dark, cutoff=1.4)
    data = tomopy.minus_log(data)
    data = tomopy.remove_nan(data, val=0.0)
    data = tomopy.remove_neg(data, val=0.00)
    data[np.where(data == np.inf)] = 0.00
    theta = np.linspace(0, np.pi * ntimesteps, nproj * ntimesteps, endpoint=False)
    [ntheta, nz, n] = data.shape
    lambda0 = pow(2, -9)
    lambda1 = pow(2, 2)
    niter = 16
    titer = 4
    nzp = 2
    ngpus = 1
    m = ntimesteps  # number of basis functions
    # reorder input data to (nz, ntheta, n) for compatibility
    data = data.swapaxes(0, 1)
    # take basis functions
    phi = takephi(ntheta, m)
    theta = theta.astype('float32')
    with rectv_gpu.Solver(n, ntheta, m, nz, nzp, ngpus) as cl:
        rtv = cl.recon(data, theta, phi, rot_center=rot_center, lambda0=lambda0,
                       lambda1=lambda1, niter=niter, titer=titer)
    # write each time frame as its own tiff stack
    for time_frame in range(0, m):
        fname = os.path.dirname(os.path.abspath(h5fname)) + '/' + os.path.splitext(os.path.basename(h5fname))[0] + '_rec_full/' + 'recon' + str(time_frame) + '_'
        dxchange.write_tiff_stack(rtv[:, time_frame], fname=fname, start=strt)
    strt += sino[1] - sino[0]
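As a quick sanity check (not part of the script above), printing the norm of the result right after the solver call should give a non-zero value once theta has been cast to float32:

print(np.linalg.norm(rtv))  # non-zero once theta is float32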
Hi, I've been trying to make use of the package on the operando fuel cell dataset provided by TomoBank HERE, just below the foam data that was demonstrated in the Nikitin et al. paper. The dataset has the dimensions (ntheta, nz, n) = (18060, 1100, 1440), with ntheta = nprojections × ntimesteps = 301 × 60. The sample recon script provided HERE circumvents limited-RAM issues for the TomoPy/ASTRA Toolbox recon algorithms by processing the dataset in chunks of 8 sinograms/slices at a time, i.e. (18060, 8, 1440) per iteration. This has worked fine with the CUDA-accelerated iterative SIRT/MLEM algorithms.
I've adapted the now-deprecated rectv parts of the script to the current build of rectv_gpu and am attempting to run it on a 16 GB Tesla V100 (with 50+ GB of system RAM). However, even down to a chunk size of 1, I'm still maxing out GPU memory and getting a "Segmentation fault (core dumped)" error at the rectv_gpu.Solver.recon step, so I'm considering batching by time steps as well to further reduce the memory footprint (a rough sketch of what I mean is below). Also, when I do get outputs, they're all zeros, i.e. black images. For reference, I've tested the foam data notebook example and it works fine.
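Roughly what I have in mind for batching by time steps (just a sketch; the frame window values are placeholders, with nproj = 301 projections per time step as above):

# hypothetical time-step batch: keep only a window of time frames around a
# frame of interest before reconstruction, instead of all 60 at once
frame = 30                                  # placeholder frame of interest
batch = 8                                   # placeholder number of time frames per batch
t0, t1 = frame - batch // 2, frame + batch // 2
proj = proj[t0 * nproj:t1 * nproj]          # projections for the selected time frames
theta = theta[t0 * nproj:t1 * nproj]        # matching angles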
Am I doing anything wrong in my workflow? Would batching by time steps affect the performance of this algorithm if I technically have motion at all time steps? And if not, any suggestions for what I should be doing instead?