scopetools / cudasirecon

3D Structured illumination microscopy (3D-SIM) reconstruction software
GNU General Public License v3.0

Getting a "!!Error occurred: CUFFT plan creation failed" #27

Open walidabualafia opened 1 month ago

walidabualafia commented 1 month ago

Hi all,

Thank you so much for your work on this package.

I just pulled and installed the package, and I am trying to run the following command, but it keeps running into an error:

cudasirecon   --input-file ~/simtest/src/ --output-file simtest --otf-file ~/simtest/psf/488_xscan_SIMPSF__total_PSF_V2Hex.tif --nphases 5 --ndirs 1 --bessel --zoomfact 1.5 --ls .8 --na 1.0 --nimm 1.33 --angle0 -1.57 --otfRA --wiener 0.001 --background 100 --xyres 0.108 --zres 0.3 --besselNA 0.45 --deskew 32.8 --besselExWave .488 --gammaApo .001  

Whenever I run this command, execution halts when the code reaches the cuFFT section. My teammate traced it with GDB and found a segmentation fault.

The error that gets written to stdout is:

...
zdistcutoff[1]=147
zdistcutoff[2]=147
moving centerband
Before fftplan3d 56059MB free
Error code: 700
ptr_: 23422402822144
Error code: 700
ptr_: 23427368878080
Error code: 700
ptr_: 23401632628736
Error code: 700
ptr_: 23399418036224
Error code: 700
ptr_: 23397203443712
Error code: 700
ptr_: 23394988851200
Error code: 700
ptr_: 23392774258688
Error code: 700
ptr_: 23437960544256
Error code: 700
ptr_: 23437960806400
Error code: 700
ptr_: 23437959757824

!!Error occurred: CUFFT plan creation failed

Has anyone seen this error? Could the code be trying to access illegal memory regions?
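
On the question of illegal memory access: a minimal sketch for tallying the codes in the log above, assuming the printed "Error code" values are CUDA runtime error codes using the numbering introduced in CUDA 10.1. Under that assumption, 700 is cudaErrorIllegalAddress. The table is deliberately partial and the helper name is hypothetical.

```python
import re

# Partial table of CUDA runtime error codes (numbering used since CUDA 10.1).
# Assumption: the "Error code" lines in the log are CUDA runtime codes.
CUDA_ERRORS = {
    2: "cudaErrorMemoryAllocation (out of memory)",
    700: "cudaErrorIllegalAddress (illegal memory access)",
}


def summarize_error_codes(log_text):
    """Return {code: (count, description)} for every 'Error code: N' line."""
    codes = [int(c) for c in re.findall(r"Error code: (\d+)", log_text)]
    return {
        code: (codes.count(code), CUDA_ERRORS.get(code, "unknown"))
        for code in sorted(set(codes))
    }


if __name__ == "__main__":
    excerpt = "Error code: 700\nptr_: 23422402822144\nError code: 700\n"
    print(summarize_error_codes(excerpt))
```

If the assumption holds, the repeated 700 would be consistent with an earlier kernel or copy touching an invalid address, with the cuFFT plan creation failing afterwards as a consequence rather than as the root cause.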

Any help is appreciated! :)

Thank you, Walid

tlambert03 commented 1 month ago

How big is your image volume and how much GPU ram do you have available?

walidabualafia commented 1 month ago

My --input-file directory is 7.7G and my --otf-file is 43M.

I have 80GB VRAM (running on A100).

Thank you!

tlambert03 commented 1 month ago

oh ok, should definitely be more than sufficient.
This sort of thing can be pretty hard to debug unfortunately. Could you try reconstructing the test data in https://github.com/scopetools/cudasirecon/tree/main/test_data (see config files in the same directory) just to ensure that the package itself is installed and working ok? If so, we can try to determine what might be different about the data you're reconstructing
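
For a rough sense of scale when comparing volume size against GPU memory, the footprint of one single-precision complex volume can be estimated from its dimensions. This is a back-of-the-envelope sketch only: the function name, the 8 bytes per voxel, and the idea that the transform runs on the zoomed grid are assumptions, not taken from the cudasirecon source. The example dimensions are the ones printed in the log posted later in this thread (nx=648, ny=512, nz=85, with --zoomfact 1.5).

```python
def complex_volume_mib(nx, ny, nz, zoom_xy=1.0, zoom_z=1.0, bytes_per_voxel=8):
    """Estimate the size in MiB of one complex64 volume on the zoomed grid."""
    out_nx = round(nx * zoom_xy)
    out_ny = round(ny * zoom_xy)
    out_nz = round(nz * zoom_z)
    return out_nx * out_ny * out_nz * bytes_per_voxel / 2**20


if __name__ == "__main__":
    # Dimensions from the later log in this thread; zoom factor from its command line.
    print(f"{complex_volume_mib(648, 512, 85, zoom_xy=1.5):.1f} MiB")
```

For those dimensions the estimate comes to roughly 484 MiB, which is in the same range as the drop from 79523MB to 79037MB free around plan creation in that later log, though the match may be coincidental.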

walidabualafia commented 1 month ago

Thanks, @tlambert03. We can confirm that the test data is working ok.

The problem might be with the data construction. I will talk to the data owners to see what we can share about the data reconstruction.

Thank you! :)

dan-alford commented 1 month ago

I am @walidabualafia's colleague. The test data did run successfully, but the resulting image does not look correct. Left is the raw image from the test data set; right is the processed image.

Screenshot 2024-09-30 at 1 43 57 PM

This was run using the following command:

cudasirecon --config cudasirecon/test_data/config-tiff --otf-file cudasirecon/test_data/otf.tif --input-file cudasirecon/test_data/ --output-file raw

Cudasirecon was installed via conda.

linshaova commented 1 month ago

Hi dan,

Did you scroll to Z slices 4 or 5 and see if it makes more sense? Slice #1 usually doesn't show anything useful because it's out of focus, especially when the contrast is stretched between min and max intensities.

-lin

dan-alford commented 1 month ago

@linshaova Here are screenshots from 4 & 5

Screenshot 2024-09-30 at 3 19 49 PM Screenshot 2024-09-30 at 3 20 03 PM

linshaova commented 1 month ago

I could duplicate what you got, @dan-alford.

@tlambert03, sorry I never tested this test_data before. In cudasirecon's log printout, the "modamp" numbers are suspiciously large (~8 for order 2, and ~18 for order 1 except for dir 0). See attached log. Has anything changed?

-lin cudasirecon_log.txt

linshaova commented 1 month ago

Ah, I know what's going on now. You should change the fastSI line in the `config-tiff` file to fastSI=0. This option refers to how the 3D SIM stack data is organized: fastSI=1 means taking all 15 images at one Z slice before moving to the next slice, whereas fastSI=0 means taking a full Z stack for one SIM orientation before moving to the next orientation. The test data is organized in the latter way, and therefore that flag should be set to 0.
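
A small indexing sketch may make the two layouts concrete. It assumes that phase is the fastest-varying index inside each group, which the comment above does not state, and the function and argument names are hypothetical rather than part of cudasirecon.

```python
def frame_index(z, direction, phase, nz, ndirs, nphases, fast_si):
    """Position of one raw frame in the acquired stack (0-based).

    fast_si=True : all directions and phases for one Z slice, then the next slice.
    fast_si=False: a full Z stack for one direction, then the next direction.
    Assumption: phase varies fastest in both layouts.
    """
    if fast_si:
        return (z * ndirs + direction) * nphases + phase
    return (direction * nz + z) * nphases + phase


if __name__ == "__main__":
    # Hypothetical stack: 10 Z slices, 3 directions, 5 phases.
    for fast_si in (True, False):
        idx = frame_index(z=1, direction=0, phase=0,
                          nz=10, ndirs=3, nphases=5, fast_si=fast_si)
        print(f"fast_si={fast_si}: first frame of Z slice 1 is at index {idx}")
```

With these example sizes the first frame of the second Z slice sits at index 15 in the fastSI=1 layout and at index 5 in the fastSI=0 layout, so reading one layout as the other mixes slices and orientations.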

@tlambert03 could you make that change in the git? thanks!

tlambert03 commented 1 month ago

Argh, will have to check the git history and repeat locally

linshaova commented 1 month ago

See my last comment; it was posted just as yours came out. I just created a PR for the updated config-tiff.

tlambert03 commented 1 month ago

truly baffled as to how that was wrong, but I don't have time to sleuth it out at the moment. can you confirm that switching to fastSI=0 gives the expected output?

linshaova commented 1 month ago

Yes (sorry I forgot to mention that!)

dan-alford commented 1 month ago

That solved one of our issues. We are still seeing problems when we reconstruct a known-good file: it works with an older version of cudasirecon (cuda9) but not with the new code. The image on the left is the source image, the middle image was reconstructed with the older version of cudasirecon, and the right one with the new version.

Screenshot 2024-10-02 at 4 52 06 PM

The older version was run using Omero with the following properties:

helper.set_zoom(1.5)
helper.set_zres(0.2)
helper.set_nimm(1.33)
helper.set_deskew(32.8)
helper.add_gamma(0.7)
helper.add_ls(0.5015)
helper.add_bex(0.488)
helper.add_wiener(0.001)
helper.add_background(150)
helper.set_bessel_na(0.511)
helper.setOtf("488OTFTHEORY.tif")

This was run on the new version with the following parameters:

cudasirecon --input-file ~/unit/ --output-file unit --nphases 5 --ndirs 1 --bessel --zoomfact 1.5 --ls 0.5015 --nimm 1.33 --wiener 0.001 --background 150 --zres 0.2 --besselNA 0.511 --deskew 32.8 --besselExWave 0.488 --gammaApo .07 --otf-file 488OTFTHEORY.tif

Is there a parameter that has changed or needs to be added?

linshaova commented 1 month ago

  1. --ndirs 1 is suspicious
  2. Is there no need for --k0angles?

tlambert03 commented 1 month ago

looks like lattice SIM data probably right?

sorry to hear there's been a breaking change in there @dan-alford, that wasn't the intention of course. Can you help narrow down exactly what the older version was? can you run cudasirecon --version?

linshaova commented 1 month ago

Hi @dan-alford, when you say "old version", how old are you talking about? If it was really old (~10 years), note that there was a major change about 8 years ago in how an OTF TIFF file is organized (related to how complex numbers are represented). If you are using an OTF created a long time ago, that may be the reason it doesn't work with the latest version. In that case, you would need to regenerate the OTF file.

dan-alford commented 1 month ago

the --version flag was not recognized. the --help gave the following.

--input-file arg  input file (or data folder in TIFF mode)
--output-file arg  output file (or filename pattern in TIFF mode)
--otf-file arg  OTF file
--ndirs arg (=3)  number of directions
--nphases arg (=5)  number of phases per direction
--nordersout arg (=0)  number of output orders; must be <= norders
--angle0 arg (=1.648)  angle of the first direction in radians
--ls arg (=0.172000006)  line spacing of SIM pattern in microns
--na arg (=1.20000005)  Detection numerical aperture
--nimm arg (=1.33000004)  refractive index of immersion medium
--zoomfact arg (=2)  lateral zoom factor
--explodefact arg (=1)  artificially exploding the reciprocal-space distance between orders by this factor
--zzoom arg (=1)  axial zoom factor
--nofilteroverlaps [=arg(=0)]  do not filter the overlaping region between bands usually used in trouble shooting
--background arg (=0)  camera readout background
--wiener arg (=0.00999999978)  Wiener constant
--wienerInr arg (=0.00999999978)  Wiener constant increment
--forcemodamp arg  modamps forced to these values
--k0angles arg  user given pattern vector k0 angles for all directions
--otfRA [=arg(=1)]  using rotationally averaged OTF
--k0searchAll [=arg(=0)]  search for k0 at all time points
--equalizez [=arg(=1)]  bleach correcting for z
--equalizet [=arg(=1)]  bleach correcting for time
--dampenOrder0 [=arg(=1)]  dampen order-0 in final assembly
--nosuppress [=arg(=0)]  do not suppress DC singularity in final assembly (good idea for 2D/TIRF data)
--nokz0 [=arg(=1)]  do not use kz=0 plane of the 0th order in the final assembly
--gammaApo arg (=1)  output apodization gamma; 1.0 means triangular apo
--saveprefiltered arg  save separated bands (half Fourier space) into a file and exit
--savealignedraw arg  save drift-fixed raw data (half Fourier space) into a file and exit
--saveoverlaps arg  save overlap0 and overlap1 (real-space complex data) into a file and exit
-c [ --config ] arg  name of a file of a configuration.
--2lenses [=arg(=1)]  I5S data
--bessel [=arg(=1)]  bessel-SIM data
--besselExWave arg (=0.488000005)  Bessel SIM excitation wavelength in microns
--besselNA arg (=0.143999994)  Bessel SIM excitation NA)
--deskew arg (=0)  Deskew angle; if not 0.0 then perform deskewing before processing
--deskewshift arg (=0)  If deskewed, the output image's extra shift in X (positive->left)
--noRecon  No reconstruction will be performed; useful when combined with --deskew
--cropXY arg (=0)  Crop the X-Y dimension to this number; 0 means no cropping
--xyres arg (=0.100000001)  x-y pixel size (only used for TIFF files)
--zres arg (=0.143999994)  z pixel size (only used for TIFF files)
--wavelength arg (=530)  emission wavelength (only used for TIFF files)
-h [ --help ]  produce help message

tlambert03 commented 1 month ago

thanks that helps. (it's newer than may 2020 https://github.com/scopetools/cudasirecon/pull/4 but older than june 2021 https://github.com/scopetools/cudasirecon/pull/11)

linshaova commented 1 month ago

At least in the past, even in LLSM-SIM mode --k0angles is needed, such as --k0angles 1.57.

Could you share the printout cudasirecon produced when processing the above dataset you showed?

dan-alford commented 1 month ago

@linshaova - which version ? the old or the new? or both?

linshaova commented 1 month ago

both please (with the command line included)

dan-alford commented 1 month ago

new version

[omero@nodegpu234 ~]$ cudasirecon --input-file ~/unit/ --output-file unit --otf-file 488OTFTHEORY.tif --ls 0.501500 --ndirs 1 --bessel --zoomfact 1.500000 --na 1.100000 --nimm 1.330000 --angle0 -1.570000 --otfRA --wiener 0.001000 --background 150.000000 --xyres 0.104000 --zres 0.200000 --besselNA 0.511000 --besselExWave 0.488000 --gammaApo 0.700000 --deskew 32.800000 --nphases 5

wiener=0.001
gamma = 0.7
nphases=5, ndirs=1
nx_raw=512, ny=512, nz=85
nx=648, ny=512, nz=85, nz0 = 85, nwaves=1
dxy=0.104000, dz=0.108342 um
nphases=5, norders=3, ndirs=1
nzotf=32, dkzotf=0.208333, nxotf=33, nyotf=1, dkrotf=0.150240
In makematrix.
Separation matrix:
1.00000 1.00000 1.00000 1.00000 1.00000
1.00000 0.30902 -0.80902 -0.80902 0.30902
0.00000 0.95106 0.58779 -0.58779 -0.95106
1.00000 -0.80902 0.30902 0.30902 -0.80902
0.00000 0.58779 -0.95106 0.95106 -0.58779

k0guess[direction 0] = (0.053502, -53.088722) pixels
Initial guess by findk0() of k0[direction 0] = (-0.042217,0.116336) pixels
before fitk0andmodamp
In getmodamp: angle=1.850034, mag=0.002273, amp=0.027115, phase=-2.773549
In getmodamp: angle=1.851034, mag=0.002273, amp=0.027115, phase=-2.773578
In getmodamp: angle=1.849033, mag=0.002273, amp=0.027115, phase=-2.773520
In getmodamp: angle=1.848033, mag=0.002273, amp=0.027115, phase=-2.773492
In getmodamp: angle=1.847033, mag=0.002273, amp=0.027115, phase=-2.773463
In getmodamp: angle=1.846033, mag=0.002273, amp=0.027115, phase=-2.773433
In getmodamp: angle=1.845033, mag=0.002273, amp=0.027115, phase=-2.773405
In getmodamp: angle=1.844033, mag=0.002273, amp=0.027115, phase=-2.773375
In getmodamp: angle=1.843033, mag=0.002273, amp=0.027115, phase=-2.773346
In getmodamp: angle=1.842033, mag=0.002273, amp=0.027115, phase=-2.773316
In getmodamp: angle=1.841033, mag=0.002273, amp=0.027115, phase=-2.773286
In getmodamp: angle=1.840033, mag=0.002273, amp=0.027115, phase=-2.773257
In getmodamp: angle=1.839033, mag=0.002273, amp=0.027115, phase=-2.773227
In getmodamp: angle=1.838033, mag=0.002273, amp=0.027115, phase=-2.773197
In getmodamp: angle=1.837033, mag=0.002273, amp=0.027115, phase=-2.773166
In getmodamp: angle=1.836033, mag=0.002273, amp=0.027115, phase=-2.773136
In getmodamp: angle=1.835033, mag=0.002273, amp=0.027115, phase=-2.773106
In getmodamp: angle=1.834033, mag=0.002273, amp=0.027115, phase=-2.773076
In getmodamp: angle=1.833033, mag=0.002273, amp=0.027115, phase=-2.773045
In getmodamp: angle=1.832033, mag=0.002273, amp=0.027115, phase=-2.773015
In getmodamp: angle=1.831033, mag=0.002273, amp=0.027115, phase=-2.772984
In getmodamp: angle=1.830033, mag=0.002273, amp=0.027115, phase=-2.772953
In getmodamp: angle=1.829033, mag=0.002273, amp=0.027115, phase=-2.772922
In getmodamp: angle=1.828032, mag=0.002273, amp=0.027115, phase=-2.772891
In getmodamp: angle=1.827032, mag=0.002273, amp=0.027115, phase=-2.772860
In getmodamp: angle=1.826032, mag=0.002273, amp=0.027115, phase=-2.772828
In getmodamp: angle=1.825032, mag=0.002273, amp=0.027116, phase=-2.772797
In getmodamp: angle=1.824032, mag=0.002273, amp=0.027116, phase=-2.772766
In getmodamp: angle=1.823032, mag=0.002273, amp=0.027116, phase=-2.772734
In getmodamp: angle=1.822032, mag=0.002273, amp=0.027116, phase=-2.772702
In getmodamp: angle=1.821032, mag=0.002273, amp=0.027116, phase=-2.772671
In getmodamp: angle=1.820032, mag=0.002273, amp=0.027116, phase=-2.772638
In getmodamp: angle=1.819032, mag=0.002273, amp=0.027116, phase=-2.772606
In getmodamp: angle=1.818032, mag=0.002273, amp=0.027116, phase=-2.772574
In getmodamp: angle=1.817032, mag=0.002273, amp=0.027116, phase=-2.772542
In getmodamp: angle=1.816032, mag=0.002273, amp=0.027116, phase=-2.772509
In getmodamp: angle=1.815032, mag=0.002273, amp=0.027116, phase=-2.772477
In getmodamp: angle=1.814032, mag=0.002273, amp=0.027116, phase=-2.772444
In getmodamp: angle=1.815199, mag=0.002273, amp=0.027116, phase=-2.772482
In getmodamp: angle=1.815199, mag=0.003757, amp=0.027140, phase=-2.843364
In getmodamp: angle=1.815199, mag=0.005241, amp=0.026993, phase=-2.912066
Optimum modulation amplitude:
In getmodamp: angle=1.815199, mag=0.003225, amp=0.027151, phase=-2.818154
Reverse modamp is: amp=1.222096, phase=-2.818154
Combined modamp is: amp=0.028061, phase=-2.818154
Correlation coefficient is: 0.149053
Optimum k0 angle=1.815199, length=0.003225, spacing=310.092732 um
In getmodamp: angle=1.815199, mag=0.003225, amp=0.004585, phase=-2.745981
Reverse modamp is: amp=0.752656, phase=-2.745981
Combined modamp is: amp=0.004601, phase=-2.745981
Correlation coefficient is: 0.078054
WARNING: best fit for k0 is 50.157% from expected value.
norders=3, zdistcutoff[0]=29
zdistcutoff[1]=30
zdistcutoff[2]=30
moving centerband
Before fftplan3d 79523MB free
After fftplan 79037MB free
re-transforming centerband
inserting centerband
centerband assembly completed
moving order 1
order 1 sideband assembly completed
moving order 2
order 2 sideband assembly completed
Output: /home/omero/unit/GPUsirecon/unit_proc.tif
amin, amax took: 3.947984 s
Time point 0, wave 0 done
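
The separation matrix printed in this log can be reproduced from equally spaced phase steps, which is a quick way to confirm that the nphases and norders settings were read as intended. The sketch below assumes phases of 2πj/nphases and a row layout of order 0, then cosine and sine rows for each higher order. It matches the printed values, but whether cudasirecon builds the matrix exactly this way is an assumption, and the function name is hypothetical.

```python
import math


def separation_matrix(nphases, norders):
    """Rows: order 0, then (cos, sin) of order*phase for orders 1..norders-1."""
    phases = [2 * math.pi * j / nphases for j in range(nphases)]
    rows = [[1.0] * nphases]
    for order in range(1, norders):
        rows.append([math.cos(order * p) for p in phases])
        rows.append([math.sin(order * p) for p in phases])
    return rows


if __name__ == "__main__":
    for row in separation_matrix(nphases=5, norders=3):
        print(" ".join(f"{value:.5f}" for value in row))
```

For nphases=5 and norders=3 this gives five rows of five values, the same shape as the matrix in the log.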

Old Version

Calling Sim Reconstruction with

cudaSireconDriver --input-file "/tmp/sld_temp_adieowvow_81912903/" --output-file "image_0_Obj_Scan_Single_Channel_Ch1_-_3_P0_T0000_C00.ome" --otf-file "/research/applications/omero/omero_production/ManagedRepository/PSFs/sking2_4/2021-04/01/14-48-06.260/488OTFTHEORY.tif"  --ls 0.501500 --ndirs 1 --bessel --zoomfact 1.500000 --na 1.100000 --nimm 1.330000 --angle0 -1.570000 --otfRA --wiener 0.001000 --background 150.000000 --xyres 0.104000 --zres 0.200000 --besselNA 0.511000 --besselExWave 0.488000 --gammaApo 0.700000 --deskew 32.800000 --nphases 5

wiener=0.001
gamma=0.7
nphases=5, ndirs=1
nx=512, ny=512, nz=85, nz0 = 85, nwaves=1, ntimes=1
nzotf=64, dkzotf=0.156250, nxotf=33, nyotf=1, dkrotf=0.135281
In makematrix.
Separation matrix:
1.00000 1.00000 1.00000 1.00000 1.00000
1.00000 0.30902 -0.80902 -0.80902 0.30902
0.00000 0.95106 0.58779 -0.58779 -0.95106
1.00000 -0.80902 0.30902 0.30902 -0.80902
0.00000 0.58779 -0.95106 0.95106 -0.58779

deskew_GPU(): no error
intensity_overall=1.117126e-08
** total memory 11178M; free memory 10597M
k0guess[direction 0] = (0.042273, -53.088718)
krscale=0.138822 kzscale=0.703243
order2=1, rdistcutoff=221, zdistcutoff=29.000000
makeoverlaps() line 596:no error
makeoverlaps() line 662:no error
** total memory 11178M; free memory 10235M
makeoverlaps() line 670:no error
makeoverlaps() line 677:no error
Initial guess by findk0() of k0[direction 0] = (-0.101093,0.080322)
before fitk0andmodamp
krscale=0.138822 kzscale=0.703243
order2=1, rdistcutoff=221, zdistcutoff=30.000000
makeoverlaps() line 596:no error
makeoverlaps() line 662:no error
** total memory 11178M; free memory 10235M
makeoverlaps() line 670:no error
makeoverlaps() line 677:no error
In getmodamp: angle=2.470194, mag=0.129118, amp=0.019913, phase=-2.722648
In getmodamp: angle=2.471194, mag=0.129118, amp=0.019913, phase=-2.722605
In getmodamp: angle=2.469194, mag=0.129118, amp=0.019913, phase=-2.722691
In getmodamp: angle=2.468194, mag=0.129118, amp=0.019913, phase=-2.722734
In getmodamp: angle=2.468791, mag=0.129118, amp=0.019913, phase=-2.722708
In getmodamp: angle=2.468791, mag=0.229118, amp=0.019939, phase=-2.762047
In getmodamp: angle=2.468791, mag=0.329118, amp=0.019948, phase=-2.801625
In getmodamp: angle=2.468791, mag=0.429118, amp=0.019940, phase=-2.841616
Optimum modulation amplitude:
In getmodamp: angle=2.468791, mag=0.330796, amp=0.019948, phase=-2.802293
Reverse modamp is: amp=1.275741, phase=-2.802293
Combined modamp is: amp=0.020460, phase=-2.802293
Correlation coefficient is: 0.125044
Optimum k0 angle=2.468791, length=0.330796, spacing=160.969199 microns
krscale=0.138822 kzscale=0.703243
order2=2, rdistcutoff=221, zdistcutoff=30.000000
makeoverlaps() line 596:no error
makeoverlaps() line 662:no error
** total memory 11178M; free memory 10235M
makeoverlaps() line 670:no error
makeoverlaps() line 677:no error
In getmodamp: angle=2.468791, mag=0.330796, amp=0.003143, phase=-2.781246
Reverse modamp is: amp=0.776559, phase=-2.781246
Combined modamp is: amp=0.003150, phase=-2.781246
Correlation coefficient is: 0.063614
WARNING: best fit for k0 is 53.295715 pixels from expected value.
norders=3, zdistcutoff[0]=29
zdistcutoff[1]=30
zdistcutoff[2]=30
moving centerband
assemblerealspacebands() line 1863:no error
assemblerealspacebands() line 1869:no error
re-transforming centerband
inserting centerband
assemblerealspacebands() line 1897:no error
centerband assembly completed
moving order 1
assemblerealspacebands() line 1910:no error
assemblerealspacebands() line 1928:no error
assemblerealspacebands() line 1933:no error
order 1 sideband assembly completed
moving order 2
assemblerealspacebands() line 1910:no error
assemblerealspacebands() line 1928:no error
assemblerealspacebands() line 1933:no error
order 2 sideband assembly completed
assemblerealspacebands() line 1943:no error
HERE
Output: /tmp/sld_temp_adieowvow_81912903/GPUsirecon/image_0_Obj_Scan_Single_Channel_Ch1_-_3_P0_T0000_C00.ome_proc.tif
amin, amax took: 1.000000 s
Time point 0, wave 0 done
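
Since the question is whether a parameter changed between versions, one way to rule out command-line differences is to diff the two invocations flag by flag. The helper below is a hypothetical sketch, not part of cudasirecon: it treats any token starting with "--" as a flag, takes the next token as its value unless that is also a flag, and compares numeric values as floats. The two command strings are copied from the new-version and old-version runs above.

```python
import shlex


def parse_flags(command):
    """Map each --flag to its value, or to True for a bare switch."""
    tokens = shlex.split(command)[1:]  # drop the program name
    flags, i = {}, 0
    while i < len(tokens):
        if tokens[i].startswith("--"):
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
            flags[tokens[i]] = tokens[i + 1] if has_value else True
            i += 2 if has_value else 1
        else:
            i += 1
    return flags


def _normalize(value):
    """Compare numbers numerically so that .7 and 0.700000 count as equal."""
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return value
    return value


def diff_flags(a, b):
    """Return {flag: (value_in_a, value_in_b)} for flags that differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if _normalize(a.get(key)) != _normalize(b.get(key))
    }


NEW_CMD = (
    "cudasirecon --input-file ~/unit/ --output-file unit --otf-file 488OTFTHEORY.tif "
    "--ls 0.501500 --ndirs 1 --bessel --zoomfact 1.500000 --na 1.100000 "
    "--nimm 1.330000 --angle0 -1.570000 --otfRA --wiener 0.001000 "
    "--background 150.000000 --xyres 0.104000 --zres 0.200000 --besselNA 0.511000 "
    "--besselExWave 0.488000 --gammaApo 0.700000 --deskew 32.800000 --nphases 5"
)
OLD_CMD = (
    'cudaSireconDriver --input-file "/tmp/sld_temp_adieowvow_81912903/" '
    '--output-file "image_0_Obj_Scan_Single_Channel_Ch1_-_3_P0_T0000_C00.ome" '
    '--otf-file "/research/applications/omero/omero_production/ManagedRepository/'
    'PSFs/sking2_4/2021-04/01/14-48-06.260/488OTFTHEORY.tif" '
    "--ls 0.501500 --ndirs 1 --bessel --zoomfact 1.500000 --na 1.100000 "
    "--nimm 1.330000 --angle0 -1.570000 --otfRA --wiener 0.001000 "
    "--background 150.000000 --xyres 0.104000 --zres 0.200000 --besselNA 0.511000 "
    "--besselExWave 0.488000 --gammaApo 0.700000 --deskew 32.800000 --nphases 5"
)

if __name__ == "__main__":
    for flag, (new, old) in diff_flags(parse_flags(NEW_CMD), parse_flags(OLD_CMD)).items():
        print(f"{flag}: new={new!r} old={old!r}")
```

For these two runs only the file paths differ, so the numeric parameters can probably be ruled out. The logs themselves still differ in the OTF dimensions (nzotf=32 versus nzotf=64), which suggests the two 488OTFTHEORY.tif files may not be identical.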

linshaova commented 4 weeks ago

Hi @dan-alford (sorry for the late response), these two runs both appear to be examples of failure: neither angle is anywhere close to ±1.57, and both amp values are very low (< 0.05). They also seem to have been applied to different input data, so I'd say this comparison doesn't tell us much. I was hoping to see a case where a dataset was processed successfully by your older version but not by the latest version. Is that possible to provide?
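
The two checks described above can be applied to a log automatically. This is a hedged sketch: the 0.05 amplitude threshold comes from the comment above, while the angle tolerance, the function name, and the choice of which log lines to read ("Optimum k0 angle" and "Combined modamp") are assumptions made for illustration.

```python
import re

MIN_AMP = 0.05   # from the comment above
ANGLE_TOL = 0.1  # radians; hypothetical tolerance, not a cudasirecon setting

ANGLE_RE = re.compile(r"Optimum k0 angle=(-?\d+\.\d+)")
AMP_RE = re.compile(r"Combined modamp is: amp=(\d+\.\d+)")


def check_fit(log_text, expected_angle):
    """Return a list of human-readable problems found in a cudasirecon log."""
    problems = []
    for angle in map(float, ANGLE_RE.findall(log_text)):
        # Accept either sign of the expected angle, as in "±1.57".
        error = min(abs(angle - expected_angle), abs(angle + expected_angle))
        if error > ANGLE_TOL:
            problems.append(f"k0 angle {angle} is {error:.3f} rad from expected")
    for amp in map(float, AMP_RE.findall(log_text)):
        if amp < MIN_AMP:
            problems.append(f"combined modamp {amp} is below {MIN_AMP}")
    return problems


NEW_LOG_EXCERPT = """\
Combined modamp is: amp=0.028061, phase=-2.818154
Optimum k0 angle=1.815199, length=0.003225, spacing=310.092732 um
Combined modamp is: amp=0.004601, phase=-2.745981
"""

if __name__ == "__main__":
    for problem in check_fit(NEW_LOG_EXCERPT, expected_angle=1.57):
        print(problem)
```

Run against the excerpt from the new-version log, this flags the fitted angle and both amplitudes. The old-version log fails the same checks with its angle of 2.468791 and amplitudes of 0.020460 and 0.003150.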