py4dstem / py4DSTEM

GNU General Public License v3.0

CUDA driver error when reconstructing larger datasets in mixed-state multislice ptychography #691

Open xiongh15 opened 1 week ago

xiongh15 commented 1 week ago

CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

This works well on small datasets (256x256x144x144, i.e. 65536 diffraction patterns). However, it quickly fails on large datasets (400x400x144x144, i.e. 160000 diffraction patterns).

The error appeared in py4DSTEM.process.phase.MixedstateMultislicePtychography.preprocess()

To Reproduce

import py4DSTEM

# dataset_ptycho (the 4D datacube) and alpha (the probe semiangle) are not shown here
slice_thicknesses = 20
ptycho_multislicemultiprobe = py4DSTEM.process.phase.MixedstateMultislicePtychography(
    datacube=dataset_ptycho,
    device='gpu',
    num_probes=5,
    num_slices=13,
    slice_thicknesses=slice_thicknesses,
    verbose=True,
    energy=300e3,
    theta_x=0,
    theta_y=0,
    defocus=200,
    semiangle_cutoff=alpha,
).preprocess(
    # force_com_rotation=0,
    plot_probe_overlaps=False,
)

CUDADriverError:
File cupy_backends/cuda/api/driver.pyx:226, in cupy_backends.cuda.api.driver.moduleLoadData()
File cupy_backends/cuda/api/driver.pyx:63, in cupy_backends.cuda.api.driver.check_status()
CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

py4DSTEM version 0.14.18

Python version 3.12.6

Operating system Linux (ubuntu)

GPU

xiongh15 commented 1 week ago

And I am pretty sure there is no memory overflow, considering I used quite few probes and slices.

gvarnavi commented 1 week ago

Hey @xiongh15, thanks for opening the issue.

> And I am pretty sure there is no memory overflow, considering I used quite few probes and slices

Hmm, not sure how much VRAM your H100 has (96 GB?), but a 400x400x144x144 dataset is a lot to store in memory!
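For a rough sense of scale, here is a back-of-the-envelope estimate of the raw array sizes alone. It assumes float32 storage (4 bytes per value), which is an assumption and not something stated in this thread, and it ignores the probes, object slices, and intermediate arrays that the reconstruction also allocates. `raw_size_gib` is just a helper name made up for this sketch.

```python
import math

def raw_size_gib(shape, bytes_per_value=4):
    """Size in GiB of a dense array with this shape (float32 assumed by default)."""
    return math.prod(shape) * bytes_per_value / 2**30

small = (256, 256, 144, 144)  # the dataset that works
large = (400, 400, 144, 144)  # the dataset that fails

for name, shape in [("small", small), ("large", large)]:
    n_patterns = shape[0] * shape[1]
    print(f"{name}: {n_patterns} patterns, {raw_size_gib(shape):.2f} GiB raw")
```

Under that assumption the raw data comes to roughly 5.1 GiB for the small dataset and 12.4 GiB for the large one, before any working arrays are counted.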

We have a tutorial notebook on GPU-memory handling within ptychography here, but here are the more important aspects in your case:

Finally, to entertain the idea that this is not a memory overflow issue: you seem to have a version incompatibility between your CUDA driver (12.4) and your CuPy version (13.3.0). cc'ing @sezelt, who has more experience with CUDA installations, in case this is indeed the issue.
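One way to see which versions are actually in play is to ask CuPy directly. A minimal sketch: `decode_cuda_version` and `report_versions` are names made up for this example, and the decoding relies on CUDA's convention of reporting versions as 1000*major + 10*minor. The CuPy calls are guarded so the snippet still runs on a machine without a working GPU install.

```python
def decode_cuda_version(v):
    """Turn CUDA's integer version (1000*major + 10*minor) into (major, minor)."""
    return v // 1000, (v % 1000) // 10

def report_versions():
    """Print the CuPy version plus the CUDA driver/runtime versions it sees."""
    try:
        import cupy
    except ImportError:
        print("CuPy is not installed in this environment")
        return None
    try:
        driver = decode_cuda_version(cupy.cuda.runtime.driverGetVersion())
        runtime = decode_cuda_version(cupy.cuda.runtime.runtimeGetVersion())
    except Exception as err:  # e.g. no usable driver on this machine
        print("Could not query CUDA versions:", err)
        return None
    print("CuPy:", cupy.__version__)
    print("Highest CUDA version the driver supports: %d.%d" % driver)
    print("CUDA runtime CuPy is using: %d.%d" % runtime)
    return driver, runtime

report_versions()
```

`cupy.show_config()` prints a fuller report of the same information, which may be easier to paste into the issue.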

xiongh15 commented 1 week ago

Thanks for the quick reply! I just finished reading this very helpful notebook. I will try these parameters in the next few days.