Getting the pycuda branch running

wfeschen commented 3 years ago

Hi,

I am currently working with quite large diffraction patterns (2048x2048). Since the reconstruction time is quite long, I would like to speed up the reconstructions. I found that there is a pycuda branch, but unfortunately I have not been able to get it running. Could someone help me with this (provided the code is currently intended for other users)? We installed CUDA 9.0 and pycuda on our Ubuntu server. When I try to run an example script from the templates folder:

python templates/minimal_prep_and_run_DM_pycuda.py

I get the following error.

[command: nvcc --cubin -O3 -DNDEBUG -std=c++11 -lineinfo -arch sm_70 -I/home/wilhelm/myvenv2/lib/python3.6/site-packages/numpy/core/include -I/home/wilhelm/myvenv2/lib/python3.6/site-packages/pycuda-2019.1.2-py3.6-linux-x86_64.egg/pycuda/cuda kernel.cu]
[stderr:
kernel.cu(10): error: __shared__ variables cannot have external linkage

1 error detected in the compilation of "/tmp/tmpxft_00000edc_00000000-6_kernel.cpp1.ii".
]

I am a bit clueless here, since the pycuda environment works for other code. I also tried to install the packages using conda and the full_dependencies.yml, but it failed with the same error. I assume that we might need a different CUDA version?

I would appreciate any help.

Best regards, Wilhelm

bjoernenders commented 3 years ago

Hi Wilhelm, yes, for this experimental branch full_dependencies.yml is the right set of requirements. To me it looks like pycuda is having issues compiling one of our raw kernels. I have never seen this error before. I looked at our CUDA code and we do use extern __shared__ declarations, but this is the recommended way to declare dynamically sized shared memory. I don't know whether a more recent CUDA version helps, since this construct has been part of CUDA for a long time, but it's definitely worth a try. Do you know in which kernel/module it crashed? Could you also provide a printout of all the Python packages in your environment? Thanks. Bjoern

wfeschen commented 3 years ago

Hi Björn,

thank you for your quick response. I reinstalled the pycuda branch, since I had been messing around with the Python code. After the reinstallation I ended up with a different error:

Couldnt find hdf5plugin - better hope your h5py has bitshuffle!
Could not import experiment NanomaxZmqScan from .nanomax_streaming, Reason: No module named 'bitshuffle'
                                                   frames_per_block     INVALID
engines.engine00                                   symlink              INVALID
engines.engine00                                   name                 UNKNOWN
Traceback (most recent call last):
  File "minimal_prep_and_run_DM_pycuda.py", line 49, in <module>
    P = Ptycho(p,level=5)
  File "/home/wilhelm/anaconda3/envs/full_dependencies/lib/python3.7/site-packages/ptypy/core/ptycho.py", line 311, in __init__
    defaults_tree['ptycho'].validate(self.p)
  File "/home/wilhelm/anaconda3/envs/full_dependencies/lib/python3.7/site-packages/ptypy/utils/descriptor.py", line 992, in validate
    raise RuntimeError('Parameter validation failed:\n  ' + '\n  '.join(raise_reasons))
RuntimeError: Parameter validation failed:
   - frames_per_block
  engines.engine00 - make sure to specify the .name field

I get the same error using the full_dependencies.yml anaconda environment and our other pycuda environment, which leads me to the conclusion that our problem is indeed related to the CUDA version.

pip freeze shows the following installed libraries, which look fine to me:

appdirs==1.4.4
attrs==20.2.0
certifi==2020.6.20
chardet==3.0.4
coverage==5.3
coveralls==2.1.2
cppimport==20.8.4.2
cycler==0.10.0
Cython==0.29.21
decorator==4.4.2
docopt==0.6.2
fabio==0.10.2
funcsigs==1.0.2
h5py @ file:///home/conda/feedstock_root/build_artifacts/h5py_1602551881630/work
idna==2.10
importlib-metadata==2.0.0
iniconfig==1.1.1
kiwisolver @ file:///home/conda/feedstock_root/build_artifacts/kiwisolver_1603883001809/work
Mako==1.1.3
MarkupSafe==1.1.1
matplotlib @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-suite_1602600750896/work
mpi4py @ file:///home/conda/feedstock_root/build_artifacts/mpi4py_1602248456507/work
numpy @ file:///home/conda/feedstock_root/build_artifacts/numpy_1603047558839/work
olefile @ file:///home/conda/feedstock_root/build_artifacts/olefile_1602866521163/work
packaging==20.4
pep8==1.7.1
Pillow @ file:///home/conda/feedstock_root/build_artifacts/pillow_1603406595057/work
pluggy==0.13.1
py==1.9.0
pybind11==2.6.0
pycuda==2020.1
pyFFTW @ file:///home/conda/feedstock_root/build_artifacts/pyfftw_1602504439006/work
pyopencl==2020.1
pyparsing==2.4.7
PyQt5==5.12.3
PyQt5-sip==4.19.18
PyQtChart==5.12
PyQtWebEngine==5.12.1
pytest==6.1.2
pytest-cov==2.10.1
python-dateutil==2.8.1
Python-Ptychography-toolbox==0.4.0
pytools==2020.4.3
pyzmq==19.0.2
reikna==0.7.5
requests==2.24.0
scikit-cuda==0.5.3
scipy @ file:///home/conda/feedstock_root/build_artifacts/scipy_1603636231216/work
six @ file:///home/conda/feedstock_root/build_artifacts/six_1590081179328/work
toml==0.10.1
tornado @ file:///home/conda/feedstock_root/build_artifacts/tornado_1602488893411/work
urllib3==1.25.11
zipp==3.4.0

Has anyone experienced the same error before, or does anyone have an idea why my pycuda installation is failing? Alternatively, could someone tell me which CUDA version was used for development, so that I can try a different CUDA version?

Thanks, Wilhelm

bjoernenders commented 3 years ago

Hi Wilhelm, the error you get relates to your parameters: the validator is complaining, and this has nothing to do with pycuda. Could you post the Python script? Our tests for pycuda run with CUDA 10.2; we do not test with other (lower) CUDA versions. Best Bjoern
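
To see which CUDA toolkit an environment actually picks up, one option is to ask nvcc directly. This is only a minimal sketch, assuming nvcc is on the PATH; the helper names parse_nvcc_release and installed_cuda_release are made up for illustration and are not part of ptypy or pycuda.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(banner):
    """Return (major, minor) parsed from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Ask the nvcc found on PATH for its release; None if nvcc is missing."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc not found on PATH (or its output could not be parsed)")
    else:
        print("nvcc reports CUDA release %d.%d" % release)
```

Note that this reports the toolkit of whichever nvcc comes first on the PATH, which may differ between a conda environment and a system-wide installation.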

wfeschen commented 3 years ago

Hi Björn,

yes, I used a script which I found in the templates folder (https://github.com/ptycho/ptypy/blob/pycuda/templates/minimal_prep_and_run_DM_pycuda.py) and ran it with

python minimal_prep_and_run_DM_pycuda.py

Thanks Wilhelm

bjoernenders commented 3 years ago

It could be that the pycuda engine failed to load. Could you try a different engine and, if that works, try explicitly loading the pycuda engine in the script with

from ptypy.engines import DM_pycuda

daurer commented 3 years ago

Hi Wilhelm,

Similar to what Bjoern suggested, are you sure that your ptypy installed in /home/wilhelm/anaconda3/envs/full_dependencies/lib/python3.7/site-packages/ptypy/ is from the pycuda branch and has all the new GPU engines (e.g. DM_pycuda)?

I have just installed the latest pycuda branch and successfully ran the minimal_prep_and_run_DM_pycuda.py script. I am using CUDA 10.1.
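
One way to check which installation Python actually picks up, and whether it contains the GPU engine module, is sketched below. The helper name locate is hypothetical; the module path ptypy.engines.DM_pycuda is taken from the import suggested above.

```python
import importlib.util


def locate(module_name):
    """Return the file a module would be loaded from, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ImportError:
        # Raised when a parent package is missing or cannot be imported.
        return None
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    for name in ("ptypy", "ptypy.engines.DM_pycuda"):
        print(name, "->", locate(name))
```

find_spec imports the parent packages but not the named submodule itself, so this helps to tell apart "the engine file is not there" (for example, a ptypy installed from a different branch) from "the file is there but importing it fails".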

daurer commented 3 years ago

@wfeschen just curious, did you get the pycuda engine to run on your system?

wfeschen commented 3 years ago

Hi Benedikt and Björn,

thank you for your help. I got the pycuda engine running by installing CUDA 10. I think the new error I reported was somehow related to some lines in the engines/__init__.py file:

try:
    from . import DM_pycuda
    from . import DM_pycuda_streams
    from . import ML_pycuda
    from . import DM_pycuda_stream
except:
    pass
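
Because the bare except swallows every exception, a failing pycuda import leaves no trace. A minimal sketch of one way to keep such imports optional while still reporting why they failed is shown here; import_optional is a hypothetical helper, not part of ptypy, and it uses absolute module names so that it can run standalone.

```python
import importlib
import warnings


def import_optional(module_names):
    """Import each module if possible; warn instead of failing otherwise.

    Returns a dict mapping the names that imported successfully to their
    module objects.
    """
    loaded = {}
    for name in module_names:
        try:
            loaded[name] = importlib.import_module(name)
        except Exception as exc:  # broad on purpose: GPU setup can fail in many ways
            warnings.warn("Could not import %s: %r" % (name, exc))
    return loaded


if __name__ == "__main__":
    # Module names follow the snippet above; adjust them to your installation.
    engines = import_optional([
        "ptypy.engines.DM_pycuda",
        "ptypy.engines.ML_pycuda",
    ])
    print("available:", sorted(engines))
```

With something like this, a broken CUDA library would show up as a warning naming the underlying exception, rather than as a missing engine later on.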

I could only get pycuda running under my own user account, in the environment I built myself, but not in the anaconda environment, where I still had problems with the CUDA library. So I created a new user to test the anaconda environment from a clean state with CUDA 10, and there it worked immediately.

Anyway, I got the new version running and in a first test (70x1024x1024 diffraction patterns), I experienced a speed improvement by a factor of 30! This really makes a difference and reduces the reconstruction time from hours to minutes.

I ran the same experimental data (Siemens star) once using the CPU engines ("DM" + "ML")

[Image: CPU reconstruction]

and once with the GPU engines ("DM_pycuda" + "ML_pycuda", same set of parameters), and the results show only a minor difference.

[Image: GPU reconstruction]

So, thanks a lot for this implementation, it is really helpful. There is only one remaining question: I noticed that there is a parameter called "frames_per_block", and I am not really sure how to set it. For the reconstructions I did, I used p.frames_per_block = 100, but this was just a value that somehow worked. Is there a rule of thumb for setting this value?

Thanks for the help Wilhelm

daurer commented 3 years ago

Hi Wilhelm,

Happy to see that it is working for you now. And thanks for the feedback regarding the anaconda environment. I will do some more testing to make sure that the provided dependencies work on a clean system.

Regarding your question about the new parameter frames_per_block: it defines the maximum number of frames (including associated exit wave arrays, etc.) that will be loaded into GPU memory. To avoid allocating unnecessary memory, set this value to no more than the actual number of frames in your data. At the higher end, something like 500 or 1000 works for me on P100 or V100 cards as long as I stay at <= 5 probe modes.
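
As a rough illustration of this rule of thumb, the sketch below caps the value at the number of frames and gives a very crude memory estimate. Both helper names are hypothetical. The memory model (one float32 diffraction pattern plus one complex64 exit wave per probe mode and frame) is an assumption for illustration only, not a description of what ptypy actually allocates, so treat the result as a lower bound at best.

```python
def suggest_frames_per_block(n_frames, upper=1000):
    """Cap frames_per_block at the number of frames actually in the data.

    The default upper limit of 1000 is the high-end value mentioned above
    for P100/V100 cards; smaller cards will likely need less.
    """
    if n_frames < 1:
        raise ValueError("n_frames must be positive")
    return min(n_frames, upper)


def rough_block_bytes(frames, shape, modes=1):
    """Crude lower bound on GPU memory per block, in bytes.

    Assumption (not taken from ptypy): per frame, one float32 diffraction
    pattern (4 bytes/pixel) plus one complex64 exit wave per probe mode
    (8 bytes/pixel each).
    """
    pixels = shape[0] * shape[1]
    return frames * pixels * (4 + 8 * modes)


if __name__ == "__main__":
    frames = suggest_frames_per_block(70)
    gib = rough_block_bytes(frames, (1024, 1024)) / 2**30
    print("%d frames per block, at least %.2f GiB" % (frames, gib))
```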

Also, if your data becomes large, you should probably start using one of the Block scan models (BlockFull or BlockVanilla) and for DM you might want to switch over to DM_pycuda_streams.
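
Put together, the relevant settings might look roughly like the sketch below. It uses plain namespaces as a stand-in so that it runs without ptypy; in a real script these would be set on the ptypy parameter tree instead. The scan label "scan00" is arbitrary, and the layout p.scans.<label>.name is an assumption about the parameter tree rather than something stated in this thread.

```python
from types import SimpleNamespace


def make_params():
    """Build a stand-in parameter tree (plain namespaces, not ptypy's own type)."""
    p = SimpleNamespace()
    p.frames_per_block = 100  # the value used earlier in this thread

    # Scan model: one of the block-based models named above.
    p.scans = SimpleNamespace()
    p.scans.scan00 = SimpleNamespace(name="BlockFull")  # "scan00" is an arbitrary label

    # Engine: the streaming variant of the pycuda DM engine.
    p.engines = SimpleNamespace()
    p.engines.engine00 = SimpleNamespace(name="DM_pycuda_streams")
    return p


if __name__ == "__main__":
    p = make_params()
    print(p.scans.scan00.name, p.engines.engine00.name, p.frames_per_block)
```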

All of this will be properly documented in the next ptypy release and some things might still change compared to what you currently see in the pycuda branch. We are planning to merge all the GPU work into master by the end of this year.

Benedikt

daurer commented 3 years ago

And unless there are any further questions, feel free to close this issue.

wfeschen commented 3 years ago

Hi Benedikt,

Thank you for your quick response. I think documentation will be really helpful, and I am looking forward to the upcoming update.

Wilhelm

ptycho/ptypy, issue #272