cryoCARE_predict.py Check error on V100 and A100 GPUs

juglab / cryoCARE_pip

PIP package of cryoCARE

BSD 3-Clause "New" or "Revised" License

cryoCARE_predict.py Check error on V100 and A100 GPUs #12

Closed rahelwoldeyes closed 2 years ago

rahelwoldeyes commented 2 years ago

I get the error "Check failed: narrow == wide (-1305608192 vs. 7284326400)checked narrowing failed; values not equal post-conversion" when running cryoCARE_predict.py on V100 and A100 GPUs, but it works fine on an RTX 2080 Ti. I am using CUDA 11.3, keras-gpu 2.3.1 and tensorflow 2.4.1.

rgsheld commented 2 years ago

I'm having the same issue. It seems to be related to the size of the input tomogram. If you reduce the size sufficiently (e.g. with IMOD's trimvol), it will run.
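The two numbers in the error message fit this size explanation: 7284326400 is larger than the biggest signed 32-bit integer (2147483647), and wrapping it into 32 bits gives exactly -1305608192. Here is a minimal sketch of that arithmetic. The helper name `narrow_to_int32` is made up for illustration and is not part of cryoCARE or TensorFlow, and what the 7284326400 actually counts (elements, bytes, or something else) is not stated in the thread.

```python
INT32_MAX = 2**31 - 1  # 2147483647, largest signed 32-bit integer


def narrow_to_int32(value: int) -> int:
    """Reinterpret the low 32 bits of `value` as a signed 32-bit integer."""
    wrapped = value & 0xFFFFFFFF  # keep only the low 32 bits
    # Values with the top bit set represent negative numbers in two's complement.
    return wrapped - (1 << 32) if wrapped >= (1 << 31) else wrapped


wide = 7284326400               # the "wide" value from the error message
narrow = narrow_to_int32(wide)  # what a 32-bit integer would hold instead

print(wide > INT32_MAX)  # True: the value does not fit into int32
print(narrow)            # -1305608192, the "narrow" value from the error message
```

A smaller tomogram keeps such a value below 2147483647, which would explain why trimming the volume avoids the check.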

rahelwoldeyes commented 2 years ago

Thanks. Normally cryoCARE_predict.py tries different tile sizes until it finds one that fits into the available memory (see below). In the case I described, the script never goes through this optimization. I have been using RTX 2080s for the prediction step, and it works fine there even though the available GPU memory is significantly smaller (12GB vs 32GB/42GB).
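The tile-size search described above can be sketched roughly as follows. This is a hypothetical illustration, not cryoCARE's actual code: `predict_with_retry`, `fake_predict` and the voxel budget are invented names and numbers, and a plain `MemoryError` stands in for whatever out-of-memory error the real prediction raises.

```python
def predict_with_retry(predict_fn, volume_shape, start_tiles=1, max_tiles=64):
    """Call predict_fn with more and more tiles until one attempt fits in memory."""
    n_tiles = start_tiles
    while n_tiles <= max_tiles:
        try:
            return predict_fn(volume_shape, n_tiles), n_tiles
        except MemoryError:
            n_tiles *= 2  # smaller tiles need less memory per step
    raise MemoryError("no tile count up to max_tiles fits into memory")


def fake_predict(volume_shape, n_tiles, budget=10_000_000):
    """Stand-in for the real prediction: fails if a tile exceeds the voxel budget."""
    voxels = volume_shape[0] * volume_shape[1] * volume_shape[2]
    if voxels // n_tiles > budget:
        raise MemoryError("tile too large")
    return "ok"


result, tiles = predict_with_retry(fake_predict, (200, 500, 500))
print(result, tiles)  # ok 8
```

In this sketch only memory errors trigger a retry. Any other failure propagates straight out of the loop, so no smaller tile size is ever tried.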

thorstenwagner commented 2 years ago

(Edit: I didn't notice that you are already using CUDA 11. In that case I don't know what is going on either.)

It's probably the cudatoolkit version that leads to that problem.

Could you type

conda list cudatoolkit

I guess you have installed version 10. This version is not compatible with the chipset in V100 and A100.

Try to create an environment with this command:

conda create -n cryocare_c11 -c conda-forge -c anaconda python=3 keras-gpu=2.3.1 cudatoolkit=11

Install cryocare in this env and you should be good to go.

rahelwoldeyes commented 2 years ago

Thanks anyways. I will try a fresh install.

thorstenwagner commented 2 years ago

Hi @rahelwoldeyes

I found out that the cuda 11 instructions had more problems. I've updated them:

https://github.com/thorstenwagner/cryoCARE_mpido#installation

If you want, give it a try :-)

Best, Thorsten

rahelwoldeyes commented 2 years ago

Hi @thorstenwagner,

Thanks for the update! I will try it out.

Best, Rahel

thorstenwagner commented 2 years ago

@rahelwoldeyes After more problems, I've updated the instructions again ^^ Hopefully they now work for you as well. Installing keras-gpu 2.3.1 via conda does not work for the CUDA 11 setup.

rahelwoldeyes commented 2 years ago

Thank you for keeping me updated @thorstenwagner. I appreciate it!