Closed: mpm896 closed this issue 1 week ago
It might also be worth mentioning that this is all running inside a Docker container. I unfortunately don't have a choice regarding this. However, I still have the Conda environment set up.
Hi @mpm896 - could you confirm that you're running the latest conda build?
Do other programs accessing the GPU work without issues?
I got this error message when trying on a workstation with CUDA 12.2 and two 4090s. When I switched to a workstation with two 3090s and CUDA 11.6, I had no troubles. I haven't done any troubleshooting to see whether it was CUDA or the GPU that was the issue.
thanks for the input @kristyrochon - although I'm not sure why your CUDA installation would be relevant; the only relevant CUDA installation should be the one in the conda environment.
Perhaps you're manually setting PATH/LD_LIBRARY_PATH in your environment and overriding the CUDA which is available at runtime
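One quick way to spot an override like that is to print each search-path entry on its own line; any directory listed before the conda env's `lib` can shadow its CUDA libraries. A generic shell sketch (the path value shown is hypothetical):

```shell
# Print LD_LIBRARY_PATH one entry per line; directories listed before the
# conda env's lib/ can shadow the CUDA libraries Warp expects to load.
# Hypothetical value shown for illustration - drop this line to inspect
# your real environment.
LD_LIBRARY_PATH="/opt/miniconda3/envs/warp/lib:/usr/local/cuda/lib64"
printf '%s\n' "${LD_LIBRARY_PATH}" | tr ':' '\n'
```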
Hey @alisterburt , I just updated with `conda update warp` and this issue is still persisting. I've appended `/path/to/envs/warp/lib` to my `LD_LIBRARY_PATH` as well in case that was the problem (since I also have `cudatoolkit-dev` installed in my base conda env), but the error is still occurring.
I was able to get IMOD's `alignframes` and AlphaFold2 running on the GPU, but I haven't tried much else.
In case it helps, here are the contents of `PATH`:
/opt/local/AreTomo:/opt/local/AreTomo2:/opt/local/AreTomo3:/usr/local/cuda/bin:/usr/local/Particle/bin:/opt/local/miniconda3/bin:/opt/local/miniconda3/condabin:/usr/cpbin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/runs/pipeline-526244/CommonRepo/shell:/usr/local/Particle/bin:/usr/local/IMOD/bin:/usr/cpbin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/runs/pipeline-526244/CommonRepo/shell:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/IMOD/pythonLink
and `LD_LIBRARY_PATH`:
/opt/local/miniconda3/envs/warp/lib:/usr/local/cuda/lib64:/usr/local/ParticleRuntime/R2022b/runtime/glnxa64:/usr/local/ParticleRuntime/R2022b/bin/glnxa64:/usr/local/ParticleRuntime/R2022b/sys/os/glnxa64:/usr/local/ParticleRuntime/R2022b/extern/bin/glnxa64:/usr/local/ParticleRuntime/R2022b/sys/opengl/lib/glnxa64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
@mpm896 you shouldn't have anything installed in your base conda env, see https://stackoverflow.com/questions/57243296/why-is-it-recommended-to-not-install-additional-packages-in-the-conda-base-envir
(it's a stupid design and they have the ability to disable autoactivation)
can you please try in a fresh environment and in a shell without a bunch of overrides?
@alisterburt got it, still learning the complexities of the proper way to use conda envs. There was a reason I installed `cudatoolkit-dev` (I don't remember exactly which program prompted me to install it), but I probably should have installed it into a specific env for that.
I set `LD_LIBRARY_PATH` to "" and re-ran, and got the same error.
> thanks for the input @kristyrochon - although I'm not sure why your CUDA installation would be relevant, the only relevant CUDA installation should be the one in the conda environment.
> Perhaps you're manually setting PATH/LD_LIBRARY_PATH in your environment and overriding the CUDA which is available at runtime
Understood. We're using a module system and were loading the same conda environment for both workstations. It looks like warp was installed and built using CUDA 11.7 and may need different versions of the environment for the different workstations.
Thanks for the quick reply.
Respectfully, Kristy
ah yes, the interplay between conda/modules can be non-trivial for sure
@mpm896 an empty LD_LIBRARY_PATH definitely isn't right 🙂
I'd recommend removing anything in your shell configuration and nuking your conda install to be sure you have a clean environment
@alisterburt sorry! Maybe I misunderstood what you meant by a fresh shell environment without a bunch of overrides? Some things in the PATH/LD_LIBRARY_PATH are IMOD/ETomo/PEET, AreTomo, etc. Other things I've been trying to figure out, because they were set like this when I was provided with a base Docker image by the company. Can you elaborate on what you mean by "a bunch of overrides"?
In any case, I'll try a fresh conda install and try again soon. I'll let you know from there if I'm still running into this issue!
You have a bunch of things on the PATH/LD_LIBRARY_PATH: MATLAB runtimes, CUDA runtimes, etc. I would try to start with Warp from a blank slate, not a fully loaded complex thing. Are you locked in to this docker image in particular?
Ok, I gotcha. I'm not locked into this one particular docker image, but I can only start from a clean slate down to a particular point. The base Ubuntu images provided to us have some preconfigured things, but I'm sure I could work around and adjust some of them.
Good luck! I'll close as a suspected environment issue but reopen if you have the same issue from a blank slate
Hey Alister, I went back to the near-default state of the docker image (only IMOD installed, no PEET or AreTomo installations), reinstalled `warp`, and I'm still getting the same error with the same log message. I'm pretty thrown off by this part of the error:
Unhandled exception. System.AggregateException: One or more errors occurred. (Cannot assign requested address (localhost:36885))
---> System.Net.Http.HttpRequestException: Cannot assign requested address (localhost:36885)
---> System.Net.Sockets.SocketException (99): Cannot assign requested address
StackOverflow issues like this one point to this being related to assigning the proper port while working in a Docker container. Do you have any experience with this? I've tried assigning different workers with `host.docker.internal` instead of `localhost`, but they just time out whenever I run motion correction.
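For reference, errno 99 (`EADDRNOTAVAIL`, "Cannot assign requested address") is what the kernel returns when a socket is asked to bind to an address that no local interface owns, which can happen in containers where `localhost` resolves unexpectedly. A minimal, Warp-independent reproduction:

```python
import errno
import socket

# Binding to an address no local interface owns raises errno 99, the
# same "Cannot assign requested address" seen in the Warp log above.
# 203.0.113.1 is a TEST-NET documentation address, never assigned locally.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    sock.bind(("203.0.113.1", 0))
except OSError as exc:
    print(exc.errno == errno.EADDRNOTAVAIL)  # True on Linux
finally:
    sock.close()
```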
Hi @mpm896 - I have no experience with docker so I'm not immediately sure how we should solve this.
Thanks for trying to manually specify the workers; I think you would have to also manually create those worker processes (`WarpWorker`) for that to work though... this is how I debug the WarpWorker process 🙂
It's weird that your previous docker container didn't have this issue connecting to workers (the cuFFT error is a runtime error inside the worker process, implying the worker process was successfully started) - my gut feeling is that there is some config in your previous docker container which makes it not suffer from `Cannot assign requested address`
then your `std::runtime_error` is due to the MATLAB runtimes on your PATH/LD_LIBRARY_PATH. Or am I misunderstanding - you're still getting the cuFFT error?
Yea, I'm still getting a cuFFT error. The default LD_LIBRARY_PATH now is just `/usr/local/nvidia/lib:/usr/local/nvidia/lib64`, so the MATLAB runtimes are no longer there. I'm not even sure why IT set it up like this, because `/usr/local/nvidia` doesn't exist.
okay, I think the communication issue is a red herring, it's just that the worker has died.
I think `/usr/local/nvidia/lib64` might be your problem, what happens if you remove that?
edit: doh, if `/usr/local/nvidia` doesn't exist then `/lib64` shouldn't either...
Can you compare the output of `conda list` to mine below too? (fresh install just now, confirmed working)
# packages in environment at /home/burta2/mambaforge/envs/warp:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_kmp_llvm conda-forge
binutils 2.39 hdd6e379_1 conda-forge
binutils_impl_linux-64 2.39 he00db2b_1 conda-forge
binutils_linux-64 2.39 h5fc0e48_13 conda-forge
blas 2.121 mkl conda-forge
blas-devel 3.9.0 21_linux64_mkl conda-forge
brotli-python 1.1.0 py311hb755f60_1 conda-forge
bzip2 1.0.8 hd590300_5 conda-forge
c-ares 1.28.1 hd590300_0 conda-forge
c-compiler 1.3.0 h7f98852_0 conda-forge
ca-certificates 2024.7.4 hbcca054_0 conda-forge
certifi 2024.6.2 pyhd8ed1ab_0 conda-forge
cffi 1.16.0 py311hb3a22ac_0 conda-forge
charset-normalizer 3.3.2 pyhd8ed1ab_0 conda-forge
cmake 3.29.4 h91dbaaa_0 conda-forge
cuda-cccl 11.7.58 hc415cf5_0 nvidia/label/cuda-11.7.0
cuda-command-line-tools 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-compiler 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-cudart 11.7.60 h9538e0e_0 nvidia/label/cuda-11.7.0
cuda-cudart-dev 11.7.60 h6a7c232_0 nvidia/label/cuda-11.7.0
cuda-cuobjdump 11.7.50 h28cc80a_0 nvidia/label/cuda-11.7.0
cuda-cupti 11.7.50 hb6f9eaf_0 nvidia/label/cuda-11.7.0
cuda-cuxxfilt 11.7.50 hb365495_0 nvidia/label/cuda-11.7.0
cuda-documentation 11.7.50 0 nvidia/label/cuda-11.7.0
cuda-driver-dev 11.7.60 0 nvidia/label/cuda-11.7.0
cuda-gdb 11.7.50 h4a0ac72_0 nvidia/label/cuda-11.7.0
cuda-libraries 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-libraries-dev 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-memcheck 11.7.50 hc446b2b_0 nvidia/label/cuda-11.7.0
cuda-nsight 11.7.50 0 nvidia/label/cuda-11.7.0
cuda-nsight-compute 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-nvcc 11.7.64 0 nvidia/label/cuda-11.7.0
cuda-nvdisasm 11.7.50 h5bd0695_0 nvidia/label/cuda-11.7.0
cuda-nvml-dev 11.7.50 h3af1343_0 nvidia/label/cuda-11.7.0
cuda-nvprof 11.7.50 h7a2404d_0 nvidia/label/cuda-11.7.0
cuda-nvprune 11.7.50 h7add7b4_0 nvidia/label/cuda-11.7.0
cuda-nvrtc 11.7.50 hd0285e0_0 nvidia/label/cuda-11.7.0
cuda-nvrtc-dev 11.7.50 heada363_0 nvidia/label/cuda-11.7.0
cuda-nvtx 11.7.50 h05b0816_0 nvidia/label/cuda-11.7.0
cuda-nvvp 11.7.50 hd2289d5_0 nvidia/label/cuda-11.7.0
cuda-runtime 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-sanitizer-api 11.7.50 hb424887_0 nvidia/label/cuda-11.7.0
cuda-toolkit 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-tools 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-visual-tools 11.7.0 0 nvidia/label/cuda-11.7.0
cxx-compiler 1.3.0 h4bd325d_0 conda-forge
dotnet 8.0.204 ha770c72_0 conda-forge
dotnet-aspnetcore 8.0.4 hb8a3ed7_0 conda-forge
dotnet-runtime 8.0.4 hb8a3ed7_0 conda-forge
dotnet-sdk 8.0.204 hb8a3ed7_0 conda-forge
ffmpeg 4.3 hf484d3e_0 pytorch
fftw 3.3.10 nompi_hf1063bd_110 conda-forge
filelock 3.15.4 pyhd8ed1ab_0 conda-forge
freetype 2.12.1 h267a509_2 conda-forge
gcc 9.5.0 h1fea6ba_13 conda-forge
gcc_impl_linux-64 9.5.0 h99780fb_19 conda-forge
gcc_linux-64 9.5.0 h4258300_13 conda-forge
gds-tools 1.3.0.44 0 nvidia/label/cuda-11.7.0
gmp 6.3.0 hac33072_2 conda-forge
gmpy2 2.1.5 py311hc4f1f91_1 conda-forge
gnutls 3.6.13 h85f3911_1 conda-forge
gxx 9.5.0 h1fea6ba_13 conda-forge
gxx_impl_linux-64 9.5.0 h99780fb_19 conda-forge
gxx_linux-64 9.5.0 h43f449f_13 conda-forge
h2 4.1.0 pyhd8ed1ab_0 conda-forge
hpack 4.0.0 pyh9f0ad1d_0 conda-forge
hyperframe 6.0.1 pyhd8ed1ab_0 conda-forge
icu 73.2 h59595ed_0 conda-forge
idna 3.7 pyhd8ed1ab_0 conda-forge
jinja2 3.1.4 pyhd8ed1ab_0 conda-forge
jpeg 9e h0b41bf4_3 conda-forge
kernel-headers_linux-64 2.6.32 he073ed8_17 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
krb5 1.21.3 h659f571_0 conda-forge
lame 3.100 h166bdaf_1003 conda-forge
lcms2 2.15 hfd0df8a_0 conda-forge
ld_impl_linux-64 2.39 hcc3a1bd_1 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libblas 3.9.0 21_linux64_mkl conda-forge
libcblas 3.9.0 21_linux64_mkl conda-forge
libcublas 11.10.1.25 he442b6f_0 nvidia/label/cuda-11.7.0
libcublas-dev 11.10.1.25 h0c8ac2b_0 nvidia/label/cuda-11.7.0
libcufft 10.7.2.50 h80a1efe_0 nvidia/label/cuda-11.7.0
libcufft-dev 10.7.2.50 h59a5ac8_0 nvidia/label/cuda-11.7.0
libcufile 1.3.0.44 0 nvidia/label/cuda-11.7.0
libcufile-dev 1.3.0.44 0 nvidia/label/cuda-11.7.0
libcurand 10.2.10.50 heec50f7_0 nvidia/label/cuda-11.7.0
libcurand-dev 10.2.10.50 hd49a9cd_0 nvidia/label/cuda-11.7.0
libcurl 8.8.0 hca28451_1 conda-forge
libcusolver 11.3.5.50 hcab339c_0 nvidia/label/cuda-11.7.0
libcusolver-dev 11.3.5.50 hc6eba6f_0 nvidia/label/cuda-11.7.0
libcusparse 11.7.3.50 h6aaafad_0 nvidia/label/cuda-11.7.0
libcusparse-dev 11.7.3.50 hc644b96_0 nvidia/label/cuda-11.7.0
libdeflate 1.17 h0b41bf4_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libexpat 2.6.2 h59595ed_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-devel_linux-64 9.5.0 h0a57e50_19 conda-forge
libgcc-ng 14.1.0 h77fa898_0 conda-forge
libgfortran-ng 14.1.0 h69a702a_0 conda-forge
libgfortran5 14.1.0 hc5f4f2c_0 conda-forge
libgomp 14.1.0 h77fa898_0 conda-forge
libhwloc 2.11.0 default_h5622ce7_1000 conda-forge
libiconv 1.17 hd590300_2 conda-forge
liblapack 3.9.0 21_linux64_mkl conda-forge
liblapacke 3.9.0 21_linux64_mkl conda-forge
libnghttp2 1.58.0 h47da74e_1 conda-forge
libnpp 11.7.3.21 h3effbd9_0 nvidia/label/cuda-11.7.0
libnpp-dev 11.7.3.21 hb6476a9_0 nvidia/label/cuda-11.7.0
libnsl 2.0.1 hd590300_0 conda-forge
libnvjpeg 11.7.2.34 hfe236c7_0 nvidia/label/cuda-11.7.0
libnvjpeg-dev 11.7.2.34 h2e48410_0 nvidia/label/cuda-11.7.0
libpng 1.6.43 h2797004_0 conda-forge
libsanitizer 9.5.0 h2f262e1_19 conda-forge
libsqlite 3.46.0 hde9e2c9_0 conda-forge
libssh2 1.11.0 h0841786_0 conda-forge
libstdcxx-devel_linux-64 9.5.0 h0a57e50_19 conda-forge
libstdcxx-ng 14.1.0 hc0a3c3a_0 conda-forge
libtiff 4.5.0 h6adf6a1_2 conda-forge
liburcu 0.14.0 hac33072_0 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libuv 1.48.0 hd590300_0 conda-forge
libwebp-base 1.4.0 hd590300_0 conda-forge
libxcb 1.13 h7f98852_1004 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libxml2 2.12.7 hc051c1a_1 conda-forge
libzlib 1.2.13 h4ab18f5_6 conda-forge
llvm-openmp 18.1.7 ha31de31_0 conda-forge
lttng-ust 2.13.8 h4ab18f5_0 conda-forge
markupsafe 2.1.5 py311h459d7ec_0 conda-forge
mkl 2024.0.0 ha957f24_49657 conda-forge
mkl-devel 2024.0.0 ha770c72_49657 conda-forge
mkl-include 2024.0.0 ha957f24_49657 conda-forge
mpc 1.3.1 hfe3b2da_0 conda-forge
mpfr 4.2.1 h9458935_1 conda-forge
mpmath 1.3.0 pyhd8ed1ab_0 conda-forge
ncurses 6.5 h59595ed_0 conda-forge
nettle 3.6 he412f7d_0 conda-forge
networkx 3.3 pyhd8ed1ab_1 conda-forge
nsight-compute 2022.2.0.13 0 nvidia/label/cuda-11.7.0
numpy 2.0.0 py311h1461c94_0 conda-forge
openh264 2.1.1 h780b84a_0 conda-forge
openjpeg 2.5.0 hfec8fc6_2 conda-forge
openssl 3.3.1 h4ab18f5_1 conda-forge
pillow 9.4.0 py311h50def17_1 conda-forge
pip 24.0 pyhd8ed1ab_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
pycparser 2.22 pyhd8ed1ab_0 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.11.9 hb806964_0_cpython conda-forge
python_abi 3.11 4_cp311 conda-forge
pytorch 2.0.1 py3.11_cuda11.7_cudnn8.5.0_0 pytorch
pytorch-cuda 11.7 h778d358_5 pytorch
pytorch-mutex 1.0 cuda pytorch
readline 8.2 h8228510_1 conda-forge
requests 2.32.3 pyhd8ed1ab_0 conda-forge
rhash 1.4.4 hd590300_0 conda-forge
setuptools 70.1.1 pyhd8ed1ab_0 conda-forge
sympy 1.12.1 pypyh2585a3b_103 conda-forge
sysroot_linux-64 2.12 he073ed8_17 conda-forge
tbb 2021.12.0 h434a139_2 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
torchtriton 2.0.0 py311 pytorch
torchvision 0.15.2 py311_cu117 pytorch
typing_extensions 4.12.2 pyha770c72_0 conda-forge
tzdata 2024a h0c530f3_0 conda-forge
urllib3 2.2.2 pyhd8ed1ab_1 conda-forge
warp 2.0.0dev18 py311_0 warpem
wheel 0.43.0 pyhd8ed1ab_1 conda-forge
xorg-libxau 1.0.11 hd590300_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
zlib 1.2.13 h4ab18f5_6 conda-forge
zstandard 0.22.0 py311hb6f056b_1 conda-forge
zstd 1.5.6 ha6fb4c9_0 conda-forge
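A quick way to do the comparison without eyeballing ~200 lines is to diff the first column of two saved listings. A throwaway helper with toy data (not part of Warp; in practice you'd feed it `conda list > mine.txt` from each env):

```python
# Throwaway helper: compare package names from two saved `conda list`
# outputs and report what's missing from / extra in the local env.
def package_names(listing):
    """First whitespace-separated column of each non-comment line."""
    return {
        line.split()[0]
        for line in listing.splitlines()
        if line.strip() and not line.startswith("#")
    }

reference = """\
# Name Version Build Channel
libcufft 10.7.2.50 h80a1efe_0 nvidia/label/cuda-11.7.0
warp 2.0.0dev18 py311_0 warpem
zlib 1.2.13 h4ab18f5_6 conda-forge
"""
mine = """\
# Name Version Build Channel
libcufft 10.7.2.50 h80a1efe_0 nvidia/label/cuda-11.7.0
aom 3.5.0 h27087fc_0 conda-forge
"""
print(sorted(package_names(reference) - package_names(mine)))  # ['warp', 'zlib']
print(sorted(package_names(mine) - package_names(reference)))  # ['aom']
```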
Ok, so before you asked about this, I tried building from source but got the same cuFFT error. However, with the build, my PATH now includes `/cloud-home/U1036725/.magellan/conda/envs/warp_build/lib/dotnet:/cloud-home/U1036725/.magellan/conda/envs/warp_build/lib/dotnet/tools`. I've also tried setting the LD_LIBRARY_PATH to include ONLY `/cloud-home/U1036725/.magellan/conda/envs/warp_build/lib`, but still got the same error.
Here's the output of `conda list`:
# packages in environment at /cloud-home/U1036725/.magellan/conda/envs/warp_build:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_kmp_llvm conda-forge
aom 3.5.0 h27087fc_0 conda-forge
binutils 2.39 hdd6e379_1 conda-forge
binutils_impl_linux-64 2.39 he00db2b_1 conda-forge
binutils_linux-64 2.39 h5fc0e48_13 conda-forge
blas 2.121 mkl conda-forge
blas-devel 3.9.0 21_linux64_mkl conda-forge
brotli-python 1.1.0 py311hb755f60_1 conda-forge
bzip2 1.0.8 hd590300_5 conda-forge
c-ares 1.28.1 hd590300_0 conda-forge
c-compiler 1.3.0 h7f98852_0 conda-forge
ca-certificates 2024.7.4 hbcca054_0 conda-forge
certifi 2024.6.2 pyhd8ed1ab_0 conda-forge
cffi 1.16.0 py311hb3a22ac_0 conda-forge
charset-normalizer 3.3.2 pyhd8ed1ab_0 conda-forge
cmake 3.30.0 hf8c4bd3_0 conda-forge
cuda-cccl 11.7.58 hc415cf5_0 nvidia/label/cuda-11.7.0
cuda-command-line-tools 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-compiler 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-cudart 11.7.60 h9538e0e_0 nvidia/label/cuda-11.7.0
cuda-cudart-dev 11.7.60 h6a7c232_0 nvidia/label/cuda-11.7.0
cuda-cuobjdump 11.7.50 h28cc80a_0 nvidia/label/cuda-11.7.0
cuda-cupti 11.7.50 hb6f9eaf_0 nvidia/label/cuda-11.7.0
cuda-cuxxfilt 11.7.50 hb365495_0 nvidia/label/cuda-11.7.0
cuda-documentation 11.7.50 0 nvidia/label/cuda-11.7.0
cuda-driver-dev 11.7.60 0 nvidia/label/cuda-11.7.0
cuda-gdb 11.7.50 h4a0ac72_0 nvidia/label/cuda-11.7.0
cuda-libraries 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-libraries-dev 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-memcheck 11.7.50 hc446b2b_0 nvidia/label/cuda-11.7.0
cuda-nsight 11.7.50 0 nvidia/label/cuda-11.7.0
cuda-nsight-compute 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-nvcc 11.7.64 0 nvidia/label/cuda-11.7.0
cuda-nvdisasm 11.7.50 h5bd0695_0 nvidia/label/cuda-11.7.0
cuda-nvml-dev 11.7.50 h3af1343_0 nvidia/label/cuda-11.7.0
cuda-nvprof 11.7.50 h7a2404d_0 nvidia/label/cuda-11.7.0
cuda-nvprune 11.7.50 h7add7b4_0 nvidia/label/cuda-11.7.0
cuda-nvrtc 11.7.50 hd0285e0_0 nvidia/label/cuda-11.7.0
cuda-nvrtc-dev 11.7.50 heada363_0 nvidia/label/cuda-11.7.0
cuda-nvtx 11.7.50 h05b0816_0 nvidia/label/cuda-11.7.0
cuda-nvvp 11.7.50 hd2289d5_0 nvidia/label/cuda-11.7.0
cuda-runtime 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-sanitizer-api 11.7.50 hb424887_0 nvidia/label/cuda-11.7.0
cuda-toolkit 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-tools 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-visual-tools 11.7.0 0 nvidia/label/cuda-11.7.0
cxx-compiler 1.3.0 h4bd325d_0 conda-forge
dotnet 8.0.302 ha770c72_0 conda-forge
dotnet-aspnetcore 8.0.6 h8d34606_0 conda-forge
dotnet-runtime 8.0.6 h8d34606_0 conda-forge
dotnet-sdk 8.0.302 h8d34606_0 conda-forge
expat 2.6.2 h59595ed_0 conda-forge
ffmpeg 5.1.2 gpl_h8dda1f0_106 conda-forge
fftw 3.3.10 nompi_hf1063bd_110 conda-forge
filelock 3.15.4 pyhd8ed1ab_0 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 h77eed37_2 conda-forge
fontconfig 2.14.2 h14ed4e7_0 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
freetype 2.12.1 h267a509_2 conda-forge
gcc 9.5.0 h1fea6ba_13 conda-forge
gcc_impl_linux-64 9.5.0 h99780fb_19 conda-forge
gcc_linux-64 9.5.0 h4258300_13 conda-forge
gds-tools 1.3.0.44 0 nvidia/label/cuda-11.7.0
gettext 0.22.5 h59595ed_2 conda-forge
gettext-tools 0.22.5 h59595ed_2 conda-forge
gmp 6.3.0 hac33072_2 conda-forge
gmpy2 2.1.5 py311hc4f1f91_1 conda-forge
gnutls 3.7.9 hb077bed_0 conda-forge
gxx 9.5.0 h1fea6ba_13 conda-forge
gxx_impl_linux-64 9.5.0 h99780fb_19 conda-forge
gxx_linux-64 9.5.0 h43f449f_13 conda-forge
h2 4.1.0 pyhd8ed1ab_0 conda-forge
hpack 4.0.0 pyh9f0ad1d_0 conda-forge
hyperframe 6.0.1 pyhd8ed1ab_0 conda-forge
icu 73.2 h59595ed_0 conda-forge
idna 3.7 pyhd8ed1ab_0 conda-forge
jinja2 3.1.4 pyhd8ed1ab_0 conda-forge
jpeg 9e h0b41bf4_3 conda-forge
kernel-headers_linux-64 2.6.32 he073ed8_17 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
krb5 1.21.3 h659f571_0 conda-forge
lame 3.100 h166bdaf_1003 conda-forge
lcms2 2.15 hfd0df8a_0 conda-forge
ld_impl_linux-64 2.39 hcc3a1bd_1 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libasprintf 0.22.5 h661eb56_2 conda-forge
libasprintf-devel 0.22.5 h661eb56_2 conda-forge
libblas 3.9.0 21_linux64_mkl conda-forge
libcblas 3.9.0 21_linux64_mkl conda-forge
libcublas 11.10.1.25 he442b6f_0 nvidia/label/cuda-11.7.0
libcublas-dev 11.10.1.25 h0c8ac2b_0 nvidia/label/cuda-11.7.0
libcufft 10.7.2.50 h80a1efe_0 nvidia/label/cuda-11.7.0
libcufft-dev 10.7.2.50 h59a5ac8_0 nvidia/label/cuda-11.7.0
libcufile 1.3.0.44 0 nvidia/label/cuda-11.7.0
libcufile-dev 1.3.0.44 0 nvidia/label/cuda-11.7.0
libcurand 10.2.10.50 heec50f7_0 nvidia/label/cuda-11.7.0
libcurand-dev 10.2.10.50 hd49a9cd_0 nvidia/label/cuda-11.7.0
libcurl 8.8.0 hca28451_1 conda-forge
libcusolver 11.3.5.50 hcab339c_0 nvidia/label/cuda-11.7.0
libcusolver-dev 11.3.5.50 hc6eba6f_0 nvidia/label/cuda-11.7.0
libcusparse 11.7.3.50 h6aaafad_0 nvidia/label/cuda-11.7.0
libcusparse-dev 11.7.3.50 hc644b96_0 nvidia/label/cuda-11.7.0
libdeflate 1.17 h0b41bf4_0 conda-forge
libdrm 2.4.122 h4ab18f5_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libexpat 2.6.2 h59595ed_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-devel_linux-64 9.5.0 h0a57e50_19 conda-forge
libgcc-ng 14.1.0 h77fa898_0 conda-forge
libgettextpo 0.22.5 h59595ed_2 conda-forge
libgettextpo-devel 0.22.5 h59595ed_2 conda-forge
libgfortran-ng 14.1.0 h69a702a_0 conda-forge
libgfortran5 14.1.0 hc5f4f2c_0 conda-forge
libgomp 14.1.0 h77fa898_0 conda-forge
libhwloc 2.11.0 default_h5622ce7_1000 conda-forge
libiconv 1.17 hd590300_2 conda-forge
libidn2 2.3.7 hd590300_0 conda-forge
liblapack 3.9.0 21_linux64_mkl conda-forge
liblapacke 3.9.0 21_linux64_mkl conda-forge
libnghttp2 1.58.0 h47da74e_1 conda-forge
libnpp 11.7.3.21 h3effbd9_0 nvidia/label/cuda-11.7.0
libnpp-dev 11.7.3.21 hb6476a9_0 nvidia/label/cuda-11.7.0
libnsl 2.0.1 hd590300_0 conda-forge
libnvjpeg 11.7.2.34 hfe236c7_0 nvidia/label/cuda-11.7.0
libnvjpeg-dev 11.7.2.34 h2e48410_0 nvidia/label/cuda-11.7.0
libopus 1.3.1 h7f98852_1 conda-forge
libpciaccess 0.18 hd590300_0 conda-forge
libpng 1.6.43 h2797004_0 conda-forge
libsanitizer 9.5.0 h2f262e1_19 conda-forge
libsqlite 3.46.0 hde9e2c9_0 conda-forge
libssh2 1.11.0 h0841786_0 conda-forge
libstdcxx-devel_linux-64 9.5.0 h0a57e50_19 conda-forge
libstdcxx-ng 14.1.0 hc0a3c3a_0 conda-forge
libtasn1 4.19.0 h166bdaf_0 conda-forge
libtiff 4.5.0 h6adf6a1_2 conda-forge
libunistring 0.9.10 h7f98852_0 conda-forge
liburcu 0.14.0 hac33072_0 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libuv 1.48.0 hd590300_0 conda-forge
libva 2.18.0 h0b41bf4_0 conda-forge
libvpx 1.11.0 h9c3ff4c_3 conda-forge
libwebp-base 1.4.0 hd590300_0 conda-forge
libxcb 1.13 h7f98852_1004 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libxml2 2.12.7 hc051c1a_1 conda-forge
libzlib 1.3.1 h4ab18f5_1 conda-forge
llvm-openmp 18.1.8 hf5423f3_0 conda-forge
lttng-ust 2.13.8 h4ab18f5_0 conda-forge
markupsafe 2.1.5 py311h459d7ec_0 conda-forge
mkl 2024.0.0 ha957f24_49657 conda-forge
mkl-devel 2024.0.0 ha770c72_49657 conda-forge
mkl-include 2024.0.0 ha957f24_49657 conda-forge
mpc 1.3.1 hfe3b2da_0 conda-forge
mpfr 4.2.1 h9458935_1 conda-forge
mpmath 1.3.0 pyhd8ed1ab_0 conda-forge
ncurses 6.5 h59595ed_0 conda-forge
nettle 3.9.1 h7ab15ed_0 conda-forge
networkx 3.3 pyhd8ed1ab_1 conda-forge
nsight-compute 2022.2.0.13 0 nvidia/label/cuda-11.7.0
numpy 2.0.0 py311h1461c94_0 conda-forge
openh264 2.3.1 hcb278e6_2 conda-forge
openjpeg 2.5.0 hfec8fc6_2 conda-forge
openssl 3.3.1 h4ab18f5_1 conda-forge
p11-kit 0.24.1 hc5aa10d_0 conda-forge
pillow 9.4.0 py311h50def17_1 conda-forge
pip 24.0 pyhd8ed1ab_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
pycparser 2.22 pyhd8ed1ab_0 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.11.9 hb806964_0_cpython conda-forge
python_abi 3.11 4_cp311 conda-forge
pytorch 2.0.1 py3.11_cuda11.7_cudnn8.5.0_0 pytorch
pytorch-cuda 11.7 h778d358_5 pytorch
pytorch-mutex 1.0 cuda pytorch
readline 8.2 h8228510_1 conda-forge
requests 2.32.3 pyhd8ed1ab_0 conda-forge
rhash 1.4.4 hd590300_0 conda-forge
setuptools 70.1.1 pyhd8ed1ab_0 conda-forge
svt-av1 1.4.1 hcb278e6_0 conda-forge
sympy 1.12.1 pypyh2585a3b_103 conda-forge
sysroot_linux-64 2.12 he073ed8_17 conda-forge
tbb 2021.12.0 h434a139_2 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
torchtriton 2.0.0 py311 pytorch
torchvision 0.15.2 py311_cu117 pytorch
typing_extensions 4.12.2 pyha770c72_0 conda-forge
tzdata 2024a h0c530f3_0 conda-forge
urllib3 2.2.2 pyhd8ed1ab_1 conda-forge
wheel 0.43.0 pyhd8ed1ab_1 conda-forge
x264 1!164.3095 h166bdaf_2 conda-forge
x265 3.5 h924138e_3 conda-forge
xorg-fixesproto 5.0 h7f98852_1002 conda-forge
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libx11 1.8.4 h0b41bf4_0 conda-forge
xorg-libxau 1.0.11 hd590300_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h0b41bf4_2 conda-forge
xorg-libxfixes 5.0.3 h7f98852_1004 conda-forge
xorg-xextproto 7.3.0 h0b41bf4_1003 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
zstandard 0.22.0 py311hb6f056b_1 conda-forge
zstd 1.5.6 ha6fb4c9_0 conda-forge
The additional packages in my warp_build env are:
aom
expat
font-ttf-dejavu-sans-mono
font-ttf-inconsolata
font-ttf-source-code-pro
font-ttf-ubuntu
fontconfig
fonts-conda-ecosystem
fonts-conda-forge
gettext
gettext-tools
libasprintf
libasprintf-devel
libdrm
libgettextpo
libgettextpo-devel
libidn2
libopus
libpciaccess
libtasn1
libunistring
libva
libvpx
p11-kit
svt-av1
xorg-fixesproto
xorg-kbproto
xorg-libx11
xorg-libxext
xorg-libxfixes
xorg-xextproto
xorg-xproto
The missing packages in my warp_build env are:
warp
zlib
I'm assuming `warp` is missing because I built from source instead of installing through conda this time? Let me know what you think.
Can you please try running the tutorial data with the conda build in a fresh docker image? This way we can ensure it's nothing weird about your data causing the cuFFT error
I'll let you know most likely Monday how this goes... `wget` won't support wildcards on this system, and there are other issues with HTTP and FTP proxies, so I'm downloading the entire dataset locally and transferring the selected frames over to the AWS instance, which will take some time. Thanks (as always) for your prompt help!
Another thought while I'm waiting to do the tutorial - I collected my data in uncompressed mrc format (don't ask me why!), so rather than one file per fraction, all the fractions for each tilt are assembled into one mrc movie (i.e. with 41 tilt angles per tilt series, I have 41 mrc movie files, each containing 5 dose fractions, instead of 5 x 41 = 205 tif files). Could it be that this frame format is unsupported?
you could install a more recent `wget` into your conda env
It's not clear to me how what you explained is different from normal; each image file should contain the same image contents ± some wiggling from stage drift and beam-induced motion
It's possible that Warp doesn't support your files properly but I'll wait for you to check that the program runs correctly in your environment on the tutorial data before investigating that
Hello Alister,
I am also having an error with cuFFT on the tutorial dataset.
I noticed that it is looking in the path `/usr/share/miniconda/envs/package-build/conda-bld/warp_1720036028036/work/NativeAcceleration/gtom/src/FFT/FFT.cu`, but I have no such directory.
I checked to see where FFT.cu is actually located, and it is at `/home/doulin/clones/warp/NativeAcceleration/gtom/src/FFT/FFT.cu`.
Might this be the issue?
Hi Alister, just confirming that with a fresh conda build in a fresh docker image, I'm still getting the cuFFT error on the tutorial dataset. Regarding @DcShepherd 's comment, mine also shows an odd path that I think has something to do with how the AWS instance is managing things, but when I also tried building warp (under "Build Warp on Linux" in the README) I get the same cuFFT error, this time showing the proper path to the `FFT.cu` file.
Thanks both for reporting back
@DcShepherd the discrepancy you saw is particularly interesting, where did you see the miniconda path reported?
/usr/share/miniconda/envs/package-build/conda-bld/warp_1720036028036/work/NativeAcceleration/gtom/src/FFT/FFT.cu
Naive question - you are both activating the conda environment, right?
I'm unable to reproduce; I have set up fresh installs on both our HPC and on AWS (on top of Ubuntu) without issue. I'm not saying there is no issue, only that I'm unsure how to debug further.
Yea, I've been activating the conda environment. I might just have to ask our IT team to try to set this up and get it running, because I really don't know all that they have set up on these "blank slate" docker images that they provide us with.
Hi Alister,
I'd like to report the same issue as DcShepherd. The miniconda path was shown in the error message:
Connected to 4 workers 0/183terminate called after throwing an instance of 'std::runtime_error' what(): cuFFT error: CUFFT_INTERNAL_ERROR at /usr/share/miniconda/envs/package-build/conda-bld/warp_1720036028036/work/NativeAcceleration/gtom/src/FFT/FFT.cu:23
and I also found the FFT.cu in my warp directory: /usr/local/warp/NativeAcceleration/gtom/src/FFT/FFT.cu
Is there any way to fix this?
Hi @MinghaoChen-UCB and others
Could you please provide the exact commands you're running on the tutorial dataset, and also provide the logs from the `warp_frameseries` directory?
If what you're running into is the same issue then the cause (and thus fix) are not yet understood. I can't reproduce the issue so it's a little difficult to track down...
Hi Alister,
I got this error when running the `$WarpTools fs_motion_and_ctf` command. Attached please find the error message and the contents of the warp_frameseries directory. Please note that I'm running warp on my own dataset; I will try the tutorial dataset tomorrow. Thank you,
Minghao warp_frameseries.zip error.txt
@DcShepherd @MinghaoChen-UCB I wonder if we're all running on docker images through AWS instances? Maybe that will help narrow down this issue?
@mpm896 I am actually using a standalone workstation for my warp testing
I have attached the log file from my frameseries folder
@alisterburt The commands I am running for the tutorial dataset are:
WarpTools create_settings \
--folder_data frames \
--folder_processing warp_frameseries \
--output warp_frameseries.settings \
--extension "*.tif" \
--angpix 0.7894 \
--gain_path gain_ref.mrc \
--gain_flip_y \
--exposure 2.64
Then I run
WarpTools create_settings \
--output warp_tiltseries.settings \
--folder_processing warp_tiltseries \
--folder_data tomostar \
--extension "*.tomostar" \
--angpix 0.7894 \
--gain_path gain_ref.mrc \
--gain_flip_y \
--exposure 2.64 \
--tomo_dimensions 4400x6000x1000
This command gives a warning about tomo size...which I don't think is correct
Warning: unbinned tomogram dimensions 4400x6000x1000 appear smaller than expected for 4k+ images. Tomograms should encompass whole field of view.
There is also a warning that tells me that the tomostar directory is not found
Warning: data directory /media/doulin/Secondary_drive/warp_Test2/tomostar not found
Then I run
WarpTools fs_motion_and_ctf \
--settings warp_frameseries.settings \
--m_grid 1x1x3 \
--c_grid 2x2x1 \
--c_range_max 7 \
--c_defocus_max 8 \
--c_use_sum \
--out_averages \
--out_average_halves
The output that I get is
Running command fs_motion_and_ctf with:
m_range_min = 500
m_range_max = 10
m_bfac = -500
m_grid = 1x1x3
c_window = 512
c_range_min = 30
c_range_max = 7
c_defocus_min = 0.5
c_defocus_max = 8
c_voltage = 300
c_cs = 2.7
c_amplitude = 0.07
c_fit_phase = False
c_use_sum = True
c_grid = 2x2x1
out_averages = True
out_average_halves = True
out_skip_first = 0
out_skip_last = 0
device_list = { }
perdevice = 1
workers = { }
settings = warp_frameseries.settings
input_data = { }
input_data_recursive = False
input_processing = null
output_processing = null
No alternative input specified, will use input parameters from warp_frameseries.settings
File search will be relative to /media/doulin/Secondary_drive/warp_Test2/frames
328 files found
Parsing previous results for each item, if available...
328/328, previous metadata found for 1
Connecting to workers...
Connected to 1 workers
Connected to 1 workers
0/328terminate called after throwing an instance of 'std::runtime_error'
what(): cuFFT error: CUFFT_INTERNAL_ERROR at /usr/share/miniconda/envs/package-build/conda-bld/warp_1720036028036/work/NativeAcceleration/gtom/src/FFT/FFT.cu:23
Hi Alister (and all), I've run into a bit of success! I started searching specifically this part of the error on Nvidia forums:
terminate called after throwing an instance of 'std::runtime_error'
what(): cuFFT error: CUFFT_INTERNAL_ERROR
which it seems like people are getting for a variety of reasons. I saw this post about cuFFT not working on L4 GPUs while it works on T4 GPUs. We have access to instances with A10G GPUs, so I rebooted an instance with this gpu and I'm no longer facing this problem!
I know almost nothing about gpu computing so I have no clue why this would make a difference, but perhaps something needs to be reconfigured for newer GPUs like the L4? I'm also using the same environment that I discussed at the top of this post, with some matlab runtimes (for PEET) in my LD_LIBRARY_PATH, etc. Anyways I hope this helps troubleshoot the issue for others!
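For what it's worth, the pattern across this thread (T4, 3090, and A10G working; L4 and 4090 failing) would be consistent with a runtime/architecture mismatch: L4 and RTX 40-series cards are Ada-generation GPUs (compute capability 8.9), which postdates the CUDA 11.7 toolkit the conda package pins. This is a guess, not a confirmed diagnosis; the 8.9 cutoff below is my assumption, and `nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader` would give the real values for your machine:

```python
import csv
import io

# Sample of `nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader`
# output; replace with your real output. The 8.9 cutoff marks the Ada
# generation, which the CUDA 11.7 runtime predates (my assumption here).
sample = "NVIDIA L4, 8.9\nNVIDIA A10G, 8.6\nNVIDIA GeForce RTX 3090, 8.6\n"
for name, cap in csv.reader(io.StringIO(sample), skipinitialspace=True):
    flag = "may hit cuFFT errors" if float(cap) >= 8.9 else "expected to work"
    print(f"{name} (compute {cap}): {flag} with a CUDA 11.7 runtime")
```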
Thank you for your update @mpm896 ! We have four NVIDIA RTX 3090 GPUs on our workstation, with CUDA version 12.0 and driver version 525.147.05. Unfortunately no alternative machine is available.
Hi Alister and all,
I tried the tutorial dataset today. I was able to run the `fs_motion_and_ctf` command without changing any settings and got to the reconstruction. However, at the `ts_template_match` step I got stuck again with this cuFFT error. Attached is the error message. I would appreciate your comments. Best, ts_template_match_err.txt
@mpm896 I'm glad you've found a solution and this is good to know about - thanks for your patience and great spelunking!
@MinghaoChen-UCB great job getting further! Can you check the worker logs in your `warp_tiltseries` directory for more hints?
@MinghaoChen-UCB you also mentioned CUDA 12 but Warp is supposed to use its own CUDA runtime (11.7 I think) pulled in when doing the conda install - please make sure you don't have any additional CUDA runtimes loaded when running Warp
@mpm896 closing here and will open a specific issue detailing the problem with L4
@MinghaoChen-UCB please feel free to open another issue for the template matching problems if they don't end up solved with the tips above
@DcShepherd same as above for you, feel free to open a new issue if your problem remains unsolved
Hi Alister,
I'm glad to inform you that the problem has been solved.
We noticed that our system unexpectedly switched to a wrong conda environment. We solved the problem by reinstalling Warp on the correct conda env and downgrading CUDA to 11.7. Now we successfully got the particle files for Relion. Thank you very much for your prompt reply! Best regards,
@MinghaoChen-UCB thanks for reporting back, glad you got it solved!
@mpm896
Thanks for that hint! I swapped over to our 3080 workstation and it runs fine. The error only shows up on the 4000 series workstations.
Thanks for all your help!
@DcShepherd thanks for reporting back!
Hello,
I've recently installed `warptools` and immediately ran into this error while doing frame series motion correction and CTF estimation. I've pasted the error below. It looks somewhat similar to issue #22 and issue #38, except I haven't set `--perdevice` (so it should be 1?). I'm running on an AWS instance running Ubuntu 20.04 with an Nvidia L4 GPU with 24 GB memory.
Contents of the log:
(Unrelated, but not sure why the log shows 1 A/px when my frameseries.settings specifies the correct pixel size)
Here's the error:
It also freezes the display of my Terminal tab - after receiving this error, nothing I type into the terminal shows up; however, commands are still executed if I enter them.