warpem / warp

https://warpem.github.io/warp/
GNU General Public License v3.0

cuFFT error during fs_motion_and_ctf #172

Closed mpm896 closed 1 week ago

mpm896 commented 2 weeks ago

Hello,

I've recently installed WarpTools and immediately ran into this error while doing frame series motion correction and CTF estimation. I've pasted the error below. It looks somewhat similar to issue #22 and issue #38, except I haven't set --perdevice (so it should be 1?). I'm running on an AWS instance running Ubuntu 20.04 with an Nvidia L4 GPU with 24 GB of memory.

Contents of the log:

2024-07-03 15:30:00.270 Received "LoadStack", with 4 arguments, for GPU #0, 22353 MB free:
2024-07-03 15:30:11.588 Loaded stack: 4096, 4096, 5, normal, real, 1 A/px, ID = -1, 1
2024-07-03 15:30:11.748 Execution took 11.365 seconds
2024-07-03 15:30:11.871
2024-07-03 15:30:12.047 Received "MovieProcessMovement", with 2 arguments, for GPU #0, 22353 MB free:

(Unrelated, but not sure why the log shows 1 A/px when my frameseries.settings specifies the correct pixel size)

Here's the error:

1394 files found
Parsing previous results for each item, if available...
1394/1394, previous metadata found for 1
Connecting to workers...
Connected to 1 workers
0/1394
terminate called after throwing an instance of 'std::runtime_error'
  what():  cuFFT error: CUFFT_INTERNAL_ERROR at /usr/share/miniconda/envs/package-build/conda-bld/warp_1718655385920/work/NativeAcceleration/gtom/src/FFT/FFT.cu:23

Failed to process /cloud-data/its-cmo-darwin-magellan-workspaces-folders/WS_Cryoem/CX_LMR/User_directories/Matt/Cryo-ET/PreProcessing_Setup/test_warptools/MM004_g3/100kx/frameseries/Position_4_3_030_45.00_20240626_135413_Fractions.mrc, marked as unselected
Unhandled exception. System.AggregateException: One or more errors occurred. (Cannot assign requested address (localhost:36885))
 ---> System.Net.Http.HttpRequestException: Cannot assign requested address (localhost:36885)
 ---> System.Net.Sockets.SocketException (99): Cannot assign requested address
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
   at System.Net.Sockets.Socket.<ConnectAsync>g__WaitForConnectWithCancellation|285_0(AwaitableSocketAsyncEventArgs saea, ValueTask connectTask, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectToTcpHostAsync(String host, Int32 port, HttpRequestMessage initialRequest, Boolean async, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpConnectionPool.ConnectToTcpHostAsync(String host, Int32 port, HttpRequestMessage initialRequest, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.AddHttp11ConnectionAsync(QueueItem queueItem)
   at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at Warp.WorkerConsole.SetFileOutput(String path) in /usr/share/miniconda/envs/package-build/conda-bld/warp_1718655385920/work/WarpLib/WorkerWrapper.cs:line 541
   at WarpTools.Commands.BaseCommand.<>c__DisplayClass1_0.<IterateOverItems>b__0(Int32 iitem, Int32 threadID) in /usr/share/miniconda/envs/package-build/conda-bld/warp_1718655385920/work/WarpTools/Commands/BaseCommand.cs:line 103
   at Warp.Tools.Helper.ForCPUGreedy(Int32 fromInclusive, Int32 toExclusive, Int32 nThreads, Action`1 funcSetup, Action`2 funcIterator, Action`1 funcTeardown) in /usr/share/miniconda/envs/package-build/conda-bld/warp_1718655385920/work/WarpLib/Tools/Helper.cs:line 786
   at WarpTools.Commands.BaseCommand.IterateOverItems(WorkerWrapper[] workers, BaseOptions cli, Action`2 body, Int32 oversubscribe) in /usr/share/miniconda/envs/package-build/conda-bld/warp_1718655385920/work/WarpTools/Commands/BaseCommand.cs:line 67
   at WarpTools.Commands.MotionCTFFrameseries.Run(Object options) in /usr/share/miniconda/envs/package-build/conda-bld/warp_1718655385920/work/WarpTools/Commands/Frameseries/MotionCTFFrameseries.cs:line 193
   at WarpTools.WarpTools.Run(Object options) in /usr/share/miniconda/envs/package-build/conda-bld/warp_1718655385920/work/WarpTools/Program.cs:line 30
   at Warp.Tools.CommandLineParserHelper.ParseAndRun(String[] args, Func`2 run, Type[] verbs, String appName) in /usr/share/miniconda/envs/package-build/conda-bld/warp_1718655385920/work/WarpLib/Tools/CommandLineParserHelper.cs:line 26
   at WarpTools.WarpTools.Main(String[] args) in /usr/share/miniconda/envs/package-build/conda-bld/warp_1718655385920/work/WarpTools/Program.cs:line 17
   at WarpTools.WarpTools.<Main>(String[] args)
./mc_ctf.sh: line 9: 17795 Aborted                 WarpTools fs_motion_and_ctf --settings warp_frameseries.settings --m_grid 1x1x3 --c_grid 2x2x1 --c_range_max 7 --c_defocus_max 8 --c_use_sum --out_averages --out_average_halves

It also freezes the display of my Terminal tab: after receiving this error, nothing I type into the terminal shows up, although commands are still executed if I enter them.

mpm896 commented 2 weeks ago

It might also be worth mentioning that this is all running inside a Docker container. Unfortunately, I don't have a choice regarding this. However, I still have the Conda environment set up.

alisterburt commented 1 week ago

Hi @mpm896 - could you confirm that you're running the latest conda build?

Do other programs accessing the GPU work without issues?

kristyrochon commented 1 week ago

I got this error message when trying on a workstation with cuda 12.2 and two 4090s. When I switched to a workstation with two 3090s and cuda 11.6 I had no troubles. I haven't done any troubleshooting to see if it's the cuda or GPU that was the issue.

alisterburt commented 1 week ago

thanks for the input @kristyrochon - although I'm not sure why your CUDA installation would be relevant, the only relevant CUDA installation should be the one in the conda environment.

Perhaps you're manually setting PATH/LD_LIBRARY_PATH in your environment and overriding the CUDA which is available at runtime
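
One way to check for this kind of override is to list every copy of the cuFFT library reachable through LD_LIBRARY_PATH, in search order. The following is a minimal sketch only: the helper name find_library_candidates is hypothetical, and it inspects LD_LIBRARY_PATH alone, whereas the dynamic loader also consults RPATH/RUNPATH entries and the system cache, so it cannot prove which copy actually gets loaded.

```python
import glob
import os


def find_library_candidates(pattern, search_path):
    """List files matching `pattern` in each directory of a colon-separated
    search path, preserving the order of the directories.

    Hypothetical helper for illustration; it is not part of Warp or conda.
    """
    hits = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        hits.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return hits


if __name__ == "__main__":
    # Earlier hits come from directories that are searched first.
    search_path = os.environ.get("LD_LIBRARY_PATH", "")
    for path in find_library_candidates("libcufft.so*", search_path):
        print(path)
```

If a copy from a system CUDA directory is listed ahead of the one in the conda environment, that would be consistent with the override described above.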

mpm896 commented 1 week ago

Hey @alisterburt, I just updated with conda update warp and this issue is still persisting. I've appended /path/to/envs/warp/lib to my LD_LIBRARY_PATH as well in case that was the problem (since I have cudatoolkit-dev also installed in my base conda env), but the error is still occurring.

I was able to get IMOD's alignframes and AlphaFold2 running on the GPU, but I haven't tried much else

mpm896 commented 1 week ago

In case it helps, here are the contents of PATH: /opt/local/AreTomo:/opt/local/AreTomo2:/opt/local/AreTomo3:/usr/local/cuda/bin:/usr/local/Particle/bin:/opt/local/miniconda3/bin:/opt/local/miniconda3/condabin:/usr/cpbin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/runs/pipeline-526244/CommonRepo/shell:/usr/local/Particle/bin:/usr/local/IMOD/bin:/usr/cpbin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/runs/pipeline-526244/CommonRepo/shell:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/IMOD/pythonLink

and LD_LIBRARY_PATH: /opt/local/miniconda3/envs/warp/lib:/usr/local/cuda/lib64:/usr/local/ParticleRuntime/R2022b/runtime/glnxa64:/usr/local/ParticleRuntime/R2022b/bin/glnxa64:/usr/local/ParticleRuntime/R2022b/sys/os/glnxa64:/usr/local/ParticleRuntime/R2022b/extern/bin/glnxa64:/usr/local/ParticleRuntime/R2022b/sys/opengl/lib/glnxa64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
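
Long variables like these are hard to read by eye. As a rough sketch (the function name audit_search_path is hypothetical and not part of any tool mentioned in this thread), the entries can be split and checked for repeats and for directories that do not exist:

```python
import os
from collections import Counter


def audit_search_path(value, exists=os.path.isdir):
    """Split a colon-separated search path and report repeated and missing entries.

    `exists` can be swapped out to test the logic without touching the filesystem.
    """
    entries = [entry for entry in value.split(":") if entry]
    counts = Counter(entries)
    unique_in_order = list(dict.fromkeys(entries))  # keeps first-seen order
    return {
        "entries": entries,
        "duplicates": [e for e in unique_in_order if counts[e] > 1],
        "missing": [e for e in unique_in_order if not exists(e)],
    }


if __name__ == "__main__":
    for name in ("PATH", "LD_LIBRARY_PATH"):
        report = audit_search_path(os.environ.get(name, ""))
        print(name)
        print("  repeated entries:", report["duplicates"])
        print("  missing directories:", report["missing"])
```

Repeated entries are usually harmless, but missing directories and unexpected CUDA or runtime directories are worth a second look.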

alisterburt commented 1 week ago

@mpm896 you shouldn't have anything installed in your base conda env, see https://stackoverflow.com/questions/57243296/why-is-it-recommended-to-not-install-additional-packages-in-the-conda-base-envir

(it's a stupid design and they have the ability to disable autoactivation)

alisterburt commented 1 week ago

can you please try in a fresh environment and in a shell without a bunch of overrides?

mpm896 commented 1 week ago

@alisterburt got it, still learning the complexities of the proper way to use conda envs. There was a reason I installed cudatoolkit-dev (I don't remember exactly which program prompted me to install it), but I probably should have installed it into a specific env for that.

I set LD_LIBRARY_PATH to "" and re-ran, got the same error

kristyrochon commented 1 week ago

> thanks for the input @kristyrochon - although I'm not sure why your CUDA installation would be relevant, the only relevant CUDA installation should be the one in the conda environment.
>
> Perhaps you're manually setting PATH/LD_LIBRARY_PATH in your environment and overriding the CUDA which is available at runtime

Understood. We're using a module system and were loading the same conda environment for both. It looks like warp was installed and built using cuda 11.7, and we may need different versions of the environment for the different workstations.

Thanks for the quick reply.

Respectfully, Kristy

alisterburt commented 1 week ago

ah yes, the interplay between conda/modules can be non-trivial for sure

alisterburt commented 1 week ago

@mpm896 an empty LD_LIBRARY_PATH definitely isn't right 🙂

I'd recommend removing anything in your shell configuration and nuking your conda install to be sure you have a clean environment

mpm896 commented 1 week ago

@alisterburt sorry! Maybe I misunderstood what you meant by a fresh shell environment without a bunch of overrides? Some things in the PATH/LD_LIBRARY_PATH are IMOD/ETomo/PEET, AreTomo, etc. Other things I've been trying to figure out, because they were set like this when I was provided with a base Docker image by the company. Can you elaborate on what you mean by "a bunch of overrides"?

In any case, I'll try a fresh conda install and try again soon. I'll let you know from there if I'm still running into this issue!

alisterburt commented 1 week ago

You have a bunch of things on the PATH/LD_LIBRARY_PATH: MATLAB runtimes, CUDA runtimes, etc. I would try to start with Warp from a blank slate, not a fully loaded complex thing. Are you locked in to this docker image in particular?

mpm896 commented 1 week ago

Ok I gotcha. I'm not locked into this one particular docker image, I could start from a clean slate only down to a particular point. The base Ubuntu images provided to us have some preconfigured things, but I'm sure I could work around and adjust some of them

alisterburt commented 1 week ago

Good luck! I'll close as a suspected environment issue but reopen if you have the same issue from a blank slate

mpm896 commented 1 week ago

Hey Alister, I went back to the near-default state of the docker image (only IMOD installed, no PEET or AreTomo installations), reinstalled warp, and I'm still getting the same error with the same log message. I'm pretty thrown off by this part of the error:

Unhandled exception. System.AggregateException: One or more errors occurred. (Cannot assign requested address (localhost:36885))
 ---> System.Net.Http.HttpRequestException: Cannot assign requested address (localhost:36885)
 ---> System.Net.Sockets.SocketException (99): Cannot assign requested address

StackOverflow issues like this one point to this being related to assigning the proper port while working in a docker container. Do you have any experience with this? I've tried assigning different workers with host.docker.internal instead of localhost but they just time out whenever I run motion correction.
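
A quick way to separate a networking problem from a missing listener is to check, from inside the container, what localhost resolves to and whether anything accepts connections on the port. This is only a sketch using the Python standard library: the helper names are hypothetical, and the port 36885 is taken from the trace above and will most likely differ on every run.

```python
import socket


def resolve_host(host, port):
    """Return the distinct addresses `host` resolves to, or [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except OSError:
        return []
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=2.0):
    """Try a plain TCP connection and report (success, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    port = 36885  # port from the error message above; assumed to change per run
    print("localhost resolves to:", resolve_host("localhost", port))
    print("connect result:", can_connect("localhost", port))
```

A refused connection suggests that nothing is listening on the port, for example because the process that owned it has already exited, while a resolution or address error points more towards the container's network configuration.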

alisterburt commented 1 week ago

Hi @mpm896 - I have no experience with docker so I'm not immediately sure how we should solve this.

Thanks for trying to manually specify the workers, I think you would have to also manually create those worker processes (WarpWorker) for that to work though... this is how I debug the WarpWorker process 🙂

It's weird that your previous docker container didn't have this issue connecting to workers (the cuFFT error is a runtime error inside the worker process, implying the worker process was successfully started). My gut feeling is that there is some config in your previous docker container which keeps it from suffering from "Cannot assign requested address", and that your std::runtime_error is then due to the MATLAB runtimes on your PATH/LD_LIBRARY_PATH

alisterburt commented 1 week ago

or am I misunderstanding, you're still getting the cuFFT error?

mpm896 commented 1 week ago

Yea, I'm still getting a cuFFT error. The default LD_LIBRARY_PATH now is just /usr/local/nvidia/lib:/usr/local/nvidia/lib64, so the matlab runtimes are no longer there. I'm not even sure why IT set it up like this, because /usr/local/nvidia doesn't exist

alisterburt commented 1 week ago

okay, I think the communication issue is a red herring, it's just that the worker has died.

I think /usr/local/nvidia/lib64 might be your problem, what happens if you remove that? edit: doh, if /usr/local/nvidia doesn't exist then /lib64 shouldn't either...

Can you compare the output of conda list to mine below too? (fresh install just now, confirmed working)

# packages in environment at /home/burta2/mambaforge/envs/warp:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                  2_kmp_llvm    conda-forge
binutils                  2.39                 hdd6e379_1    conda-forge
binutils_impl_linux-64    2.39                 he00db2b_1    conda-forge
binutils_linux-64         2.39                h5fc0e48_13    conda-forge
blas                      2.121                       mkl    conda-forge
blas-devel                3.9.0            21_linux64_mkl    conda-forge
brotli-python             1.1.0           py311hb755f60_1    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
c-ares                    1.28.1               hd590300_0    conda-forge
c-compiler                1.3.0                h7f98852_0    conda-forge
ca-certificates           2024.7.4             hbcca054_0    conda-forge
certifi                   2024.6.2           pyhd8ed1ab_0    conda-forge
cffi                      1.16.0          py311hb3a22ac_0    conda-forge
charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
cmake                     3.29.4               h91dbaaa_0    conda-forge
cuda-cccl                 11.7.58              hc415cf5_0    nvidia/label/cuda-11.7.0
cuda-command-line-tools   11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-compiler             11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-cudart               11.7.60              h9538e0e_0    nvidia/label/cuda-11.7.0
cuda-cudart-dev           11.7.60              h6a7c232_0    nvidia/label/cuda-11.7.0
cuda-cuobjdump            11.7.50              h28cc80a_0    nvidia/label/cuda-11.7.0
cuda-cupti                11.7.50              hb6f9eaf_0    nvidia/label/cuda-11.7.0
cuda-cuxxfilt             11.7.50              hb365495_0    nvidia/label/cuda-11.7.0
cuda-documentation        11.7.50                       0    nvidia/label/cuda-11.7.0
cuda-driver-dev           11.7.60                       0    nvidia/label/cuda-11.7.0
cuda-gdb                  11.7.50              h4a0ac72_0    nvidia/label/cuda-11.7.0
cuda-libraries            11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-libraries-dev        11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-memcheck             11.7.50              hc446b2b_0    nvidia/label/cuda-11.7.0
cuda-nsight               11.7.50                       0    nvidia/label/cuda-11.7.0
cuda-nsight-compute       11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-nvcc                 11.7.64                       0    nvidia/label/cuda-11.7.0
cuda-nvdisasm             11.7.50              h5bd0695_0    nvidia/label/cuda-11.7.0
cuda-nvml-dev             11.7.50              h3af1343_0    nvidia/label/cuda-11.7.0
cuda-nvprof               11.7.50              h7a2404d_0    nvidia/label/cuda-11.7.0
cuda-nvprune              11.7.50              h7add7b4_0    nvidia/label/cuda-11.7.0
cuda-nvrtc                11.7.50              hd0285e0_0    nvidia/label/cuda-11.7.0
cuda-nvrtc-dev            11.7.50              heada363_0    nvidia/label/cuda-11.7.0
cuda-nvtx                 11.7.50              h05b0816_0    nvidia/label/cuda-11.7.0
cuda-nvvp                 11.7.50              hd2289d5_0    nvidia/label/cuda-11.7.0
cuda-runtime              11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-sanitizer-api        11.7.50              hb424887_0    nvidia/label/cuda-11.7.0
cuda-toolkit              11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-tools                11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-visual-tools         11.7.0                        0    nvidia/label/cuda-11.7.0
cxx-compiler              1.3.0                h4bd325d_0    conda-forge
dotnet                    8.0.204              ha770c72_0    conda-forge
dotnet-aspnetcore         8.0.4                hb8a3ed7_0    conda-forge
dotnet-runtime            8.0.4                hb8a3ed7_0    conda-forge
dotnet-sdk                8.0.204              hb8a3ed7_0    conda-forge
ffmpeg                    4.3                  hf484d3e_0    pytorch
fftw                      3.3.10          nompi_hf1063bd_110    conda-forge
filelock                  3.15.4             pyhd8ed1ab_0    conda-forge
freetype                  2.12.1               h267a509_2    conda-forge
gcc                       9.5.0               h1fea6ba_13    conda-forge
gcc_impl_linux-64         9.5.0               h99780fb_19    conda-forge
gcc_linux-64              9.5.0               h4258300_13    conda-forge
gds-tools                 1.3.0.44                      0    nvidia/label/cuda-11.7.0
gmp                       6.3.0                hac33072_2    conda-forge
gmpy2                     2.1.5           py311hc4f1f91_1    conda-forge
gnutls                    3.6.13               h85f3911_1    conda-forge
gxx                       9.5.0               h1fea6ba_13    conda-forge
gxx_impl_linux-64         9.5.0               h99780fb_19    conda-forge
gxx_linux-64              9.5.0               h43f449f_13    conda-forge
h2                        4.1.0              pyhd8ed1ab_0    conda-forge
hpack                     4.0.0              pyh9f0ad1d_0    conda-forge
hyperframe                6.0.1              pyhd8ed1ab_0    conda-forge
icu                       73.2                 h59595ed_0    conda-forge
idna                      3.7                pyhd8ed1ab_0    conda-forge
jinja2                    3.1.4              pyhd8ed1ab_0    conda-forge
jpeg                      9e                   h0b41bf4_3    conda-forge
kernel-headers_linux-64   2.6.32              he073ed8_17    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.3               h659f571_0    conda-forge
lame                      3.100             h166bdaf_1003    conda-forge
lcms2                     2.15                 hfd0df8a_0    conda-forge
ld_impl_linux-64          2.39                 hcc3a1bd_1    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libblas                   3.9.0            21_linux64_mkl    conda-forge
libcblas                  3.9.0            21_linux64_mkl    conda-forge
libcublas                 11.10.1.25           he442b6f_0    nvidia/label/cuda-11.7.0
libcublas-dev             11.10.1.25           h0c8ac2b_0    nvidia/label/cuda-11.7.0
libcufft                  10.7.2.50            h80a1efe_0    nvidia/label/cuda-11.7.0
libcufft-dev              10.7.2.50            h59a5ac8_0    nvidia/label/cuda-11.7.0
libcufile                 1.3.0.44                      0    nvidia/label/cuda-11.7.0
libcufile-dev             1.3.0.44                      0    nvidia/label/cuda-11.7.0
libcurand                 10.2.10.50           heec50f7_0    nvidia/label/cuda-11.7.0
libcurand-dev             10.2.10.50           hd49a9cd_0    nvidia/label/cuda-11.7.0
libcurl                   8.8.0                hca28451_1    conda-forge
libcusolver               11.3.5.50            hcab339c_0    nvidia/label/cuda-11.7.0
libcusolver-dev           11.3.5.50            hc6eba6f_0    nvidia/label/cuda-11.7.0
libcusparse               11.7.3.50            h6aaafad_0    nvidia/label/cuda-11.7.0
libcusparse-dev           11.7.3.50            hc644b96_0    nvidia/label/cuda-11.7.0
libdeflate                1.17                 h0b41bf4_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libexpat                  2.6.2                h59595ed_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-devel_linux-64     9.5.0               h0a57e50_19    conda-forge
libgcc-ng                 14.1.0               h77fa898_0    conda-forge
libgfortran-ng            14.1.0               h69a702a_0    conda-forge
libgfortran5              14.1.0               hc5f4f2c_0    conda-forge
libgomp                   14.1.0               h77fa898_0    conda-forge
libhwloc                  2.11.0          default_h5622ce7_1000    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
liblapack                 3.9.0            21_linux64_mkl    conda-forge
liblapacke                3.9.0            21_linux64_mkl    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libnpp                    11.7.3.21            h3effbd9_0    nvidia/label/cuda-11.7.0
libnpp-dev                11.7.3.21            hb6476a9_0    nvidia/label/cuda-11.7.0
libnsl                    2.0.1                hd590300_0    conda-forge
libnvjpeg                 11.7.2.34            hfe236c7_0    nvidia/label/cuda-11.7.0
libnvjpeg-dev             11.7.2.34            h2e48410_0    nvidia/label/cuda-11.7.0
libpng                    1.6.43               h2797004_0    conda-forge
libsanitizer              9.5.0               h2f262e1_19    conda-forge
libsqlite                 3.46.0               hde9e2c9_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-devel_linux-64  9.5.0               h0a57e50_19    conda-forge
libstdcxx-ng              14.1.0               hc0a3c3a_0    conda-forge
libtiff                   4.5.0                h6adf6a1_2    conda-forge
liburcu                   0.14.0               hac33072_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libuv                     1.48.0               hd590300_0    conda-forge
libwebp-base              1.4.0                hd590300_0    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxml2                   2.12.7               hc051c1a_1    conda-forge
libzlib                   1.2.13               h4ab18f5_6    conda-forge
llvm-openmp               18.1.7               ha31de31_0    conda-forge
lttng-ust                 2.13.8               h4ab18f5_0    conda-forge
markupsafe                2.1.5           py311h459d7ec_0    conda-forge
mkl                       2024.0.0         ha957f24_49657    conda-forge
mkl-devel                 2024.0.0         ha770c72_49657    conda-forge
mkl-include               2024.0.0         ha957f24_49657    conda-forge
mpc                       1.3.1                hfe3b2da_0    conda-forge
mpfr                      4.2.1                h9458935_1    conda-forge
mpmath                    1.3.0              pyhd8ed1ab_0    conda-forge
ncurses                   6.5                  h59595ed_0    conda-forge
nettle                    3.6                  he412f7d_0    conda-forge
networkx                  3.3                pyhd8ed1ab_1    conda-forge
nsight-compute            2022.2.0.13                   0    nvidia/label/cuda-11.7.0
numpy                     2.0.0           py311h1461c94_0    conda-forge
openh264                  2.1.1                h780b84a_0    conda-forge
openjpeg                  2.5.0                hfec8fc6_2    conda-forge
openssl                   3.3.1                h4ab18f5_1    conda-forge
pillow                    9.4.0           py311h50def17_1    conda-forge
pip                       24.0               pyhd8ed1ab_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pycparser                 2.22               pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.11.9          hb806964_0_cpython    conda-forge
python_abi                3.11                    4_cp311    conda-forge
pytorch                   2.0.1           py3.11_cuda11.7_cudnn8.5.0_0    pytorch
pytorch-cuda              11.7                 h778d358_5    pytorch
pytorch-mutex             1.0                        cuda    pytorch
readline                  8.2                  h8228510_1    conda-forge
requests                  2.32.3             pyhd8ed1ab_0    conda-forge
rhash                     1.4.4                hd590300_0    conda-forge
setuptools                70.1.1             pyhd8ed1ab_0    conda-forge
sympy                     1.12.1          pypyh2585a3b_103    conda-forge
sysroot_linux-64          2.12                he073ed8_17    conda-forge
tbb                       2021.12.0            h434a139_2    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
torchtriton               2.0.0                     py311    pytorch
torchvision               0.15.2              py311_cu117    pytorch
typing_extensions         4.12.2             pyha770c72_0    conda-forge
tzdata                    2024a                h0c530f3_0    conda-forge
urllib3                   2.2.2              pyhd8ed1ab_1    conda-forge
warp                      2.0.0dev18              py311_0    warpem
wheel                     0.43.0             pyhd8ed1ab_1    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zlib                      1.2.13               h4ab18f5_6    conda-forge
zstandard                 0.22.0          py311hb6f056b_1    conda-forge
zstd                      1.5.6                ha6fb4c9_0    conda-forge
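
Comparing two listings of this length by eye is error-prone. As a minimal sketch, assuming each environment's conda list output has been saved as text, the listings can be parsed and diffed as below. The function names are hypothetical, and the sample rows are copied from the two listings in this thread.

```python
def parse_conda_list(text):
    """Parse `conda list` output into {name: (version, build, channel)}."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comments
        fields = line.split()
        if len(fields) < 2:
            continue
        build = fields[2] if len(fields) > 2 else ""
        channel = fields[3] if len(fields) > 3 else ""
        packages[fields[0]] = (fields[1], build, channel)
    return packages


def diff_environments(a, b):
    """Return (names only in a, names only in b, {name: (entry in a, entry in b)})."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = {n: (a[n], b[n]) for n in sorted(set(a) & set(b)) if a[n] != b[n]}
    return only_a, only_b, changed


# A few rows copied from the first (working) listing in this thread.
WORKING_SAMPLE = """
cmake                     3.29.4               h91dbaaa_0    conda-forge
cuda-cudart               11.7.60              h9538e0e_0    nvidia/label/cuda-11.7.0
dotnet                    8.0.204              ha770c72_0    conda-forge
"""

# The matching rows from the second (failing) listing in this thread.
FAILING_SAMPLE = """
aom                       3.5.0                h27087fc_0    conda-forge
cmake                     3.30.0               hf8c4bd3_0    conda-forge
cuda-cudart               11.7.60              h9538e0e_0    nvidia/label/cuda-11.7.0
dotnet                    8.0.302              ha770c72_0    conda-forge
"""

if __name__ == "__main__":
    only_a, only_b, changed = diff_environments(
        parse_conda_list(WORKING_SAMPLE), parse_conda_list(FAILING_SAMPLE)
    )
    print("only in first:", only_a)
    print("only in second:", only_b)
    for name, (first, second) in changed.items():
        print(name, first, "->", second)
```

Differences in the CUDA-related rows would be the most interesting ones here; differences in unrelated packages are less likely to matter.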

mpm896 commented 1 week ago

Ok so before you asked about this, I tried building from source but got the same cuFFT error. However with the build, my PATH now includes /cloud-home/U1036725/.magellan/conda/envs/warp_build/lib/dotnet:/cloud-home/U1036725/.magellan/conda/envs/warp_build/lib/dotnet/tools.

I've also tried setting the LD_LIBRARY_PATH to include ONLY /cloud-home/U1036725/.magellan/conda/envs/warp_build/lib, but still got the same error.

Here's the output of conda list:

# packages in environment at /cloud-home/U1036725/.magellan/conda/envs/warp_build:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                  2_kmp_llvm    conda-forge
aom                       3.5.0                h27087fc_0    conda-forge
binutils                  2.39                 hdd6e379_1    conda-forge
binutils_impl_linux-64    2.39                 he00db2b_1    conda-forge
binutils_linux-64         2.39                h5fc0e48_13    conda-forge
blas                      2.121                       mkl    conda-forge
blas-devel                3.9.0            21_linux64_mkl    conda-forge
brotli-python             1.1.0           py311hb755f60_1    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
c-ares                    1.28.1               hd590300_0    conda-forge
c-compiler                1.3.0                h7f98852_0    conda-forge
ca-certificates           2024.7.4             hbcca054_0    conda-forge
certifi                   2024.6.2           pyhd8ed1ab_0    conda-forge
cffi                      1.16.0          py311hb3a22ac_0    conda-forge
charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
cmake                     3.30.0               hf8c4bd3_0    conda-forge
cuda-cccl                 11.7.58              hc415cf5_0    nvidia/label/cuda-11.7.0
cuda-command-line-tools   11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-compiler             11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-cudart               11.7.60              h9538e0e_0    nvidia/label/cuda-11.7.0
cuda-cudart-dev           11.7.60              h6a7c232_0    nvidia/label/cuda-11.7.0
cuda-cuobjdump            11.7.50              h28cc80a_0    nvidia/label/cuda-11.7.0
cuda-cupti                11.7.50              hb6f9eaf_0    nvidia/label/cuda-11.7.0
cuda-cuxxfilt             11.7.50              hb365495_0    nvidia/label/cuda-11.7.0
cuda-documentation        11.7.50                       0    nvidia/label/cuda-11.7.0
cuda-driver-dev           11.7.60                       0    nvidia/label/cuda-11.7.0
cuda-gdb                  11.7.50              h4a0ac72_0    nvidia/label/cuda-11.7.0
cuda-libraries            11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-libraries-dev        11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-memcheck             11.7.50              hc446b2b_0    nvidia/label/cuda-11.7.0
cuda-nsight               11.7.50                       0    nvidia/label/cuda-11.7.0
cuda-nsight-compute       11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-nvcc                 11.7.64                       0    nvidia/label/cuda-11.7.0
cuda-nvdisasm             11.7.50              h5bd0695_0    nvidia/label/cuda-11.7.0
cuda-nvml-dev             11.7.50              h3af1343_0    nvidia/label/cuda-11.7.0
cuda-nvprof               11.7.50              h7a2404d_0    nvidia/label/cuda-11.7.0
cuda-nvprune              11.7.50              h7add7b4_0    nvidia/label/cuda-11.7.0
cuda-nvrtc                11.7.50              hd0285e0_0    nvidia/label/cuda-11.7.0
cuda-nvrtc-dev            11.7.50              heada363_0    nvidia/label/cuda-11.7.0
cuda-nvtx                 11.7.50              h05b0816_0    nvidia/label/cuda-11.7.0
cuda-nvvp                 11.7.50              hd2289d5_0    nvidia/label/cuda-11.7.0
cuda-runtime              11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-sanitizer-api        11.7.50              hb424887_0    nvidia/label/cuda-11.7.0
cuda-toolkit              11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-tools                11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-visual-tools         11.7.0                        0    nvidia/label/cuda-11.7.0
cxx-compiler              1.3.0                h4bd325d_0    conda-forge
dotnet                    8.0.302              ha770c72_0    conda-forge
dotnet-aspnetcore         8.0.6                h8d34606_0    conda-forge
dotnet-runtime            8.0.6                h8d34606_0    conda-forge
dotnet-sdk                8.0.302              h8d34606_0    conda-forge
expat                     2.6.2                h59595ed_0    conda-forge
ffmpeg                    5.1.2           gpl_h8dda1f0_106    conda-forge
fftw                      3.3.10          nompi_hf1063bd_110    conda-forge
filelock                  3.15.4             pyhd8ed1ab_0    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 h77eed37_2    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
freetype                  2.12.1               h267a509_2    conda-forge
gcc                       9.5.0               h1fea6ba_13    conda-forge
gcc_impl_linux-64         9.5.0               h99780fb_19    conda-forge
gcc_linux-64              9.5.0               h4258300_13    conda-forge
gds-tools                 1.3.0.44                      0    nvidia/label/cuda-11.7.0
gettext                   0.22.5               h59595ed_2    conda-forge
gettext-tools             0.22.5               h59595ed_2    conda-forge
gmp                       6.3.0                hac33072_2    conda-forge
gmpy2                     2.1.5           py311hc4f1f91_1    conda-forge
gnutls                    3.7.9                hb077bed_0    conda-forge
gxx                       9.5.0               h1fea6ba_13    conda-forge
gxx_impl_linux-64         9.5.0               h99780fb_19    conda-forge
gxx_linux-64              9.5.0               h43f449f_13    conda-forge
h2                        4.1.0              pyhd8ed1ab_0    conda-forge
hpack                     4.0.0              pyh9f0ad1d_0    conda-forge
hyperframe                6.0.1              pyhd8ed1ab_0    conda-forge
icu                       73.2                 h59595ed_0    conda-forge
idna                      3.7                pyhd8ed1ab_0    conda-forge
jinja2                    3.1.4              pyhd8ed1ab_0    conda-forge
jpeg                      9e                   h0b41bf4_3    conda-forge
kernel-headers_linux-64   2.6.32              he073ed8_17    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.3               h659f571_0    conda-forge
lame                      3.100             h166bdaf_1003    conda-forge
lcms2                     2.15                 hfd0df8a_0    conda-forge
ld_impl_linux-64          2.39                 hcc3a1bd_1    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libasprintf               0.22.5               h661eb56_2    conda-forge
libasprintf-devel         0.22.5               h661eb56_2    conda-forge
libblas                   3.9.0            21_linux64_mkl    conda-forge
libcblas                  3.9.0            21_linux64_mkl    conda-forge
libcublas                 11.10.1.25           he442b6f_0    nvidia/label/cuda-11.7.0
libcublas-dev             11.10.1.25           h0c8ac2b_0    nvidia/label/cuda-11.7.0
libcufft                  10.7.2.50            h80a1efe_0    nvidia/label/cuda-11.7.0
libcufft-dev              10.7.2.50            h59a5ac8_0    nvidia/label/cuda-11.7.0
libcufile                 1.3.0.44                      0    nvidia/label/cuda-11.7.0
libcufile-dev             1.3.0.44                      0    nvidia/label/cuda-11.7.0
libcurand                 10.2.10.50           heec50f7_0    nvidia/label/cuda-11.7.0
libcurand-dev             10.2.10.50           hd49a9cd_0    nvidia/label/cuda-11.7.0
libcurl                   8.8.0                hca28451_1    conda-forge
libcusolver               11.3.5.50            hcab339c_0    nvidia/label/cuda-11.7.0
libcusolver-dev           11.3.5.50            hc6eba6f_0    nvidia/label/cuda-11.7.0
libcusparse               11.7.3.50            h6aaafad_0    nvidia/label/cuda-11.7.0
libcusparse-dev           11.7.3.50            hc644b96_0    nvidia/label/cuda-11.7.0
libdeflate                1.17                 h0b41bf4_0    conda-forge
libdrm                    2.4.122              h4ab18f5_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libexpat                  2.6.2                h59595ed_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-devel_linux-64     9.5.0               h0a57e50_19    conda-forge
libgcc-ng                 14.1.0               h77fa898_0    conda-forge
libgettextpo              0.22.5               h59595ed_2    conda-forge
libgettextpo-devel        0.22.5               h59595ed_2    conda-forge
libgfortran-ng            14.1.0               h69a702a_0    conda-forge
libgfortran5              14.1.0               hc5f4f2c_0    conda-forge
libgomp                   14.1.0               h77fa898_0    conda-forge
libhwloc                  2.11.0          default_h5622ce7_1000    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
libidn2                   2.3.7                hd590300_0    conda-forge
liblapack                 3.9.0            21_linux64_mkl    conda-forge
liblapacke                3.9.0            21_linux64_mkl    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libnpp                    11.7.3.21            h3effbd9_0    nvidia/label/cuda-11.7.0
libnpp-dev                11.7.3.21            hb6476a9_0    nvidia/label/cuda-11.7.0
libnsl                    2.0.1                hd590300_0    conda-forge
libnvjpeg                 11.7.2.34            hfe236c7_0    nvidia/label/cuda-11.7.0
libnvjpeg-dev             11.7.2.34            h2e48410_0    nvidia/label/cuda-11.7.0
libopus                   1.3.1                h7f98852_1    conda-forge
libpciaccess              0.18                 hd590300_0    conda-forge
libpng                    1.6.43               h2797004_0    conda-forge
libsanitizer              9.5.0               h2f262e1_19    conda-forge
libsqlite                 3.46.0               hde9e2c9_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-devel_linux-64  9.5.0               h0a57e50_19    conda-forge
libstdcxx-ng              14.1.0               hc0a3c3a_0    conda-forge
libtasn1                  4.19.0               h166bdaf_0    conda-forge
libtiff                   4.5.0                h6adf6a1_2    conda-forge
libunistring              0.9.10               h7f98852_0    conda-forge
liburcu                   0.14.0               hac33072_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libuv                     1.48.0               hd590300_0    conda-forge
libva                     2.18.0               h0b41bf4_0    conda-forge
libvpx                    1.11.0               h9c3ff4c_3    conda-forge
libwebp-base              1.4.0                hd590300_0    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxml2                   2.12.7               hc051c1a_1    conda-forge
libzlib                   1.3.1                h4ab18f5_1    conda-forge
llvm-openmp               18.1.8               hf5423f3_0    conda-forge
lttng-ust                 2.13.8               h4ab18f5_0    conda-forge
markupsafe                2.1.5           py311h459d7ec_0    conda-forge
mkl                       2024.0.0         ha957f24_49657    conda-forge
mkl-devel                 2024.0.0         ha770c72_49657    conda-forge
mkl-include               2024.0.0         ha957f24_49657    conda-forge
mpc                       1.3.1                hfe3b2da_0    conda-forge
mpfr                      4.2.1                h9458935_1    conda-forge
mpmath                    1.3.0              pyhd8ed1ab_0    conda-forge
ncurses                   6.5                  h59595ed_0    conda-forge
nettle                    3.9.1                h7ab15ed_0    conda-forge
networkx                  3.3                pyhd8ed1ab_1    conda-forge
nsight-compute            2022.2.0.13                   0    nvidia/label/cuda-11.7.0
numpy                     2.0.0           py311h1461c94_0    conda-forge
openh264                  2.3.1                hcb278e6_2    conda-forge
openjpeg                  2.5.0                hfec8fc6_2    conda-forge
openssl                   3.3.1                h4ab18f5_1    conda-forge
p11-kit                   0.24.1               hc5aa10d_0    conda-forge
pillow                    9.4.0           py311h50def17_1    conda-forge
pip                       24.0               pyhd8ed1ab_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pycparser                 2.22               pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.11.9          hb806964_0_cpython    conda-forge
python_abi                3.11                    4_cp311    conda-forge
pytorch                   2.0.1           py3.11_cuda11.7_cudnn8.5.0_0    pytorch
pytorch-cuda              11.7                 h778d358_5    pytorch
pytorch-mutex             1.0                        cuda    pytorch
readline                  8.2                  h8228510_1    conda-forge
requests                  2.32.3             pyhd8ed1ab_0    conda-forge
rhash                     1.4.4                hd590300_0    conda-forge
setuptools                70.1.1             pyhd8ed1ab_0    conda-forge
svt-av1                   1.4.1                hcb278e6_0    conda-forge
sympy                     1.12.1          pypyh2585a3b_103    conda-forge
sysroot_linux-64          2.12                he073ed8_17    conda-forge
tbb                       2021.12.0            h434a139_2    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
torchtriton               2.0.0                     py311    pytorch
torchvision               0.15.2              py311_cu117    pytorch
typing_extensions         4.12.2             pyha770c72_0    conda-forge
tzdata                    2024a                h0c530f3_0    conda-forge
urllib3                   2.2.2              pyhd8ed1ab_1    conda-forge
wheel                     0.43.0             pyhd8ed1ab_1    conda-forge
x264                      1!164.3095           h166bdaf_2    conda-forge
x265                      3.5                  h924138e_3    conda-forge
xorg-fixesproto           5.0               h7f98852_1002    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libx11               1.8.4                h0b41bf4_0    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
xorg-libxfixes            5.0.3             h7f98852_1004    conda-forge
xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zstandard                 0.22.0          py311hb6f056b_1    conda-forge
zstd                      1.5.6                ha6fb4c9_0    conda-forge

The additional packages in my warp_build env are:

aom
expat
font-ttf-dejavu-sans-mono
font-ttf-inconsolata     
font-ttf-source-code-pro 
font-ttf-ubuntu    
fontconfig           
fonts-conda-ecosystem     
fonts-conda-forge   
gettext   
gettext-tools  
libasprintf    
libasprintf-devel
libdrm 
libgettextpo          
libgettextpo-devel
libidn2
libopus 
libpciaccess 
libtasn1 
libunistring
libva   
libvpx 
p11-kit
svt-av1 
xorg-fixesproto  
xorg-kbproto           
xorg-libx11   
xorg-libxext      
xorg-libxfixes       
xorg-xextproto       
xorg-xproto 

The missing packages in my warp_build env are:

warp
zlib

I'm assuming warp is missing because I built from source instead of installing through conda this time? Let me know what you think.
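For anyone comparing environments the same way, here is a minimal Python sketch of the comparison above. It assumes both inputs are the plain-text output of `conda list`; the function names `parse_conda_list` and `compare_envs` are made up for illustration and are not part of conda or Warp.

```python
def parse_conda_list(text):
    """Return {package_name: version} parsed from `conda list` text output."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' comment header that conda prints.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        name = fields[0]
        version = fields[1] if len(fields) > 1 else ""
        packages[name] = version
    return packages


def compare_envs(text_a, text_b):
    """Compare two `conda list` outputs.

    Returns (only_in_a, only_in_b, changed), where `changed` maps a package
    name to its (version_in_a, version_in_b) when the versions differ.
    """
    a = parse_conda_list(text_a)
    b = parse_conda_list(text_b)
    only_in_a = sorted(set(a) - set(b))
    only_in_b = sorted(set(b) - set(a))
    changed = {
        name: (a[name], b[name])
        for name in sorted(set(a) & set(b))
        if a[name] != b[name]
    }
    return only_in_a, only_in_b, changed
```

To use it, save each environment's listing to a text file (for example with `conda list -n <env_name> > env.txt`), read both files, and pass their contents to `compare_envs`. Version differences are reported separately from missing packages, which the name-only lists above do not show.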

alisterburt commented 1 week ago

Can you please try running the tutorial data with the conda build in a fresh docker image? This way we can ensure it's nothing weird about your data causing the cuFFT error

mpm896 commented 1 week ago

I'll let you know how this goes, most likely on Monday. wget won't support wildcards on this system, and there are other issues with the HTTP and FTP proxies, so I'm downloading the entire dataset locally and transferring the selected frames over to the AWS instance, which will take some time. Thanks (as always) for your prompt help!

mpm896 commented 1 week ago

Another thought while I'm waiting to do the tutorial: I collected my data in uncompressed MRC format (don't ask me why!), so rather than one file per fraction, all the fractions for each tilt angle are assembled into one MRC movie (i.e. with 41 tilt angles per tilt series, I have 41 MRC movie files, each containing 5 dose fractions, instead of 5 x 41 = 205 TIF files). Could it be that this frame format is unsupported?
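One quick way to check how such a file is laid out is to read the stack dimensions straight from the MRC header. This is a minimal sketch, assuming a little-endian file that follows the MRC2014 layout (a 1024-byte header that starts with four 32-bit integers: NX, NY, NZ, MODE). The function names are hypothetical, and this is not how Warp itself reads files.

```python
import struct

MRC_HEADER_SIZE = 1024  # fixed header length in the MRC2014 format


def parse_mrc_dimensions(header):
    """Return (nx, ny, nz, mode) from the raw bytes of an MRC header.

    Assumes little-endian byte order, which is the common case.
    """
    if len(header) < 16:
        raise ValueError("need at least 16 header bytes to read NX, NY, NZ, MODE")
    nx, ny, nz, mode = struct.unpack("<4i", header[:16])
    return nx, ny, nz, mode


def read_mrc_dimensions(path):
    """Read only the header of the file at `path` and return its dimensions."""
    with open(path, "rb") as f:
        header = f.read(MRC_HEADER_SIZE)
    return parse_mrc_dimensions(header)
```

For a movie with 5 dose fractions on a 4096 x 4096 detector, the first three values should be 4096, 4096 and 5, which matches the "Loaded stack: 4096, 4096, 5" line in the log at the top of this issue.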

alisterburt commented 1 week ago

You could install a more recent wget into your conda env.

It's not clear to me how what you described differs from normal: each image file should contain the same image contents, +/- some wiggling from stage drift and beam-induced motion.

It's possible that Warp doesn't support your files properly but I'll wait for you to check that the program runs correctly in your environment on the tutorial data before investigating that

DcShepherd commented 1 week ago

Hello Alister,

I am also having an error with cuFFT on the tutorial dataset.

I noticed that it is looking in the path /usr/share/miniconda/envs/package-build/conda-bld/warp_1720036028036/work/NativeAcceleration/gtom/src/FFT/FFT.cu but I have no such directory.

I checked to see where FFT.cu is actually located and it is actually at /home/doulin/clones/warp/NativeAcceleration/gtom/src/FFT/FFT.cu

Might this be the issue?

mpm896 commented 1 week ago

Hi Alister, just confirming that with a fresh conda build in a fresh docker image, I'm still getting the cuFFT error on the tutorial dataset. Regarding @DcShepherd's comment, mine also shows an odd path, which I think has something to do with how the AWS instance is managing things. However, when I also tried building Warp from source (under Build Warp on Linux in the README), I got the same cuFFT error, this time showing the proper path to the FFT.cu file.

alisterburt commented 1 week ago

Thanks both for reporting back

@DcShepherd the discrepancy you saw is particularly interesting, where did you see the miniconda path reported?

/usr/share/miniconda/envs/package-build/conda-bld/warp_1720036028036/work/NativeAcceleration/gtom/src/FFT/FFT.cu

Naive question - you are both activating the conda environment, right?

I'm unable to reproduce this; I have set up fresh installs both on our HPC and on AWS (on top of Ubuntu) without issue. I'm not saying there is no issue, only that I'm unsure how to debug further.

mpm896 commented 1 week ago

Yeah, I've been activating the conda environment. I might just have to ask our IT team to try to set this up and get it running, because I really don't know everything they have set up on these "blank slate" docker images that they provide us with.

MinghaoChen-UCB commented 1 week ago

Hi Alister,

I'd like to report the same issue as DcShepherd. The miniconda path was shown in the error message:

Connected to 4 workers
0/183terminate called after throwing an instance of 'std::runtime_error'
  what():  cuFFT error: CUFFT_INTERNAL_ERROR at /usr/share/miniconda/envs/package-build/conda-bld/warp_1720036028036/work/NativeAcceleration/gtom/src/FFT/FFT.cu:23

and I also found the FFT.cu in my warp directory: /usr/local/warp/NativeAcceleration/gtom/src/FFT/FFT.cu

Is there any way to fix this?

alisterburt commented 1 week ago

Hi @MinghaoChen-UCB and others

Could you please provide the exact commands you're running on the tutorial dataset and also provide the logs from the warp_frameseries directory?

If what you're running into is the same issue then the cause (and thus fix) are not yet understood. I can't reproduce the issue so it's a little difficult to track down...

MinghaoChen-UCB commented 1 week ago

Hi Alister,

I got this error when running the '$WarpTools fs_motion_and_ctf' command. Please find attached the error message and the contents of the warp_frameseries directory. Please note that I'm running Warp on my own dataset. I will try the tutorial dataset tomorrow. Thank you,

Minghao

warp_frameseries.zip
error.txt

mpm896 commented 1 week ago

@DcShepherd @MinghaoChen-UCB I wonder if we're all running on docker images through AWS instances? Maybe that will help narrow down this issue?

DcShepherd commented 1 week ago

@mpm896 I am actually using a standalone workstation for my warp testing

I have attached the log file from my frameseries folder

@alisterburt The commands I am running for the tutorial dataset are:

WarpTools create_settings \
--folder_data frames \
--folder_processing warp_frameseries \
--output warp_frameseries.settings \
--extension "*.tif" \
--angpix 0.7894 \
--gain_path gain_ref.mrc \
--gain_flip_y \
--exposure 2.64

Then I run

WarpTools create_settings \
--output warp_tiltseries.settings \
--folder_processing warp_tiltseries \
--folder_data tomostar \
--extension "*.tomostar" \
--angpix 0.7894 \
--gain_path gain_ref.mrc \
--gain_flip_y \
--exposure 2.64 \
--tomo_dimensions 4400x6000x1000 

This command gives a warning about the tomogram size, which I don't think is correct:

Warning: unbinned tomogram dimensions 4400x6000x1000 appear smaller than expected for 4k+ images. Tomograms should encompass whole field of view.

There is also a warning that tells me that the tomostar directory is not found

Warning: data directory /media/doulin/Secondary_drive/warp_Test2/tomostar not found

Then I run

WarpTools fs_motion_and_ctf \
--settings warp_frameseries.settings \
--m_grid 1x1x3 \
--c_grid 2x2x1 \
--c_range_max 7 \
--c_defocus_max 8 \
--c_use_sum \
--out_averages \
--out_average_halves

The output that I get is

Running command fs_motion_and_ctf with:
m_range_min = 500
m_range_max = 10
m_bfac = -500
m_grid = 1x1x3
c_window = 512
c_range_min = 30
c_range_max = 7
c_defocus_min = 0.5
c_defocus_max = 8
c_voltage = 300
c_cs = 2.7
c_amplitude = 0.07
c_fit_phase = False
c_use_sum = True
c_grid = 2x2x1
out_averages = True
out_average_halves = True
out_skip_first = 0
out_skip_last = 0
device_list = {  }
perdevice = 1
workers = {  }
settings = warp_frameseries.settings
input_data = {  }
input_data_recursive = False
input_processing = null
output_processing = null

No alternative input specified, will use input parameters from warp_frameseries.settings
File search will be relative to /media/doulin/Secondary_drive/warp_Test2/frames
328 files found
Parsing previous results for each item, if available...
328/328, previous metadata found for 1                                                                        
Connecting to workers...
Connected to 1 workers

Connected to 1 workers
0/328terminate called after throwing an instance of 'std::runtime_error'
  what():  cuFFT error: CUFFT_INTERNAL_ERROR at /usr/share/miniconda/envs/package-build/conda-bld/warp_1720036028036/work/NativeAcceleration/gtom/src/FFT/FFT.cu:23

2Dvs3D_53-1700033-32.0_Jul31_21.44.01.log

mpm896 commented 1 week ago

Hi Alister (and all), I've had a bit of success! I started searching the NVIDIA forums specifically for this part of the error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  cuFFT error: CUFFT_INTERNAL_ERROR

which people seem to be getting for a variety of reasons. I saw this post about cuFFT not working on L4 GPUs while it works on T4 GPUs. We have access to instances with A10G GPUs, so I rebooted an instance with this GPU and I'm no longer facing this problem!

I know almost nothing about GPU computing, so I have no clue why this would make a difference, but perhaps something needs to be reconfigured for newer GPUs like the L4? I'm also using the same environment that I discussed at the top of this post, with some MATLAB runtimes (for PEET) in my LD_LIBRARY_PATH, etc. Anyway, I hope this helps troubleshoot the issue for others!

MinghaoChen-UCB commented 1 week ago

Thank you for your update, @mpm896! We have four NVIDIA RTX 3090 GPUs in our workstation, with CUDA version 12.0 and driver version 525.147.05. Unfortunately, no alternative machine is available.

MinghaoChen-UCB commented 1 week ago

Hi Alister and all,

I tried the tutorial dataset today. I was able to run the 'fs_motion_and_ctf' command without changing any settings and got to the reconstruction. However, at the 'ts_template_match' step I got stuck again with this cuFFT error. Attached is the error message. I would appreciate your comments.

Best,

ts_template_match_err.txt

alisterburt commented 1 week ago

@mpm896 I'm glad you've found a solution and this is good to know about - thanks for your patience and great spelunking!

@MinghaoChen-UCB great job getting further, can you check the worker logs in your warp_tiltseries directory for more hints?

alisterburt commented 1 week ago

@MinghaoChen-UCB you also mentioned CUDA 12 but Warp is supposed to use its own CUDA runtime (11.7 I think) pulled in when doing the conda install - please make sure you don't have any additional CUDA runtimes loaded when running Warp
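A rough way to spot extra CUDA runtimes is to look for CUDA-looking directories on the dynamic loader path. The sketch below is only a heuristic, assuming a Linux system where stray runtimes are exposed through `LD_LIBRARY_PATH`. The helper name `find_cuda_entries` is hypothetical, and an empty result does not prove that no other CUDA runtime can be loaded.

```python
import os


def find_cuda_entries(ld_library_path):
    """Return the entries of an LD_LIBRARY_PATH string that mention 'cuda'.

    Heuristic only: matches on the directory name, not on the libraries
    actually present in each directory.
    """
    entries = [p for p in ld_library_path.split(":") if p]
    return [p for p in entries if "cuda" in p.lower()]


def report_current_environment():
    """Print any CUDA-looking entries found in the current LD_LIBRARY_PATH."""
    hits = find_cuda_entries(os.environ.get("LD_LIBRARY_PATH", ""))
    if hits:
        print("CUDA-looking entries on LD_LIBRARY_PATH:")
        for path in hits:
            print("  " + path)
    else:
        print("No CUDA-looking entries on LD_LIBRARY_PATH.")
    return hits
```

Calling `report_current_environment()` after activating the Warp conda environment lists any directories worth a closer look, such as a system-wide CUDA 12 install.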

alisterburt commented 1 week ago

@mpm896 closing here and will open a specific issue detailing the problem with L4

@MinghaoChen-UCB please feel free to open another issue for the template matching problems if they don't end up solved with the tips above

alisterburt commented 1 week ago

@DcShepherd same as above for you, feel free to open a new issue if your problem remains unsolved

MinghaoChen-UCB commented 1 week ago

Hi Alister,

I'm glad to inform you that the problem has been solved.

We noticed that our system had unexpectedly switched to the wrong conda environment. We solved the problem by reinstalling Warp in the correct conda env and downgrading CUDA to 11.7. We have now successfully obtained the particle files for Relion. Thank you very much for your prompt reply! Best regards,

alisterburt commented 1 week ago

@MinghaoChen-UCB thanks for reporting back, glad you got it solved!

DcShepherd commented 1 week ago

@mpm896

Thanks for that hint! I swapped over to our 3080 workstation and it runs fine. The error only shows up on the 4000 series workstations.

Thanks for all your help!
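The GPU reports in this thread line up with compute capability, so a small lookup may help others guess whether they are affected. This is a sketch under stated assumptions, not a confirmed diagnosis: the compute capabilities are taken from NVIDIA's published tables, "4000 series" is assumed to mean GeForce RTX 40-series cards, and the threshold reflects the assumption that the CUDA 11.7 libraries shipped with Warp predate compute capability 8.9 (first supported in CUDA 11.8, as far as I know). The function name is hypothetical.

```python
# Compute capability (major, minor) for the GPUs mentioned in this thread.
COMPUTE_CAPABILITY = {
    "T4": (7, 5),             # Turing
    "A10G": (8, 6),           # Ampere
    "RTX 3080": (8, 6),       # Ampere
    "RTX 3090": (8, 6),       # Ampere
    "L4": (8, 9),             # Ada Lovelace
    "RTX 40 series": (8, 9),  # Ada Lovelace (assumed meaning of "4000 series")
}

# Assumption, not confirmed in this thread: GPUs at or above this compute
# capability are newer than the CUDA 11.7 libraries and may hit the cuFFT error.
FIRST_SUSPECT_CAPABILITY = (8, 9)


def may_hit_cufft_error(gpu_name):
    """Return True if the GPU is at or above the suspect compute capability."""
    return COMPUTE_CAPABILITY[gpu_name] >= FIRST_SUSPECT_CAPABILITY


def summarize():
    """Print one line per GPU with its compute capability and expected status."""
    for name, (major, minor) in COMPUTE_CAPABILITY.items():
        status = "may fail" if may_hit_cufft_error(name) else "expected to work"
        print(f"{name}: compute capability {major}.{minor}, {status}")
```

Under these assumptions, the L4 and RTX 40-series cards are flagged while the T4, A10G, 3080 and 3090 are not, which is consistent with what was reported above.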

alisterburt commented 1 week ago

@DcShepherd thanks for reporting back!