I am relatively new, so I hope I am not just doing something very stupid :)
I am trying to adapt the quantized example for my use case. The inference code is pretty much the same as in the example. In general the code works: I am prompting two models on two separate GPUs in a loop. After N iterations (N differs every run, but is always below 100) I hit the error below.
I am running a quantized llama-3-8b-instruct loaded from a .gguf file.
I would appreciate any tips if the error is on my side. Here is the access to the code.
NOTE: I'm running two A6000 GPUs. This is the nvcc version:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
thread '<unnamed>' panicked at /home/vake/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.10.0/src/driver/safe/core.rs:208:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at /home/vake/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.10.0/src/driver/safe/core.rs:208:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
stack backtrace:
0: 0x58c00bd19556 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h410d4c66be4e37f9
1: 0x58c00bd43550 - core::fmt::write::he40921d4802ce2ac
2: 0x58c00bd16d4f - std::io::Write::write_fmt::h5de5a4e7037c9b20
3: 0x58c00bd19334 - std::sys_common::backtrace::print::h11c067a88e3bdb22
4: 0x58c00bd1abb7 - std::panicking::default_hook::{{closure}}::h8c832ecb03fde8ea
5: 0x58c00bd1a919 - std::panicking::default_hook::h1633e272b4150cf3
6: 0x58c00bd1b048 - std::panicking::rust_panic_with_hook::hb164d19c0c1e71d4
7: 0x58c00bd1af22 - std::panicking::begin_panic_handler::{{closure}}::h0369088c533c20e9
8: 0x58c00bd19a56 - std::sys_common::backtrace::__rust_end_short_backtrace::hc11d910daf35ac2e
9: 0x58c00bd1ac74 - rust_begin_unwind
10: 0x58c00b9113d5 - core::panicking::panic_fmt::ha6effc2775a0749c
11: 0x58c00b911923 - core::result::unwrap_failed::ha188096f98826595
12: 0x58c00ba2b6c4 - <cudarc::driver::safe::core::CudaSlice<T> as core::ops::drop::Drop>::drop::h4c289e05ebd51ae6
13: 0x58c00ba2aafc - core::ptr::drop_in_place<cudarc::driver::safe::core::CudaSlice<f32>>::hcbf6a15615cee068
14: 0x58c00ba2b1ca - alloc::sync::Arc<T,A>::drop_slow::h994a5bb01f1fc442
15: 0x58c00ba2af50 - alloc::sync::Arc<T,A>::drop_slow::h4a65dc7109aa30f1
16: 0x58c00ba1802a - candle_transformers::models::quantized_llama::ModelWeights::forward::had1312fe871968d8
17: 0x58c00b94121d - llm_bitcoin_inscription_analysis::llm::prompt::prompt_model::hbe917d2214140c60
18: 0x58c00b96e876 - core::ops::function::impls::<impl core::ops::function::FnMut<A> for &F>::call_mut::h5f9d812f749ee289
19: 0x58c00b96b756 - rayon::iter::plumbing::Folder::consume_iter::h2c8efde69e0f7383
20: 0x58c00b971bfc - rayon::iter::plumbing::bridge_producer_consumer::helper::h814a881abff08b3e
21: 0x58c00b973006 - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::h8fb2eedfc5ec12fd
22: 0x58c00b90ce9f - rayon_core::registry::WorkerThread::wait_until_cold::hc0ea83de9f250620
23: 0x58c00bceaa32 - rayon_core::registry::ThreadBuilder::run::hedc5a5eddbc123f1
24: 0x58c00bcedbca - std::sys_common::backtrace::__rust_begin_short_backtrace::h14baabb9af848a11
25: 0x58c00bceeaef - core::ops::function::FnOnce::call_once{{vtable.shim}}::h49599ea7439698c3
26: 0x58c00bd1fb95 - std::sys::pal::unix::thread::Thread::new::thread_start::h3631815ad38387d6
27: 0x7b8d4de94ac3 - start_thread
at ./nptl/pthread_create.c:442:8
28: 0x7b8d4df26850 - __GI___clone3
at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
29: 0x0 - <unknown>
stack backtrace:
thread '<unnamed>' panicked at library/core/src/panicking.rs:163:5:
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.
   0: 0x58c00bd19556 - <std::sys_common::backtrace
Aborted (core dumped)