marius-team / marius

Large scale graph learning on a single machine.
https://marius-project.org
Apache License 2.0

CUDA error: device-side assert triggered when trying to execute example scripts #80

Closed IronySuzumiya closed 2 years ago

IronySuzumiya commented 2 years ago

Describe the bug: I installed the program successfully and it passed test/cpp/end_to_end. However, when I tried to execute examples/training/scripts/fb15k_gpu.sh (and some other configs with GPU enabled), it triggered an nll_loss_backward_reduce_cuda_kernel_2d assertion failure.

To Reproduce: Steps to reproduce the behavior:

  1. Execute bash examples/training/scripts/fb15k_gpu.sh
  2. The marius_preprocess step completes without any problems.
  3. When marius_train reaches the backward pass for the first batch of the first epoch, the following error occurs:
    nfp@node19:~/marius$ bash examples/training/scripts/fb15k_gpu.sh 
    fb15k
    Downloading fb15k.tgz to output_dir/fb15k.tgz
    Extracting
    Extraction completed
    Detected delimiter: ~   ~
    Reading in output_dir/freebase_mtr100_mte100-train.txt   1/3
    Reading in output_dir/freebase_mtr100_mte100-valid.txt   2/3
    Reading in output_dir/freebase_mtr100_mte100-test.txt   3/3
    Number of instance per file:[483142, 50000, 59071]
    Number of nodes: 14951
    Number of edges: 592213
    Number of relations: 1345
    Delimiter: ~    ~
    ['/home/nfp/.local/bin/marius_train', 'examples/training/configs/fb15k_gpu.ini']
    [info] [10/28/21 22:12:59.865] Start preprocessing
    [debug] [10/28/21 22:12:59.866] Initializing Model
    [debug] [10/28/21 22:12:59.866] Empty Encoder
    [debug] [10/28/21 22:12:59.866] DistMult Decoder
    [debug] [10/28/21 22:12:59.867] data/ directory already exists
    [debug] [10/28/21 22:12:59.867] data/marius/ directory already exists
    [debug] [10/28/21 22:12:59.867] data/marius/embeddings/ directory already exists
    [debug] [10/28/21 22:12:59.867] data/marius/relations/ directory already exists
    [debug] [10/28/21 22:12:59.867] data/marius/edges/ directory already exists
    [debug] [10/28/21 22:12:59.867] data/marius/edges/train/ directory already exists
    [debug] [10/28/21 22:12:59.867] data/marius/edges/evaluation/ directory already exists
    [debug] [10/28/21 22:12:59.867] data/marius/edges/test/ directory already exists
    [debug] [10/28/21 22:12:59.880] Edges: DeviceMemory storage initialized
    [debug] [10/28/21 22:12:59.894] Edges shuffled
    [debug] [10/28/21 22:12:59.894] Edge storage initialized. Train: 483142, Valid: 50000, Test: 59071
    [debug] [10/28/21 22:13:00.004] Node embeddings: DeviceMemory storage initialized
    [debug] [10/28/21 22:13:00.004] Node embeddings state: DeviceMemory storage initialized
    [debug] [10/28/21 22:13:00.004] Node embeddings initialized: 14951
    [debug] [10/28/21 22:13:00.014] Relation embeddings: DeviceMemory storage initialized
    [debug] [10/28/21 22:13:00.014] Relation embeddings state: DeviceMemory storage initialized
    [debug] [10/28/21 22:13:00.014] Relation embeddings initialized: 1345
    [debug] [10/28/21 22:13:00.014] Getting batches from edge list
    [info] [10/28/21 22:13:00.014] Training set initialized
    [debug] [10/28/21 22:13:00.014] Getting batches from edge list
    [debug] [10/28/21 22:13:00.014] Batches initialized
    [info] [10/28/21 22:13:00.015] Evaluation set initialized
    [info] [10/28/21 22:13:00.015] Preprocessing Complete: 0.149s
    [debug] [10/28/21 22:13:00.032] Loaded training set
    [info] [10/28/21 22:13:00.032] ################ Starting training epoch 1 ################
    [trace] [10/28/21 22:13:00.032] Starting Batch. ID 0, Starting Index 0, Batch Size 10000 
    [trace] [10/28/21 22:13:00.034] Batch: 0 Accumulated 11109 unique embeddings
    [trace] [10/28/21 22:13:00.034] Batch: 0 Accumulated 640 unique relations
    [trace] [10/28/21 22:13:00.034] Batch: 0 Indices sent to device
    [trace] [10/28/21 22:13:00.034] Batch: 0 Node Embeddings read
    [trace] [10/28/21 22:13:00.034] Batch: 0 Node State read
    [trace] [10/28/21 22:13:00.034] Batch: 0 Relation Embeddings read
    [trace] [10/28/21 22:13:00.034] Batch: 0 Relation State read
    [trace] [10/28/21 22:13:00.035] Batch: 0 prepared for compute
    [debug] [10/28/21 22:13:00.040] Loss: 124804.266, Regularization loss: 0.012812799
    /pytorch/aten/src/ATen/native/cuda/Loss.cu:455: nll_loss_backward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [1,0,0] Assertion `t >= 0 && t < n_classes` failed.
    /pytorch/aten/src/ATen/native/cuda/Loss.cu:455: nll_loss_backward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [2,0,0] Assertion `t >= 0 && t < n_classes` failed.
    ... (the same assertion is repeated for threads [3,0,0] through [31,0,0]) ...
    Traceback (most recent call last):
    File "/home/nfp/.local/bin/marius_train", line 8, in <module>
    sys.exit(main())
    File "/home/nfp/.local/lib/python3.6/site-packages/marius/console_scripts/marius_train.py", line 8, in main
    m.marius_train(len(sys.argv), sys.argv)
    RuntimeError: CUDA error: device-side assert triggered
    CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    Exception raised from launch_unrolled_kernel at /pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh:132 (most recent call first):
    frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f95645bcd62 in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libc10.so)
    frame #1: void at::native::gpu_kernel_impl<at::native::BinaryFunctor<float, float, float, at::native::AddFunctor<float> > >(at::TensorIteratorBase&, at::native::BinaryFunctor<float, float, float, at::native::AddFunctor<float> > const&) + 0xb37 (0x7f95665b2f27 in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libtorch_cuda_cu.so)
    frame #2: void at::native::gpu_kernel<at::native::BinaryFunctor<float, float, float, at::native::AddFunctor<float> > >(at::TensorIteratorBase&, at::native::BinaryFunctor<float, float, float, at::native::AddFunctor<float> > const&) + 0x113 (0x7f95665bf333 in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libtorch_cuda_cu.so)
    frame #3: void at::native::opmath_gpu_kernel_with_scalars<float, float, float, at::native::AddFunctor<float> >(at::TensorIteratorBase&, at::native::AddFunctor<float> const&) + 0xa9 (0x7f95665bf4c9 in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libtorch_cuda_cu.so)
    frame #4: <unknown function> + 0xe5d953 (0x7f9566592953 in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libtorch_cuda_cu.so)
    frame #5: at::native::add_kernel_cuda(at::TensorIteratorBase&, c10::Scalar const&) + 0x15 (0x7f95665930a5 in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libtorch_cuda_cu.so)
    frame #6: <unknown function> + 0xe5e0cf (0x7f95665930cf in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libtorch_cuda_cu.so)
    frame #7: at::native::structured_sub_out::impl(at::Tensor const&, at::Tensor const&, c10::Scalar const&, at::Tensor const&) + 0x40 (0x7f95a9f1ef00 in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
    frame #8: <unknown function> + 0x25e52ab (0x7f9567d1a2ab in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libtorch_cuda_cu.so)
    frame #9: <unknown function> + 0x25e5372 (0x7f9567d1a372 in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libtorch_cuda_cu.so)
    frame #10: at::_ops::sub_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0xb9 (0x7f95aa55d3f9 in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
    frame #11: <unknown function> + 0x34be046 (0x7f95ac03c046 in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
    frame #12: <unknown function> + 0x34be655 (0x7f95ac03c655 in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
    frame #13: at::_ops::sub_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x13f (0x7f95aa5b5b2f in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
    frame #14: <unknown function> + 0x3f299b0 (0x7f95acaa79b0 in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
    frame #15: torch::autograd::generated::LogsumexpBackward0::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x1dc (0x7f95abd1447c in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
    frame #16: <unknown function> + 0x3896817 (0x7f95ac414817 in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
    frame #17: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x145b (0x7f95ac40fa7b in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
    frame #18: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) + 0x57a (0x7f95ac4107aa in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
    frame #19: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x89 (0x7f95ac4081c9 in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
    frame #20: <unknown function> + 0xc71f (0x7f962b3ad71f in /home/nfp/.local/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
    frame #21: <unknown function> + 0x76db (0x7f962d01f6db in /lib/x86_64-linux-gnu/libpthread.so.0)
    frame #22: clone + 0x3f (0x7f962d35871f in /lib/x86_64-linux-gnu/libc.so.6)
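
For reference (my reading of the log, not Marius internals): the assertion t >= 0 && t < n_classes comes from PyTorch's NLL-loss CUDA kernel and fires when a target class index falls outside the number of classes. Below is a minimal, hypothetical sketch (plain PyTorch, no Marius code) that triggers the same device-side assert, and it also uses the CUDA_LAUNCH_BLOCKING=1 hint from the traceback:

    # Hypothetical sketch of the failing condition; not Marius code.
    import os
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # set before CUDA initializes so the
                                              # assert is reported at the real call site
    import torch
    import torch.nn.functional as F

    logits = torch.randn(4, 10, device="cuda", requires_grad=True)  # 10 classes
    targets = torch.tensor([0, 3, 9, 12], device="cuda")            # 12 is out of range

    # With CUDA_LAUNCH_BLOCKING=1 the failure is reported at the offending call;
    # otherwise it can surface later, e.g. during backward(), as in the log above.
    loss = F.cross_entropy(logits, targets)
    loss.backward()  # -> RuntimeError: CUDA error: device-side assert triggered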

Expected behavior: The program works fine with the CPU configs:

nfp@node19:~/marius$ bash examples/training/scripts/fb15k_cpu.sh 
fb15k
Downloading fb15k.tgz to output_dir/fb15k.tgz
Extracting
Extraction completed
Detected delimiter: ~   ~
Reading in output_dir/freebase_mtr100_mte100-train.txt   1/3
Reading in output_dir/freebase_mtr100_mte100-valid.txt   2/3
Reading in output_dir/freebase_mtr100_mte100-test.txt   3/3
Number of instance per file:[483142, 50000, 59071]
Number of nodes: 14951
Number of edges: 592213
Number of relations: 1345
Delimiter: ~    ~
['/home/nfp/.local/bin/marius_train', 'examples/training/configs/fb15k_cpu.ini']
[info] [10/28/21 22:19:07.259] Start preprocessing
[info] [10/28/21 22:19:08.397] Training set initialized
[info] [10/28/21 22:19:08.397] Evaluation set initialized
[info] [10/28/21 22:19:08.397] Preprocessing Complete: 1.137s
[info] [10/28/21 22:19:08.410] ################ Starting training epoch 1 ################
[info] [10/28/21 22:19:08.904] Total Edges Processed: 50000, Percent Complete: 0.099
[info] [10/28/21 22:19:09.252] Total Edges Processed: 95000, Percent Complete: 0.198
[info] [10/28/21 22:19:09.700] Total Edges Processed: 152000, Percent Complete: 0.298
[info] [10/28/21 22:19:09.998] Total Edges Processed: 190000, Percent Complete: 0.397
[info] [10/28/21 22:19:10.418] Total Edges Processed: 237000, Percent Complete: 0.496
[info] [10/28/21 22:19:10.809] Total Edges Processed: 286000, Percent Complete: 0.595
[info] [10/28/21 22:19:11.211] Total Edges Processed: 336000, Percent Complete: 0.694
[info] [10/28/21 22:19:11.567] Total Edges Processed: 383000, Percent Complete: 0.793
[info] [10/28/21 22:19:11.958] Total Edges Processed: 432000, Percent Complete: 0.893
[info] [10/28/21 22:19:12.320] Total Edges Processed: 478000, Percent Complete: 0.992
[info] [10/28/21 22:19:12.357] ################ Finished training epoch 1 ################
[info] [10/28/21 22:19:12.357] Epoch Runtime (Before shuffle/sync): 3946ms
[info] [10/28/21 22:19:12.357] Edges per Second (Before shuffle/sync): 122438.414
[info] [10/28/21 22:19:12.358] Pipeline flush complete
[info] [10/28/21 22:19:12.374] Edges Shuffled
[info] [10/28/21 22:19:12.374] Epoch Runtime (Including shuffle/sync): 3963ms
[info] [10/28/21 22:19:12.374] Edges per Second (Including shuffle/sync): 121913.195
[info] [10/28/21 22:19:12.389] Starting evaluating
[info] [10/28/21 22:19:12.709] Pipeline flush complete
[info] [10/28/21 22:19:15.909] Num Eval Edges: 50000
[info] [10/28/21 22:19:15.909] Num Eval Batches: 50
[info] [10/28/21 22:19:15.909] Auc: 0.941, Avg Ranks: 40.139, MRR: 0.336, Hits@1: 0.212, Hits@5: 0.476, Hits@10: 0.600, Hits@20: 0.707, Hits@50: 0.827, Hits@100: 0.895
[info] [10/28/21 22:19:15.920] Evaluation complete: 3531ms
[info] [10/28/21 22:19:15.931] ################ Starting training epoch 2 ################
[info] [10/28/21 22:19:16.361] Total Edges Processed: 46000, Percent Complete: 0.099
[info] [10/28/21 22:19:16.900] Total Edges Processed: 97000, Percent Complete: 0.198
[info] [10/28/21 22:19:17.424] Total Edges Processed: 156000, Percent Complete: 0.298
[info] [10/28/21 22:19:17.697] Total Edges Processed: 189000, Percent Complete: 0.397
[info] [10/28/21 22:19:18.078] Total Edges Processed: 238000, Percent Complete: 0.496
[info] [10/28/21 22:19:18.466] Total Edges Processed: 288000, Percent Complete: 0.595
[info] [10/28/21 22:19:18.825] Total Edges Processed: 336000, Percent Complete: 0.694
[info] [10/28/21 22:19:19.160] Total Edges Processed: 381000, Percent Complete: 0.793
[info] [10/28/21 22:19:19.584] Total Edges Processed: 436000, Percent Complete: 0.893
[info] [10/28/21 22:19:19.909] Total Edges Processed: 481000, Percent Complete: 0.992
[info] [10/28/21 22:19:19.928] ################ Finished training epoch 2 ################
[info] [10/28/21 22:19:19.928] Epoch Runtime (Before shuffle/sync): 3997ms
[info] [10/28/21 22:19:19.928] Edges per Second (Before shuffle/sync): 120876.16
[info] [10/28/21 22:19:19.929] Pipeline flush complete
[info] [10/28/21 22:19:19.947] Edges Shuffled
[info] [10/28/21 22:19:19.948] Epoch Runtime (Including shuffle/sync): 4016ms
[info] [10/28/21 22:19:19.948] Edges per Second (Including shuffle/sync): 120304.29
[info] [10/28/21 22:19:19.961] Starting evaluating
[info] [10/28/21 22:19:20.246] Pipeline flush complete
[info] [10/28/21 22:19:20.255] Num Eval Edges: 50000
[info] [10/28/21 22:19:20.255] Num Eval Batches: 50
[info] [10/28/21 22:19:20.255] Auc: 0.972, Avg Ranks: 21.458, MRR: 0.431, Hits@1: 0.294, Hits@5: 0.595, Hits@10: 0.719, Hits@20: 0.812, Hits@50: 0.906, Hits@100: 0.949
[info] [10/28/21 22:19:20.271] Evaluation complete: 309ms
[info] [10/28/21 22:19:20.282] ################ Starting training epoch 3 ################
[info] [10/28/21 22:19:20.694] Total Edges Processed: 47000, Percent Complete: 0.099
[info] [10/28/21 22:19:21.042] Total Edges Processed: 95000, Percent Complete: 0.198
[info] [10/28/21 22:19:21.425] Total Edges Processed: 143000, Percent Complete: 0.298
[info] [10/28/21 22:19:21.872] Total Edges Processed: 203000, Percent Complete: 0.397
^C[info] [10/28/21 22:19:22.195] Total Edges Processed: 244000, Percent Complete: 0.496
[info] [10/28/21 22:19:22.561] Total Edges Processed: 288000, Percent Complete: 0.595
[info] [10/28/21 22:19:22.971] Total Edges Processed: 342000, Percent Complete: 0.694
[info] [10/28/21 22:19:23.266] Total Edges Processed: 380000, Percent Complete: 0.793
[info] [10/28/21 22:19:23.747] Total Edges Processed: 438000, Percent Complete: 0.893
[info] [10/28/21 22:19:24.101] Total Edges Processed: 479142, Percent Complete: 0.992
...

Environment: I tried this on two machines and got the same error on both.
Platform: Linux (Ubuntu 18.04 LTS)
Python version: 3.6.9
PyTorch version: 1.10.0+cu102; 1.10.0+cu113
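
For completeness, the same details can be gathered with a short snippet like this (hypothetical, just for reference):

    import platform, sys
    import torch

    print("Platform:", platform.platform())
    print("Python:", sys.version.split()[0])
    print("PyTorch:", torch.__version__, "CUDA:", torch.version.cuda)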

JasonMoho commented 2 years ago

Thanks for reporting this; I'll see if I can reproduce it on my end.

Also, I noticed that debug logs are being printed for the GPU example. Did you modify the example configuration file to enable debug logging? I just want to make sure, because those shouldn't be printed for that example.

JasonMoho commented 2 years ago

Okay, I was able to reproduce this exact error when using PyTorch 1.10.

I downgraded to PyTorch 1.9 and was able to run the example successfully. Could you try that and see if it works?

IronySuzumiya commented 2 years ago

I only modified log_level to trace; the other config options are unchanged. OK, I'll try that later. Thanks!

JasonMoho commented 2 years ago

Sounds good. It looks like this issue is related to a known PyTorch bug: https://github.com/pytorch/pytorch/issues/66872

I'll update the system requirements to note that PyTorch 1.10 is not currently supported and leave this issue open until there's a fix or workaround.
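
In the meantime, a rough guard like the following (just a sketch, assuming the regression is specific to the 1.10 series; not part of Marius) could catch the unsupported version up front:

    import torch

    # Hypothetical check: refuse to run on the 1.10 series until the upstream fix lands.
    major, minor = (int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
    if (major, minor) == (1, 10):
        raise RuntimeError("PyTorch 1.10 hits a known NLL-loss CUDA regression; "
                           "use 1.9.x or 1.8.2 LTS instead")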

IronySuzumiya commented 2 years ago

It works with PyTorch 1.8.2 LTS. Thanks for the help!