Closed — jamjamjon closed this 7 months ago
Is `run_fp32` being called from multiple threads, or a different thread than the one the session was created on?
Yes! I use `std::sync::mpsc::channel()`:
```rust
// use mpsc
let (tx, rx) = std::sync::mpsc::channel();
thread::spawn(move || {
    for (images, _paths) in dl.into_iter() {
        tx.send(images).unwrap();
    }
});
thread::spawn(move || {
    for (_i, message) in rx.iter().enumerate() {
        let _ys = model.run(&message).unwrap();
    }
})
.join()
.unwrap();
```
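For reference, joining *both* handles (not only the consumer) rules out the producer thread being torn down mid-send. A minimal, self-contained sketch of the same mpsc pattern, with integers standing in for images and doubling standing in for `model.run` (both are hypothetical stand-ins, not the real API):

```rust
use std::sync::mpsc;
use std::thread;

// Stand-in pipeline: "images" are i32s and "inference" doubles them.
fn pipeline(inputs: Vec<i32>) -> Vec<i32> {
    let (tx, rx) = mpsc::channel();

    // Producer: keep the JoinHandle so the thread is not silently detached.
    let producer = thread::spawn(move || {
        for images in inputs {
            tx.send(images).unwrap();
        }
        // tx is dropped here, closing the channel and ending rx.iter()
    });

    // Consumer: runs "inference" on each message and collects the results.
    let consumer = thread::spawn(move || {
        rx.iter().map(|message| message * 2).collect::<Vec<_>>()
    });

    producer.join().unwrap();
    consumer.join().unwrap()
}

fn main() {
    println!("{:?}", pipeline(vec![1, 2, 3])); // [2, 4, 6]
}
```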
But this problem appears even when I use a single thread:
```rust
// load then run
let x = image::io::Reader::open("./assets/demo.jpg")?.decode()?;
let y = model.run(&vec![x])?;
```
Here is the output:

```
cargo run -r --example rtdetr
    Finished release [optimized] target(s) in 0.05s
     Running `target/release/examples/rtdetr`
> Using CUDA
[ORT Inference]: 5.165937265s
Results saved at: runs/RT-DETR/2024-03-09-13-39-23-465984633.jpg
[Results { probs: None, Bboxes: Some([Bbox { xmin: 23.76523, ymin: 229.94244, xmax: 804.8533, ymax: 730.4618, id: 5, confidence: 0.9469714 }, Bbox { xmin: 668.5972, ymin: 394.98087, xmax: 809.0648, ymax: 880.43445, id: 0, confidence: 0.9517705 }, Bbox { xmin: 49.653255, ymin: 399.2633, xmax: 247.06194, ymax: 904.75684, id: 0, confidence: 0.9512628 }, Bbox { xmin: 222.2634, ymin: 405.63873, xmax: 345.4751, ymax: 860.4706, id: 0, confidence: 0.9255427 }, Bbox { xmin: 0.29167414, ymin: 550.899, xmax: 74.54556, ymax: 867.50653, id: 0, confidence: 0.70238566 }, Bbox { xmin: 283.0564, ymin: 484.21506, xmax: 297.04556, ymax: 520.7864, id: 27, confidence: 0.42629278 }]), Keypoints: None, Masks: None }]
corrupted double-linked list
[1] 5290 IOT instruction (core dumped) cargo run -r --example rtdetr
```
Given that the results print before the crash I assume it may be occurring when something is dropped. Are you able to use a debugger to step through & see where it crashes?
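One low-tech way to see which drop precedes the crash is to give the suspect values a traced stand-in that records its own destruction. A self-contained sketch of the idea (the names `session` and `results` are hypothetical stand-ins for the ort session and inference output, not the real types):

```rust
use std::cell::RefCell;
use std::rc::Rc;

// A value that records its own drop into a shared log, so the drop
// order is observable in a test or with println! in a real program.
struct Traced {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Traced {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Mirrors the crash scenario: results are dropped first, then the session.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let session = Traced { name: "session", log: log.clone() };
    let results = Traced { name: "results", log: log.clone() };
    drop(results); // results go first...
    drop(session); // ...then the session
    Rc::try_unwrap(log).unwrap().into_inner()
}

fn main() {
    println!("{:?}", drop_order()); // ["results", "session"]
}
```

In the real program, a `println!` placed between the two explicit `drop` calls tells you whether the crash happens while freeing the results or while tearing down the session.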
```
cargo run -r --example yolov8
    Finished release [optimized] target(s) in 38.40s
     Running `target/release/examples/yolov8`
corrupted double-linked list
[1] 47699 IOT instruction (core dumped) cargo run -r --example yolov8
```
The model inference results are correct, and the annotated image shows up. This bug appears when I press the button.
I tried some code with `ort = 1.16.3` and onnxruntime 1.16.3; this bug does not show up.
2.0.0-rc0 also has this problem when running the yolov8 example with the CUDA execution provider.
@jamjamjon What does `rustc --version` say?
I could reproduce this on Windows using rustc 1.78.0-nightly (2dceda4f3 2024-03-01), but only with `--release`; `--profile dev` doesn't crash. rustc 1.78.0-nightly (46b180ec2 2024-03-08) (latest nightly) and rustc 1.76.0 (07dca489a 2024-02-04) (stable) do not crash in either profile. Seems like a regression in rustc that has already been fixed.
1.76.0
When I use the CUDA provider to run the corresponding ONNX model and complete the inference, this problem appears from time to time. (The inference result is totally correct!)

Environment

- cuda: 11.7
- ort: 2.0.0-alpha4
- onnxruntime: 1.17.1 & 1.17.0
- gpu: GeForce RTX 3060
- OS: Ubuntu 23.04, x86_64

Code snippet

When using the TensorRT and CPU providers, everything works fine. Need your help, please.