ultralytics / ultralytics


Segmentation Fault (Core Dumped) Bug in Fine-Tuned YOLOv8 Model with Low Confidence Inference #11881

Open takhyun12 opened 3 weeks ago

takhyun12 commented 3 weeks ago

Search before asking

YOLOv8 Component

Predict

Bug

I would like to report a significant bug related to low-confidence inference that I identified in a fine-tuned YOLOv8 model.

I built a custom dataset through Roboflow and fine-tuned a YOLOv8x model on it.

I discovered that the fine-tuned model triggers a Segmentation Fault (core dumped) only when running inference on specific images in a Linux environment. I analyzed this issue using various profiling tools.

As a result, I found a close relationship between the conf parameter and this issue, as detailed below:

First, I am sharing the fine-tuned model and test image I used in my experiments:

model : download

image : download

My findings are as follows:

I tried various experiments to resolve this issue, but none were effective:

This issue is consistently reproducible in all Linux environments, but it does not occur on Windows. I believe that when the conf value is low, there might be an explosive increase in memory usage during model inference.

I wrote test code that runs inference on around 1,000 images and have been running it. If the problem does not reproduce with the images I have attached, it could be due to subtle differences in dependencies such as torch.
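
For reference, here is a simplified sketch of that test loop (the test_images directory is just a placeholder); it also logs per-image RSS via psutil to check the memory-usage hypothesis:

import glob
import os

import psutil
from ultralytics import YOLO

model = YOLO('fine_tuned_model.pt')                    # the attached fine-tuned model
process = psutil.Process(os.getpid())

for path in sorted(glob.glob('test_images/*.jpg')):    # placeholder directory of test images
    model.predict(path, conf=0.01, verbose=False)
    rss_gb = process.memory_info().rss / 1e9
    print(f'{path}: RSS {rss_gb:.2f} GB')              # watch for a jump right before a crash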

Please run about 100 images through my fine-tuned model with a low conf value, and you'll quickly find the problematic images.

I created a model that performs best with a low conf value of 0.01 or 0.001, but I'm unable to use it due to this issue.

I would greatly appreciate your assistance.

Environment

Ultralytics YOLOv8.1.9 🚀 Python-3.10.13 torch-2.0.1+cu117 CUDA:0 (NVIDIA A100 80GB PCIe, 81229MiB)
Setup complete ✅ (128 CPUs, 251.5 GB RAM, 2729.3/3080.1 GB disk)

OS                  Linux-4.18.0-425.19.2.el8_7.x86_64-x86_64-with-glibc2.31
Environment         Docker
Python              3.10.13
Install             pip
RAM                 251.48 GB
CPU                 AMD EPYC 7543 32-Core Processor
CUDA                11.7

matplotlib          ✅ 3.8.3>=3.3.0
numpy               ✅ 1.26.4>=1.22.2
opencv-python       ✅ 4.8.0.76>=4.6.0
pillow              ✅ 10.2.0>=7.1.2
pyyaml              ✅ 6.0.1>=5.3.1
requests            ✅ 2.31.0>=2.23.0
scipy               ✅ 1.12.0>=1.4.1
torch               ✅ 2.0.1>=1.8.0
torchvision         ✅ 0.15.2>=0.9.0
tqdm                ✅ 4.66.1>=4.64.0
psutil              ✅ 5.9.8
py-cpuinfo          ✅ 9.0.0
thop                ✅ 0.1.1-2209072238>=0.1.1
pandas              ✅ 2.2.0>=1.1.4
seaborn             ✅ 0.13.2>=0.11.0

Minimal Reproducible Example

from ultralytics import YOLO

# Use the model I've attached.
model = YOLO('fine_tuned_model.pt')

# Use the image I've attached and set the conf parameter to a low value as follows.
model.predict('image.jpg', conf=0.01)

Additional

I think the following PRs are closely related to the issue I'm reporting:

Are you willing to submit a PR?

glenn-jocher commented 3 weeks ago

Hello! Thank you for the detailed report and for conducting such thorough investigations into this issue. 🙌

The Segmentation Fault you're experiencing when setting a low confidence threshold is indeed intriguing. It sounds like an issue related to memory handling or a buffer overflow specific to the Linux environment, as you suggested.

Given that you've already tried multiple environments and configurations without success, here is a potential workaround: try limiting the number of detections per image via the max_det argument while keeping your low conf value.
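
Something along these lines, as a rough sketch (the max_det value here is only an illustrative starting point):

from ultralytics import YOLO

model = YOLO('fine_tuned_model.pt')

# Keep the low confidence threshold but cap the number of detections per image
results = model.predict('image.jpg', conf=0.01, max_det=50)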

Additionally, considering this seems related to how Linux handles memory versus Windows, examining the system calls and memory management differences in your Docker or Linux environment could provide more insights. Using tools like valgrind might help identify if there's specific illegal memory access occurring.

Could you please check if limiting the number of detections resolves the issue? Meanwhile, I'll further investigate potential fixes from our side as well.

Thank you for offering to help with a PR! We appreciate your contribution. Let's aim to isolate this behavior further and work on a solid patch. 🚀

takhyun12 commented 2 weeks ago

@glenn-jocher

I conducted experiments to limit the number of detections as you suggested. I set the conf to 0.01 and varied max_det as demonstrated in the code below:

with torch.inference_mode():
    segment_results = self.model.predict(
        source=image,
        save=False,
        save_txt=False,
        imgsz=(640, 640),
        max_det=max_detection,  # varied across runs: 1, 2, 3, 5, 10, 100
        conf=0.01,
        stream=True,
    )

Experiment Results

| max_det | Result | Console Output | Time |
| --- | --- | --- | --- |
| 1 | ok | - | - |
| 2 | ok | - | - |
| 3 | Segmentation fault | 640x640: 1 crow's feet, 1 mental crease, 1 nasolabial fold | 90.0ms |
| 5 | Segmentation fault | 640x640: 4 crow's feet, 1 mental crease | 84.0ms |
| 10 | Segmentation fault | 640x640: 7 crow's feet, 1 mental crease, 2 nasolabial folds | 37.8ms |
| 100 | Segmentation fault | 640x640: 53 crow's feet, 14 forehead wrinkles, 1 frown line, 13 marionette lines, 7 mental creases, 12 nasolabial folds | 27.1ms |

Environment

Ultralytics YOLOv8.2.11 🚀 Python-3.10.14 torch-2.3.0+cu121 CUDA:0 (NVIDIA A100-SXM4-80GB, 81053MiB)
Setup complete ✅ (128 CPUs, 251.8 GB RAM, 612.1/12469.0 GB disk)

OS                  Linux-4.19.93-1.nbp.el7.x86_64-x86_64-with-glibc2.35
Environment         Docker
Python              3.10.14
Install             pip
RAM                 251.82 GB
CPU                 AMD EPYC 7543 32-Core Processor
CUDA                12.1

matplotlib          ✅ 3.8.4>=3.3.0
opencv-python       ✅ 4.9.0.80>=4.6.0
pillow              ✅ 10.3.0>=7.1.2
pyyaml              ✅ 6.0.1>=5.3.1
requests            ✅ 2.31.0>=2.23.0
scipy               ✅ 1.13.0>=1.4.1
torch               ✅ 2.3.0>=1.8.0
torchvision         ✅ 0.18.0>=0.9.0
tqdm                ✅ 4.66.4>=4.64.0
psutil              ✅ 5.9.8
py-cpuinfo          ✅ 9.0.0
thop                ✅ 0.1.1-2209072238>=0.1.1
pandas              ✅ 2.2.2>=1.1.4
seaborn             ✅ 0.13.2>=0.11.0

Please let me know if there is anything else you need.

takhyun12 commented 2 weeks ago

I've discovered an interesting point: if I do not set the conf option at all, no errors occur no matter how high max_det is set.

with torch.inference_mode():
    segment_results = self.model.predict(
        source=image,
        save=False,
        save_txt=False,
        imgsz=(640, 640),
        max_det=100,
        # conf=0.01,
        stream=True,
    )

My fine-tuned model performs exceptionally well when the conf is low. Is there another method to achieve the same effect without using the conf option?

glenn-jocher commented 2 weeks ago

@takhyun12 hello! Thanks for the interesting find! 😊 It's quite insightful that omitting the conf parameter entirely avoids the segmentation fault issue.

Regarding achieving effective low confidence operation without explicitly setting a very low conf value, one approach is to filter the results post-prediction based on your confidence threshold. This way, you can use a default or slightly lower conf setting that does not cause crashes, and then programmatically discard detections below your desired confidence level in the post-processing step:

results = model.predict(source=image, imgsz=(640, 640), max_det=100, stream=True)
filtered_results = [r.boxes[r.boxes.conf >= 0.01] for r in results]  # keep only detections with confidence >= 0.01

This method allows you to handle an arbitrary number of detections more safely and gives you the flexibility to adjust your confidence threshold dynamically after observing model output.

Let me know if this works for you or if further adjustments are needed!

takhyun12 commented 1 week ago

@glenn-jocher Hello,

I have an additional question regarding this issue.

The code you suggested did not solve the problem:

results = model.predict(source=image, imgsz=(640, 640), max_det=100, stream=True)
filtered_results = [r.boxes[r.boxes.conf >= 0.01] for r in results]  # keep only detections with confidence >= 0.01

The output of the above code is completely different from that of the following code:

results = model.predict(source=image, imgsz=(640, 640), conf=0.01, max_det=100, stream=True)

Output:

[Attached images: Output Image A (post-prediction filtering) and Output Image B (conf set in predict)]

I think there might be a difference in how confidence is handled during prediction and post-processing. Do you have any suggestions for this case? Or is there any progress in fixing this Segmentation Fault bug?

Thank you for your help.

glenn-jocher commented 1 week ago

Hello @takhyun12,

Thank you for your follow-up and for testing out the code snippet. You're correct in observing that filtering post-prediction can yield different results compared to setting the conf threshold directly in the predict method. This difference arises because setting conf directly affects the model's internal scoring and non-max suppression process during prediction, which can't be exactly replicated by filtering afterwards.
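
As a rough illustration (a toy example using torchvision's standalone nms, not the exact Ultralytics internals): a candidate whose score falls below the conf threshold is discarded before NMS ever runs, so no amount of post-filtering can bring it back:

import torch
from torchvision.ops import nms

# Two non-overlapping candidate boxes: one confident, one low-confidence
boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0],
                      [50.0, 50.0, 60.0, 60.0]])
scores = torch.tensor([0.90, 0.10])

def detect(conf_thres):
    # Mimic the usual pipeline: confidence filter first, then NMS
    keep = scores >= conf_thres
    kept = nms(boxes[keep], scores[keep], iou_threshold=0.7)
    return scores[keep][kept]

print(detect(0.01))        # tensor([0.9000, 0.1000]) -> the 0.10 detection is kept
high = detect(0.25)        # tensor([0.9000])         -> the 0.10 detection never reaches NMS
print(high[high >= 0.01])  # still tensor([0.9000])   -> post-filtering cannot recover it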

As for the segmentation fault issue, it's still under investigation. A potential workaround could be to adjust the conf parameter slightly higher to a stable threshold that doesn't trigger the fault, and then fine-tune the max_det parameter to manage the number of detections.

Here's a slight modification to try:

results = model.predict(source=image, imgsz=(640, 640), conf=0.05, max_det=100, stream=True)
filtered_results = [r.boxes[r.boxes.conf >= 0.01] for r in results]  # post-filter the detections that survived conf=0.05

This approach uses a slightly higher internal confidence threshold to keep inference stable, while the post-prediction filter still lets you tune which of the surviving detections you keep for further analysis.

Let's stay in touch as we work through these issues. Your feedback is invaluable! 🌟