ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Segmentation fault when training #12270

Closed KAkhoa12 closed 1 year ago

KAkhoa12 commented 1 year ago

YOLOv5 Component

Training

Bug

Hi, I'm getting a segmentation fault like the one below. Does anyone know how to fix it? :<

AutoAnchor: 5.48 anchors/target, 0.999 Best Possible Recall (BPR). Current anchors are a good fit to dataset
Plotting labels to runs\train\exp24\labels.jpg...
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs\train\exp24
Starting training for 3 epochs...

  Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size    

0%| | 0/48 [00:00<?, ?it/s]Segmentation fault

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

github-actions[bot] commented 1 year ago

👋 Hello @KAkhoa12, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

glenn-jocher commented 1 year ago

@KAkhoa12 hi there,

It seems like you are experiencing a segmentation fault during training. This error can occur for various reasons, and it's usually challenging to pinpoint the exact cause without additional information.

To help you resolve this issue, could you please provide more details, such as the dataset you are using, the training command you executed, and any other relevant information? This will allow us to understand your setup better and provide more targeted assistance.

Additionally, please check if you have any custom code modifications that may be causing this problem. Keep in mind that YOLOv5 is highly optimized, so it's generally recommended to avoid modifying the core codebase unless necessary.

Thank you for reporting this issue, and we'll do our best to assist you.
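
One way to gather more information about a crash like this is Python's standard `faulthandler` module, which dumps the Python-level stack when the interpreter receives a fatal signal. The sketch below is only an illustration: the helper name `enable_crash_traceback` and the log file name are hypothetical, and it assumes you call it at the very top of the script you run (for example train.py). It covers the process that calls it, so a crash inside a separate dataloader worker process may not be captured.

```python
import faulthandler


def enable_crash_traceback(log_path="segfault_trace.log"):
    """Dump the Python stack of all threads to log_path on a fatal signal.

    Hypothetical helper for illustration; not part of YOLOv5.
    """
    # The file object must stay open for as long as faulthandler is enabled,
    # so it is returned to the caller instead of being closed here.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    trace_file = enable_crash_traceback()
    print("faulthandler enabled:", faulthandler.is_enabled())
```

If the segmentation fault then reproduces, the log file should show which Python line was executing, which helps narrow down whether the crash comes from data loading, the model forward pass, or something else.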

KAkhoa12 commented 1 year ago

@glenn-jocher Yes, sorry for the delayed response. Here are the details of my issue. I have a dataset of 1030 images labeled with Roboflow, split into 73% training data, 26% validation data, and the remaining 1% test data. The task is classifying vehicles in road traffic. There are 5 classes, defined in the data.yaml file as follows:

#====================
train: ../train/images
val: ../valid/images
test: ../test/images

nc: 5
names: ['bus', 'car', 'motorbike', 'person', 'truck']
#====================
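
A quick sanity check for a dataset config like this is to confirm that the class count matches the number of names, and that the count key is spelled in lowercase (YAML keys are case-sensitive). The sketch below works on an already-loaded dict; the function name `check_class_count` is hypothetical, and loading the file (for example with PyYAML's `yaml.safe_load`) is left out to keep the example standard-library only.

```python
def check_class_count(cfg):
    """Return a list of problems found in a dataset config dict.

    Hypothetical helper for illustration; not part of YOLOv5.
    """
    problems = []
    names = cfg.get("names")
    if names is None:
        problems.append("missing 'names'")

    # Flag keys such as 'NC' or 'Nc' that differ from 'nc' only by case.
    for key in cfg:
        if isinstance(key, str) and key.lower() == "nc" and key != "nc":
            problems.append(f"key {key!r} should be lowercase 'nc'")

    # If a lowercase 'nc' is present, it should match the number of names.
    nc = cfg.get("nc")
    if nc is not None and names is not None and nc != len(names):
        problems.append(f"nc={nc} but {len(names)} names are listed")
    return problems


if __name__ == "__main__":
    cfg = {"nc": 5, "names": ["bus", "car", "motorbike", "person", "truck"]}
    print(check_class_count(cfg))
```

An empty list means the checks passed; anything else is worth fixing before training.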

Next, I used the following command to train the model:

py train.py --img 640 --epochs 3 --data data.yaml --weights yolov5s.pt

Here is the output of that command:

train: weights=yolov5s.pt, cfg=, data=data.yaml, hyp=data\hyps\hyp.scratch-low.yaml, epochs=3, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs\train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
remote: Enumerating objects: 13, done.
remote: Counting objects: 100% (13/13), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 13 (delta 5), reused 5 (delta 1), pack-reused 0
Unpacking objects: 100% (13/13), 18.62 KiB | 141.00 KiB/s, done.
From https://github.com/ultralytics/yolov5
   4d687c8..53efd07  master     -> origin/master
 * [new branch]      snyk-fix-f1897f20d8a73238398e465bbc42ef2a -> origin/snyk-fix-f1897f20d8a73238398e465bbc42ef2a
github:  YOLOv5 is out of date by 2 commits. Use 'git pull' or 'git clone https://github.com/ultralytics/yolov5' to update.
YOLOv5  v7.0-228-g4d687c8 Python-3.11.3 torch-2.1.0+cpu CPU

hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
Comet: run 'pip install comet_ml' to automatically track and visualize YOLOv5  runs in Comet
TensorBoard: Start with 'tensorboard --logdir runs\train', view at http://localhost:6006/
Overriding model.yaml nc=80 with nc=5

                 from  n    params  module                                  arguments
  0                -1  1      3520  models.common.Conv                      [3, 32, 6, 2, 2]
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]
  2                -1  1     18816  models.common.C3                        [64, 64, 1]
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]
  4                -1  2    115712  models.common.C3                        [128, 128, 2]
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]
  6                -1  3    625152  models.common.C3                        [256, 256, 3]
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]
  8                -1  1   1182720  models.common.C3                        [512, 512, 1]
  9                -1  1    656896  models.common.SPPF                      [512, 512, 5]
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]
 22          [-1, 10]  1         0  models.common.Concat                    [1]
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]
 24      [17, 20, 23]  1     26970  models.yolo.Detect                      [5, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model summary: 214 layers, 7033114 parameters, 7033114 gradients, 16.0 GFLOPs

Transferred 343/349 items from yolov5s.pt
optimizer: SGD(lr=0.01) with parameter groups 57 weight(decay=0.0), 60 weight(decay=0.0005), 60 bias
train: Scanning C:\WorkSpace\5-Python\_Python_AI\Data-YOLO-T
train: WARNING  Cache directory C:\WorkSpace\5-Python\_Python_AI\Data-YOLO-Train\yolov5\train is not writeable: [WinError 183] Cannot create a file when that file already exists: 'C:\\WorkSpace\\5-Python\\_Python_AI\\Data-YOLO-Train\\yolov5\\train\\labels.cache.npy' -> 'C:\\WorkSpace\\5-Python\\_Python_AI\\Data-YOLO-Train\\yolov5\\train\\labels.cache'
val: Scanning C:\WorkSpace\5-Python\_Python_AI\Data-YOLO-Tra
val: WARNING  Cache directory C:\WorkSpace\5-Python\_Python_AI\Data-YOLO-Train\yolov5\valid is not writeable: [WinError 183] Cannot create a file when that file already exists: 'C:\\WorkSpace\\5-Python\\_Python_AI\\Data-YOLO-Train\\yolov5\\valid\\labels.cache.npy' -> 'C:\\WorkSpace\\5-Python\\_Python_AI\\Data-YOLO-Train\\yolov5\\valid\\labels.cache'
AutoAnchor: 5.48 anchors/target, 0.999 Best Possible Recall (BPR). Current anchors are a good fit to dataset
Plotting labels to runs\train\exp32\labels.jpg...
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs\train\exp32
Starting training for 3 epochs...

      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
0%|          | 0/48 [00:00<?, ?it/s]Segmentation fault

In the end, I get a segmentation fault at the very start of the first epoch.
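
The progress bar total (0/48) can be cross-checked against the dataset split. The sketch below assumes the bar counts batches and that a final partial batch is kept rather than dropped; both are assumptions about the dataloader, not something stated in the log. The function names are illustrative.

```python
import math


def batches_per_epoch(num_images, batch_size):
    """Number of batches when the last partial batch is kept."""
    return math.ceil(num_images / batch_size)


def image_count_range(num_batches, batch_size):
    """Smallest and largest image counts that yield num_batches batches."""
    return (num_batches - 1) * batch_size + 1, num_batches * batch_size


if __name__ == "__main__":
    low, high = image_count_range(48, 16)  # batch_size=16 from the log
    print(low, high)                       # 753 768
    print(round(low / 1030 * 100, 1))      # 73.1 (percent of 1030 images)
```

Under those assumptions, 48 batches of 16 correspond to between 753 and 768 training images, which is consistent with roughly 73% of 1030. The counter at 0/48 also indicates that the crash happened before the first batch finished.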

KAkhoa12 commented 1 year ago

@glenn-jocher Hi, I fixed the error above and trained the model. My next problem is that the prediction frame is very large: it takes up the entire screen for the predicted image, like this.

image

Is there a way to reduce it so I can see things clearly?

glenn-jocher commented 1 year ago

@KAkhoa12

To reduce the size of the prediction frame and make it more visible, you can adjust the --img-size parameter (also accepted as --img or --imgsz) when running the detect.py script.

For example, you can set a lower value for --img-size, such as --img-size 416, to reduce the size of the prediction frame. Experiment with different values until you find the desired size that allows you to see things clearly.

Here's an example command to run the detect.py script with a smaller image size:

python detect.py --weights path/to/weights.pt --img 416 --conf 0.4 --source path/to/images

Remember to replace path/to/weights.pt with the path to your trained model weights file, and path/to/images with the path to the folder containing your images.

By adjusting the --img-size parameter, you can control the size of the prediction frame and customize it to your preferences.

Let me know if this helps!
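
For reference, here is a rough sketch of what an inference image size such as 416 means for the network input. It assumes letterbox-style resizing: the longer side is scaled to the target size, then each dimension is padded up to a multiple of the model stride (32 is assumed here). The function name and the 1080x1920 example image are illustrative, and the actual detect.py pipeline may differ in details.

```python
def letterbox_shape(height, width, imgsz=640, stride=32):
    """Approximate network input shape (height, width) for one image.

    Simplified illustration of letterbox-style resizing; not the YOLOv5 code.
    """
    # Scale so the longer side matches imgsz while keeping the aspect ratio.
    ratio = min(imgsz / height, imgsz / width)
    new_h, new_w = round(height * ratio), round(width * ratio)

    # Pad each side only as far as the next multiple of the stride.
    pad_h = (imgsz - new_h) % stride
    pad_w = (imgsz - new_w) % stride
    return new_h + pad_h, new_w + pad_w


if __name__ == "__main__":
    print(letterbox_shape(1080, 1920, imgsz=640))  # (384, 640)
    print(letterbox_shape(1080, 1920, imgsz=416))  # (256, 416)
```

In this sketch a smaller image size means the model sees a lower-resolution copy of the same picture, which usually speeds up inference and can change which objects are detected.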

KAkhoa12 commented 1 year ago

@glenn-jocher Thanks a lot, it helped me

glenn-jocher commented 1 year ago

@KAkhoa12 hi there,

You're welcome! I'm glad to hear that the information was helpful to you. If you have any more questions or need further assistance, feel free to ask. Good luck with your project!

Best regards.

Sigoloh commented 7 months ago

> @glenn-jocher hi , I fixed the above error and trained the model, and the next problem is that the prediction frame is very large, it takes up the entire screen of the image to predict, like this.
>
> image
>
> Is there a way to reduce it so I can see things clearly?

I know it's been a while since you posted this, but how did you solve the issue with the seg fault?

glenn-jocher commented 6 months ago

@Sigoloh hi there!

Great to hear you resolved the segmentation fault issue! Regarding the large prediction frame, you might want to adjust the --img-size parameter when running detections to better fit your needs. For example:

python detect.py --weights your_model_weights.pt --img 416 --source your_image_directory

Adjusting the --img-size to a smaller value like 416 can help make the prediction frames smaller and more precise. Give it a try and see if it helps clarify the predictions!

If you have any more questions or need further assistance, feel free to ask. Happy detecting! 😊