ultralytics / ultralytics

Ultralytics YOLO11 πŸš€
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Training strategy when the dataset receives a stream of new data. #9019

Closed wjnicol closed 5 months ago

wjnicol commented 7 months ago

Search before asking

Question

Hello,

I am currently training different models to recognize features in microscopy images, in order to help with data collection and related tasks.

Images need to be converted to a format readable by YOLO. I wrote Python modules to streamline this process: label the images (with LabelImg) and dispatch the data into train/val/test folders that can then be fed to YOLO. The idea is that whenever I have a new batch of images, I run them through my formatting pipeline, which adds the new images to the train/val/test folders. Multiple questions arise.

Thank you,

William J Nicolas

Additional

No response

glenn-jocher commented 7 months ago

@wjnicol hi William,

Great questions! Here's a concise way to approach these:

  1. Data Split Ratio: Maintaining a standard split like 60/20/20 is generally recommended for balanced learning and validation. However, as your dataset grows, you might not need to expand your validation and test sets proportionally. It's more important to ensure these sets are representative. Keeping them fixed while expanding the training set can be practical and still provide robust validation metrics.

  2. Incremental Training: When you receive new batches of data, it's often more efficient to continue training (also known as fine-tuning) your existing model rather than starting from scratch. This approach leverages the model's learned features, potentially reducing the overall computational cost. To mitigate overfitting when adding new data, consider techniques like data augmentation, regularization, or even adjusting your learning rate. Here's an example for fine-tuning:

    from ultralytics import YOLO

    # Load your existing trained weights
    model = YOLO('path/to/existing/model.pt')

    # Fine-tune the model on the new combined dataset
    # (resume=True is only for resuming an interrupted run, so it is not used here)
    results = model.train(data='new_data.yaml', epochs=50)

Training on a combined dataset tends to generalize better, as the model sees more variation during training. Continuously adding to the training dataset and occasionally fine-tuning on the combined data usually offers a balance between computational resources and model performance.
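
For illustration, here is a minimal sketch of a dispatch step that keeps val/test fixed while routing new images into the training split. The folder names and the VAL_FRACTION knob are placeholders for your own pipeline, not part of Ultralytics:

import random
import shutil
from pathlib import Path

# Placeholder paths -- adapt to your own formatting pipeline
NEW_BATCH = Path('incoming/images')
DATASET = Path('dataset/images')
VAL_FRACTION = 0.0  # keep val/test fixed; raise briefly only if you want to refresh val

random.seed(0)
for img in sorted(NEW_BATCH.glob('*.png')):
    split = 'val' if random.random() < VAL_FRACTION else 'train'
    dest = DATASET / split / img.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(img, dest)  # remember to copy the matching label file as well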

Hope this helps!

wjnicol commented 7 months ago

Thank you very much for the quick and concise answer.

For clarity, what you call the combined dataset is old data (already seen by the model) plus the new batches, correct (my third option)?

So what I should do is continue training the existing model on the combined dataset, but implement data augmentation, learning-rate adjustment, and regularization as I go, correct?

William

glenn-jocher commented 7 months ago

@wjnicol yes, William, your understanding is spot on! 🎯 By "combined dataset," I indeed mean your original dataset plus any new batches of data you collect. This approach helps your model learn from a broader set of examples, continually improving its accuracy and robustness.

And yes, as you accumulate more data and train your model on this combined dataset, incorporating strategies like data augmentation, adjusting the learning rate, and applying regularization techniques will be key in maintaining a good balance between performance and generalization. This way, you ensure that your model remains effective and doesn't overfit as it gets exposed to more data.

Here's a quick reminder on how to add data augmentation in YOLO:

# Assuming you're using a custom 'data.yaml' and starting from an existing model
from ultralytics import YOLO

model = YOLO('path/to/your/model.pt')

# Train with data augmentation (modify according to your needs)
results = model.train(data='data.yaml', epochs=50, augment=True, lr0=0.01)
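
As a side note, the individual augmentation and regularization hyperparameters can also be set explicitly at train time; here is a sketch with illustrative (not tuned) values, using arguments visible in the trainer defaults:

from ultralytics import YOLO

model = YOLO('path/to/your/model.pt')

# Illustrative values only -- tune these for your own data
results = model.train(
    data='data.yaml',
    epochs=50,
    lr0=0.01,             # initial learning rate
    weight_decay=0.0005,  # regularization strength
    hsv_h=0.015,          # hue augmentation
    fliplr=0.5,           # horizontal-flip probability
    mosaic=1.0,           # mosaic augmentation
)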

Keep up the great work, and don't hesitate to reach out if you have more questions!

wjnicol commented 7 months ago

Thank you very much!

glenn-jocher commented 7 months ago

@wjnicol absolutely, happy to help out!

Yes, you got it right! 🌟 The "combined dataset" refers to the mix of your original (already seen) data and any new data you gather. For refining your model with this growing dataset, implementing data augmentation and adjusting the learning rate are excellent practices. These strategies, along with regularization, help in enhancing model generalization and avoiding overfitting.

Here's a snippet to incorporate data augmentation:

from ultralytics import YOLO

# Load your pre-trained model
model = YOLO('path/to/your/model.pt')

# Fine-tune on the combined dataset with augmentation enabled
model.train(data='combined_data.yaml', epochs=30, augment=True, lr0=0.002)

This way, your model continuously learns from both old and new insights, making it smarter and more robust. Keep up the great work, and feel free to reach out for any further questions or clarifications! πŸš€

github-actions[bot] commented 6 months ago

πŸ‘‹ Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO πŸš€ and Vision AI ⭐

suryargiri commented 5 months ago

I have separate folders containing 299 images for training and 85 images for validation, but every time I run YOLOv8, it uses all 85 images from the validation set for training. What could be the reason for this, and how can I handle this issue?

Here is my current data.yaml file:

path: D:/WORK/broccoli_project/data
train: images/train
val: images/train
test: images/test

nc: 1

# classes
names: ["bc"]

(screenshot attached: problem_issues)

suryargiri commented 5 months ago

@glenn-jocher, please help me solve the above problem.

glenn-jocher commented 5 months ago

Hello @iexactdesign,

It looks like there's a small mistake in your data.yaml file where both the training and validation paths are set to images/train. This is why all your validation images are being used for training. You should update your data.yaml to point to the correct validation folder. Here's the corrected version:

path: D:/WORK/broccoli_project/data
train: images/train
val: images/val  # Updated to the correct validation directory
test: images/test

nc: 1
names: ["bc"]

Make sure your folder structure reflects this change, and you have the images/val directory with your 85 validation images. This should resolve the issue! 😊
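
As a quick sanity check after updating the YAML, you could count the images each split folder actually contains; a small sketch using the paths from your file (adjust if yours differ):

from pathlib import Path

# Dataset root from the data.yaml above
root = Path(r'D:/WORK/broccoli_project/data')
for split in ('train', 'val', 'test'):
    n = len(list((root / 'images' / split).glob('*')))
    print(f'{split}: {n} images')  # expect roughly train: 299, val: 85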

suryargiri commented 4 months ago

Dear @glenn-jocher, I am really happy with your response, but I am still having the same issue even after I changed the directory name. Can you please kindly review the screenshot I attached here? I tried both Linux and Windows, using Python and the CLI, but got the same number of images. I am supposed to train on 299 images from the training folder, but every time the algorithm picks up the images from the validation folder. (screenshot attached: error)

glenn-jocher commented 4 months ago

Hi @iexactdesign,

Thank you for your patience. I reviewed the screenshot you attached. It appears that the validation images are still being used for training. Let's ensure a few things:

  1. Directory Structure: Verify that your directory structure matches the paths specified in your data.yaml file. Ensure that images/train contains your 299 training images and images/val contains your 85 validation images.

  2. Correct data.yaml: Double-check that your data.yaml file is correctly pointing to the respective directories. It should look like this:

    path: D:/WORK/broccoli_project/data
    train: images/train
    val: images/val  # Ensure this points to the validation directory
    test: images/test
    
    nc: 1
    names: ["bc"]
  3. File Paths: Ensure there are no typos or incorrect paths in your data.yaml file. Sometimes, even a small typo can cause issues.

  4. Cache: Clear any cached data that might be causing the issue. You can do this by deleting the *.cache files in your dataset directories (see the sketch below).
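
For step 4, a minimal sketch that removes stale cache files, assuming the dataset root from your data.yaml:

from pathlib import Path

# Dataset root from your data.yaml -- adjust as needed
root = Path(r'D:/WORK/broccoli_project/data')
for cache in root.rglob('*.cache'):
    print(f'Deleting {cache}')
    cache.unlink()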

If everything is set correctly and the issue persists, please share the exact command you are using to run the training, and any additional logs or error messages you see. This will help us diagnose the issue more effectively.

Looking forward to resolving this for you! 😊

suryargiri commented 4 months ago

Hello @glenn-jocher, I tried various ways and also followed your advice, but I am still having the same issue. What could be the reason I am not able to train with the images from the training folder? Please see the screenshot for the entire structure and let me know if you can resolve this for me. If you need more info, please mail me at kavreli@yahoo.ca. It would be great if I could solve this.

(screenshot attached: glen_question)

glenn-jocher commented 4 months ago

Hi @iexactdesign,

Thank you for sharing the screenshot and additional details. Let's try a few more steps to resolve this:

  1. Directory Structure: Ensure your directory structure matches exactly as specified in your data.yaml file. It should look like this:

    D:/WORK/broccoli_project/data/
    β”œβ”€β”€ images/
    β”‚   β”œβ”€β”€ train/
    β”‚   β”‚   β”œβ”€β”€ image1.jpg
    β”‚   β”‚   β”œβ”€β”€ image2.jpg
    β”‚   β”‚   └── ... (299 images)
    β”‚   β”œβ”€β”€ val/
    β”‚   β”‚   β”œβ”€β”€ image1.jpg
    β”‚   β”‚   β”œβ”€β”€ image2.jpg
    β”‚   β”‚   └── ... (85 images)
    β”‚   └── test/
  2. Correct data.yaml: Ensure your data.yaml file is correctly pointing to the respective directories:

    path: D:/WORK/broccoli_project/data
    train: images/train
    val: images/val
    test: images/test
    
    nc: 1
    names: ["bc"]
  3. Cache Files: Delete any *.cache files in your dataset directories to ensure no old cache is causing the issue.

  4. Command: Use the following command to start training:

    yolo train data=D:/WORK/broccoli_project/data/data.yaml model=yolov8n.pt epochs=50
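
Equivalently, if you prefer a Python script, the same run looks like this (same paths as the CLI command above):

    from ultralytics import YOLO

    model = YOLO('yolov8n.pt')
    results = model.train(data='D:/WORK/broccoli_project/data/data.yaml', epochs=50)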

If the issue persists, please share any error messages or logs you see. This will help us diagnose the problem more effectively. 😊

Looking forward to resolving this for you!

suryargiri commented 4 months ago

@glenn-jocher The problem is not solved. I am not sure whether the problem is with my computer or I am simply not understanding something. Can you please kindly review the following logs?

Ultralytics YOLOv8.2.26 πŸš€ Python-3.12.3 torch-2.3.0 CUDA:0 (NVIDIA RTX 3000 Ada Generation Laptop GPU, 8188MiB)
engine\trainer: task=detect, mode=train, model=yolov8n.yaml, data=D:\WORK\broccoli_project\data\datasets.yaml, epochs=10, time=None, patience=100, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train10, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs\detect\train10
Overriding model.yaml nc=80 with nc=1

               from  n    params  module                                       arguments                     

0 -1 1 464 ultralytics.nn.modules.conv.Conv [3, 16, 3, 2]
1 -1 1 4672 ultralytics.nn.modules.conv.Conv [16, 32, 3, 2]
2 -1 1 7360 ultralytics.nn.modules.block.C2f [32, 32, 1, True]
3 -1 1 18560 ultralytics.nn.modules.conv.Conv [32, 64, 3, 2]
4 -1 2 49664 ultralytics.nn.modules.block.C2f [64, 64, 2, True]
5 -1 1 73984 ultralytics.nn.modules.conv.Conv [64, 128, 3, 2]
6 -1 2 197632 ultralytics.nn.modules.block.C2f [128, 128, 2, True]
7 -1 1 295424 ultralytics.nn.modules.conv.Conv [128, 256, 3, 2]
8 -1 1 460288 ultralytics.nn.modules.block.C2f [256, 256, 1, True]
9 -1 1 164608 ultralytics.nn.modules.block.SPPF [256, 256, 5]
10 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
11 [-1, 6] 1 0 ultralytics.nn.modules.conv.Concat [1]
12 -1 1 148224 ultralytics.nn.modules.block.C2f [384, 128, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 [-1, 4] 1 0 ultralytics.nn.modules.conv.Concat [1]
15 -1 1 37248 ultralytics.nn.modules.block.C2f [192, 64, 1]
16 -1 1 36992 ultralytics.nn.modules.conv.Conv [64, 64, 3, 2]
17 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
18 -1 1 123648 ultralytics.nn.modules.block.C2f [192, 128, 1]
19 -1 1 147712 ultralytics.nn.modules.conv.Conv [128, 128, 3, 2]
20 [-1, 9] 1 0 ultralytics.nn.modules.conv.Concat [1]
21 -1 1 493056 ultralytics.nn.modules.block.C2f [384, 256, 1]
22 [15, 18, 21] 1 751507 ultralytics.nn.modules.head.Detect [1, [64, 128, 256]]
YOLOv8n summary: 225 layers, 3011043 parameters, 3011027 gradients, 8.2 GFLOPs

Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
AMP: checks passed βœ…
train: New cache created: D:\WORK\broccoli_project\data\labels\train.cache
val: New cache created: D:\WORK\broccoli_project\data\labels\val.cache
Plotting labels to runs\detect\train10\labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.002, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs\detect\train10
Starting training for 10 epochs...
Closing dataloader mosaic

  Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
  (per-epoch training rows were lost in the paste; the per-epoch validation summaries follow)

                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95
                   all         85        161          0          0          0          0
                   all         85        161          0          0          0          0
                   all         85        161          0          0          0          0
                   all         85        161          0          0          0          0
                   all         85        161          0          0          0          0
                   all         85        161    0.00184      0.292     0.0038   0.000973
                   all         85        161    0.00298      0.472      0.101     0.0426
                   all         85        161      0.736       0.19      0.325      0.159
                   all         85        161      0.722      0.478      0.599      0.323
                   all         85        161      0.798      0.614      0.717      0.399

10 epochs completed in 0.015 hours.
Optimizer stripped from runs\detect\train10\weights\last.pt, 6.2MB
Optimizer stripped from runs\detect\train10\weights\best.pt, 6.2MB

Validating runs\detect\train10\weights\best.pt...
Ultralytics YOLOv8.2.26 πŸš€ Python-3.12.3 torch-2.3.0 CUDA:0 (NVIDIA RTX 3000 Ada Generation Laptop GPU, 8188MiB)
YOLOv8n summary (fused): 168 layers, 3005843 parameters, 0 gradients, 8.1 GFLOPs
                   all         85        161      0.799      0.617      0.717      0.399
Speed: 0.2ms preprocess, 1.4ms inference, 0.0ms loss, 2.3ms postprocess per image
Results saved to runs\detect\train10

suryargiri commented 4 months ago

@glenn-jocher, this is what I got running in PyCharm. As I mentioned previously, there are 299 images in the train folder, but it's capturing the images from the val folder.

C:\Users\Surya.Giri\PycharmProjects\broccoli_det_may\.venv\Scripts\python.exe C:\Users\Surya.Giri\PycharmProjects\broccoli_det_may\main.py
New https://pypi.org/project/ultralytics/8.2.26 available πŸ˜ƒ Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.2.25 πŸš€ Python-3.12.2 torch-2.3.0+cpu CPU (13th Gen Intel Core(TM) i7-13700H)
engine\trainer: task=detect, mode=train, model=yolov8n.pt, data=D:\WORK\broccoli_project\data.yaml, epochs=1, time=None, patience=100, batch=16, imgsz=(640, 640), save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train27, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs\detect\train27
Overriding model.yaml nc=80 with nc=1

               from  n    params  module                                       arguments                     

0 -1 1 464 ultralytics.nn.modules.conv.Conv [3, 16, 3, 2]
1 -1 1 4672 ultralytics.nn.modules.conv.Conv [16, 32, 3, 2]
2 -1 1 7360 ultralytics.nn.modules.block.C2f [32, 32, 1, True]
3 -1 1 18560 ultralytics.nn.modules.conv.Conv [32, 64, 3, 2]
4 -1 2 49664 ultralytics.nn.modules.block.C2f [64, 64, 2, True]
5 -1 1 73984 ultralytics.nn.modules.conv.Conv [64, 128, 3, 2]
6 -1 2 197632 ultralytics.nn.modules.block.C2f [128, 128, 2, True]
7 -1 1 295424 ultralytics.nn.modules.conv.Conv [128, 256, 3, 2]
8 -1 1 460288 ultralytics.nn.modules.block.C2f [256, 256, 1, True]
9 -1 1 164608 ultralytics.nn.modules.block.SPPF [256, 256, 5]
10 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
11 [-1, 6] 1 0 ultralytics.nn.modules.conv.Concat [1]
12 -1 1 148224 ultralytics.nn.modules.block.C2f [384, 128, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 [-1, 4] 1 0 ultralytics.nn.modules.conv.Concat [1]
15 -1 1 37248 ultralytics.nn.modules.block.C2f [192, 64, 1]
16 -1 1 36992 ultralytics.nn.modules.conv.Conv [64, 64, 3, 2]
17 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
18 -1 1 123648 ultralytics.nn.modules.block.C2f [192, 128, 1]
19 -1 1 147712 ultralytics.nn.modules.conv.Conv [128, 128, 3, 2]
20 [-1, 9] 1 0 ultralytics.nn.modules.conv.Concat [1]
21 -1 1 493056 ultralytics.nn.modules.block.C2f [384, 256, 1]
22 [15, 18, 21] 1 751507 ultralytics.nn.modules.head.Detect [1, [64, 128, 256]]
Model summary: 225 layers, 3011043 parameters, 3011027 gradients

Transferred 319/355 items from pretrained weights
Freezing layer 'model.22.dfl.conv.weight'
WARNING ⚠️ updating to 'imgsz=640'. 'train' and 'val' imgsz must be an integer, while 'predict' and 'export' imgsz may be a [h, w] list or an integer, i.e. 'yolo export imgsz=640,480' or 'yolo export imgsz=640'
train: Scanning D:\WORK\broccoli_project\train\labels.cache... 299 images, 0 backgrounds, 0 corrupt: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 299/299 [00:00<?, ?it/s]
val: Scanning D:\WORK\broccoli_project\val\labels.cache... 85 images, 0 backgrounds, 0 corrupt: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 85/85 [00:00<?, ?it/s]
Plotting labels to runs\detect\train27\labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.002, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to runs\detect\train27
Starting training for 1 epochs...

  Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
    1/1         0G      1.293      3.178      1.204         41        640: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 19/19 [02:48<00:00,  8.86s/it]
             Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3/3 [00:21<00:00,  7.06s/it]
               all         85        161    0.00631          1     0.0168     0.0131

1 epochs completed in 0.053 hours.
Optimizer stripped from runs\detect\train27\weights\last.pt, 6.2MB
Optimizer stripped from runs\detect\train27\weights\best.pt, 6.2MB

Validating runs\detect\train27\weights\best.pt...
Ultralytics YOLOv8.2.25 πŸš€ Python-3.12.2 torch-2.3.0+cpu CPU (13th Gen Intel Core(TM) i7-13700H)
Model summary (fused): 168 layers, 3005843 parameters, 0 gradients

glenn-jocher commented 4 months ago

Hello @iexactdesign,

Thank you for sharing the detailed logs. Your setup and commands look correct: the logs show the trainer scanning 299 training images and 85 validation images, so the splits are being picked up as intended.

If it still appears that validation images are being used for training, it might be helpful to check the following:

  1. Label Files: Ensure that the label files in the train and val directories correctly correspond to the images they are supposed to represent. Sometimes, a mix-up here can cause the type of issue you're experiencing.

  2. Data Loading: Verify that the data loading process within your training script is correctly partitioning the images according to the train and val splits defined in your data.yaml. This can sometimes be overlooked if custom data loading scripts are used.

  3. Cache Files: Try deleting the .cache files in your dataset directory and rerun the training process. This forces the dataset to be rescanned and can sometimes resolve issues stemming from corrupted or outdated cache files.

  4. Configuration Check: Double-check that no other scripts or configurations are overriding the paths set in your data.yaml. This can happen in complex projects where multiple scripts might be interacting.

If you've checked all the above and the issue remains, it might be useful to isolate the problem by creating a minimal reproducible example or by stepping through the data loading process with a debugger to see exactly where the data is being misrouted; a small standalone check is sketched below.
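
For example, this check loads the YAML yourself and verifies each split path resolves and contains files (it assumes PyYAML is installed, that the split entries are relative to `path` as in your file, and that the config lives at the path shown; adjust as needed):

from pathlib import Path

import yaml  # PyYAML

# Adjust to the actual location of your dataset config
cfg = yaml.safe_load(Path(r'D:/WORK/broccoli_project/data/data.yaml').read_text())
root = Path(cfg['path'])
for split in ('train', 'val', 'test'):
    p = root / cfg[split]
    n = len(list(p.glob('*'))) if p.exists() else 0
    print(f'{split}: {p} exists={p.exists()} files={n}')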

Thank you for your patience, and looking forward to getting this resolved! 😊