ultralytics / ultralytics

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Problems in yolov8 training #11243

Open 1cai2 opened 2 weeks ago

1cai2 commented 2 weeks ago

Search before asking

YOLOv8 Component

Train

Bug

Here's my training code:

from ultralytics import YOLO

if __name__ == "__main__":
    model = YOLO('G:/python_conde/yolov8/save_to_yolo/ultralytics_three/datasets/yolov8.yaml')
    results = model.train(data='G:/python_conde/yolov8/save_to_yolo/ultralytics_three/datasets/composite_finger.yaml', epochs=100, imgsz=640, batch=2)

However, an error occurs during the validation phase of training.

Here is the error message:

Logging results to runs\detect\train
Starting training for 100 epochs...

  Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
  1/100     0.518G      3.155      4.984      3.948          1        640: 100%|██████████| 539/539 [00:41<00:00, 12.99it/s]
              Class     Images  Instances      Box(P          R      mAP50  mAP50-95):   0%|          | 0/68 [00:00<?, ?it/s]

Traceback (most recent call last):
  File "G:\python_conde\yolov8\save_to_yolo\ultralytics_three\train_test.py", line 32, in <module>
    results = model.train(data='G:/python_conde/yolov8/save_to_yolo/ultralytics_three/datasets/composite_finger.yaml', epochs=100, imgsz=640,batch=2)
  File "G:\python_conde\yolov8\save_to_yolo\ultralytics_three\ultralytics\engine\model.py", line 341, in train
    self.trainer.train()
  File "G:\python_conde\yolov8\save_to_yolo\ultralytics_three\ultralytics\engine\trainer.py", line 192, in train
    self._do_train(world_size)
  File "G:\python_conde\yolov8\save_to_yolo\ultralytics_three\ultralytics\engine\trainer.py", line 392, in _do_train
    self.metrics, self.fitness = self.validate()
  File "G:\python_conde\yolov8\save_to_yolo\ultralytics_three\ultralytics\engine\trainer.py", line 496, in validate
    metrics = self.validator(self)
  File "G:\conda_data\envs\yolo\lib\site-packages\torch\autograd\grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "G:\python_conde\yolov8\save_to_yolo\ultralytics_three\ultralytics\engine\validator.py", line 168, in __call__
    preds = model(batch['img'], augment=augment)
  File "G:\conda_data\envs\yolo\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "G:\python_conde\yolov8\save_to_yolo\ultralytics_three\ultralytics\nn\tasks.py", line 42, in forward
    return self.predict(x, *args, **kwargs)
  File "G:\python_conde\yolov8\save_to_yolo\ultralytics_three\ultralytics\nn\tasks.py", line 59, in predict
    return self._predict_once(x, profile, visualize)
  File "G:\python_conde\yolov8\save_to_yolo\ultralytics_three\ultralytics\nn\tasks.py", line 79, in _predict_once
    x = m(x)  # run
  File "G:\conda_data\envs\yolo\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "G:\python_conde\yolov8\save_to_yolo\ultralytics_three\ultralytics\nn\modules\conv.py", line 36, in forward
    return self.act(self.bn(self.conv(x)))
  File "G:\conda_data\envs\yolo\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "G:\conda_data\envs\yolo\lib\site-packages\torch\nn\modules\conv.py", line 446, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "G:\conda_data\envs\yolo\lib\site-packages\torch\nn\modules\conv.py", line 442, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.cuda.FloatTensor) should be the same

Environment

Ultralytics YOLOv8.0.203 Python-3.8.18 torch-1.10.1 CUDA:0 (NVIDIA GeForce GTX 1650 Ti, 4096MiB)

Minimal Reproducible Example

from ultralytics import YOLO

if __name__ == "__main__":
    model = YOLO('G:/python_conde/yolov8/save_to_yolo/ultralytics_three/datasets/yolov8.yaml')
    results = model.train(data='G:/python_conde/yolov8/save_to_yolo/ultralytics_three/datasets/composite_finger.yaml', epochs=100, imgsz=640, batch=2)

Additional

No response

Are you willing to submit a PR?

glenn-jocher commented 2 weeks ago

It looks like you're encountering a data type mismatch during the validation phase: the input tensor is half precision (torch.cuda.HalfTensor) while the convolution weights are still full precision (torch.cuda.FloatTensor). This is common with mixed precision training, where some tensors are converted to half precision while others remain in full precision.
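For illustration, the same kind of mismatch can be reproduced in plain PyTorch without a GPU. This is only a minimal sketch, not the Ultralytics validation code: the layer `conv` and the tensor `x_half` are hypothetical stand-ins, and on CPU the exact error text may differ from the CUDA message in the traceback.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in layer; its weights are float32 by default.
conv = nn.Conv2d(3, 4, kernel_size=3)

# Half-precision input, mimicking an FP16 batch fed to the model.
x_half = torch.randn(1, 3, 8, 8).half()

try:
    conv(x_half)  # float16 input against float32 weights
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print(f"RuntimeError: {err}")

# Casting the input to the dtype of the weights removes the mismatch.
weight_dtype = conv.weight.dtype
out = conv(x_half.to(weight_dtype))
print(out.dtype, tuple(out.shape))
```

The fix shown here casts the input to match the weights; casting the layer to match the input would be the mirror-image option when the device supports half-precision convolution.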

You can try making the model and its inputs use the same data type. One option is to explicitly cast the model with .float() or .half() before training so that precision is handled uniformly. Here's a specific modification you can make at the start of your training script:

from ultralytics import YOLO

if __name__ == "__main__":
    model = YOLO('path/to/your/yolov8.yaml').float()  # Ensure model uses float32
    results = model.train(data='path/to/your/dataset.yaml', epochs=100, imgsz=640, batch=2)

This modification ensures that your model uses 32-bit floating point precision consistently. If you intend to use mixed precision to leverage speedups from .half(), make sure that your input data tensors and your model agree on the data type throughout the training process. If you're still facing issues or need further assistance, please let us know! 😊
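To check whether a model's tensors actually agree on a data type, one possible diagnostic is to count its parameters and buffers by dtype. The sketch below uses plain PyTorch; `dtype_summary` and the small `net` are hypothetical names introduced for illustration and are not part of the Ultralytics API.

```python
from collections import Counter

import torch
import torch.nn as nn


def dtype_summary(module: nn.Module) -> Counter:
    """Count the parameters and buffers of `module` by dtype."""
    counts = Counter()
    for tensor in list(module.parameters()) + list(module.buffers()):
        counts[tensor.dtype] += 1
    return counts


# Hypothetical toy network standing in for a real detection model.
net = nn.Sequential(nn.Conv2d(3, 4, 3), nn.BatchNorm2d(4), nn.SiLU())

print(dtype_summary(net))         # floating-point tensors start as float32
print(dtype_summary(net.half()))  # .half() casts floating-point tensors in place
```

Integer buffers, such as the batch counter in BatchNorm, keep their integer dtype after the cast. Tensors that a custom layer stores outside of registered parameters and buffers will not appear in this summary, so a clean result does not rule out a mismatch inside such a layer.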