yolov8 Sparse Transfer Learning

Describe the bug

The recipe fails to load: variables are missing and GMPPruningModifier is not recognized. I used the following recipe:

zoo:cv/detection/yolov8-m/pytorch/ultralytics/coco/pruned75_quant-none

I had to add the following variables to the general hyperparameters, and I'm not sure what they are responsible for. Could you provide brief information about each variable so that we can tune the training process according to our needs?

qat_start_epoch: 0.0
quantization_epochs: 0.0

Expected behavior

The model should be successfully sparsified.

Environment

OS : Ubuntu 22.04
Python: 3.10.6
SparseML: 1.5.0
torch: 1.12.0+cu102
Other Python package versions
- deepsparse 1.4.2
- nvidia-cusparse-cu11 11.7.4.91
- sparseml 1.5.0
- sparsezoo 1.5.1
- sparsezoo-nightly 1.5.0.20230509

To Reproduce

Exact steps to reproduce the behavior:

pip install sparseml[torch,torchvision,ultralytics]
sparseml.ultralytics.train \
  --model "/home/experement/Desktop/projects/detection/yolo8/runs/detect/cars_yolo8s/weights/best.pt" \
  --recipe zoo:cv/detection/yolov8-s/pytorch/ultralytics/coco/pruned50-quant.yaml \
  --data "/home/experement/Desktop/projects/detection/datasets/yolo8_version/super_cars/data.yaml" \
  --batch -1 \
  --patience 0

Download the recipe and edit it, adding:

qat_start_epoch: 0.0
quantization_epochs: 0.0

Error output:


sparseml.ultralytics.train  --model "/home/experement/Desktop/projects/detection/yolo8/runs/detect/cars_yolo8s/weights/best.pt"  --recipe "/home/experement/Desktop/projects/detection/yolo8/Recipies/yolov8-s-coco-pruned50_quantized.md"  --data "/home/experement/Desktop/projects/detection/datasets/yolo8_version/super_cars/data.yaml"  --batch -1  --patience 0
Ultralytics YOLOv8.0.30 🚀 Python-3.10.6 torch-1.12.0+cu102 CUDA:0 (NVIDIA GeForce GTX 1080, 8114MiB)
yolo/engine/trainer: recipe=/home/experement/Desktop/projects/detection/yolo8/Recipies/yolov8-s-coco-pruned50_quantized.md, recipe_args=None, task=detect, mode=train, model=/home/experement/Desktop/projects/detection/yolo8/runs/detect/cars_yolo8s/weights/best.pt, data=/home/experement/Desktop/projects/detection/datasets/yolo8_version/Aerial Images.v3i.yolov8/data.yaml, epochs=100, patience=0, batch=-1, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=None, exist_ok=False, pretrained=False, optimizer=SGD, verbose=False, seed=0, deterministic=True, single_cls=False, image_weights=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, min_memory=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, hide_labels=False, hide_conf=False, vid_stride=1, line_thickness=3, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, fl_gamma=0.0, label_smoothing=0.0, nbs=64.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, v5loader=False, save_dir=runs/detect/train26

                   from  n    params  module                                       arguments                     
  0                  -1  1       928  ultralytics.nn.modules.Conv                  [3, 32, 3, 2]                 
  1                  -1  1     18560  ultralytics.nn.modules.Conv                  [32, 64, 3, 2]                
  2                  -1  1     29056  ultralytics.nn.modules.C2f                   [64, 64, 1, True]             
  3                  -1  1     73984  ultralytics.nn.modules.Conv                  [64, 128, 3, 2]               
  4                  -1  2    197632  ultralytics.nn.modules.C2f                   [128, 128, 2, True]           
  5                  -1  1    295424  ultralytics.nn.modules.Conv                  [128, 256, 3, 2]              
  6                  -1  2    788480  ultralytics.nn.modules.C2f                   [256, 256, 2, True]           
  7                  -1  1   1180672  ultralytics.nn.modules.Conv                  [256, 512, 3, 2]              
  8                  -1  1   1838080  ultralytics.nn.modules.C2f                   [512, 512, 1, True]           
  9                  -1  1    656896  ultralytics.nn.modules.SPPF                  [512, 512, 5]                 
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 11             [-1, 6]  1         0  ultralytics.nn.modules.Concat                [1]                           
 12                  -1  1    591360  ultralytics.nn.modules.C2f                   [768, 256, 1]                 
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 14             [-1, 4]  1         0  ultralytics.nn.modules.Concat                [1]                           
 15                  -1  1    148224  ultralytics.nn.modules.C2f                   [384, 128, 1]                 
 16                  -1  1    147712  ultralytics.nn.modules.Conv                  [128, 128, 3, 2]              
 17            [-1, 12]  1         0  ultralytics.nn.modules.Concat                [1]                           
 18                  -1  1    493056  ultralytics.nn.modules.C2f                   [384, 256, 1]                 
 19                  -1  1    590336  ultralytics.nn.modules.Conv                  [256, 256, 3, 2]              
 20             [-1, 9]  1         0  ultralytics.nn.modules.Concat                [1]                           
 21                  -1  1   1969152  ultralytics.nn.modules.C2f                   [768, 512, 1]                 
 22        [15, 18, 21]  1   2116435  ultralytics.nn.modules.Detect                [1, [128, 256, 512]]          
Model summary: 225 layers, 11135987 parameters, 11135971 gradients, 28.6 GFLOPs

Transferred 355/355 items from pretrained weights
Received torch.nn.Module, not loading from checkpoint
Traceback (most recent call last):
  File "/home/experement/.local/bin/sparseml.ultralytics.train", line 8, in <module>
    sys.exit(main())
  File "/usr/lib/python3/dist-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/experement/.local/lib/python3.10/site-packages/sparseml/yolov8/train.py", line 225, in main
    model.train(**kwargs)
  File "/home/experement/.local/lib/python3.10/site-packages/sparseml/yolov8/trainers.py", line 796, in train
    self.trainer.train()
  File "/home/experement/.local/lib/python3.10/site-packages/sparseml/yolov8/trainers.py", line 164, in train
    self._do_train(int(os.getenv("RANK", -1)), world_size)
  File "/home/experement/.local/lib/python3.10/site-packages/ultralytics/yolo/engine/trainer.py", line 250, in _do_train
    self._setup_train(rank, world_size)
  File "/home/experement/.local/lib/python3.10/site-packages/sparseml/yolov8/trainers.py", line 278, in _setup_train
    super()._setup_train(rank, world_size)
  File "/home/experement/.local/lib/python3.10/site-packages/ultralytics/yolo/engine/trainer.py", line 201, in _setup_train
    ckpt = self.setup_model()
  File "/home/experement/.local/lib/python3.10/site-packages/sparseml/yolov8/trainers.py", line 170, in setup_model
    self._build_managers(ckpt=None)
  File "/home/experement/.local/lib/python3.10/site-packages/sparseml/yolov8/trainers.py", line 215, in _build_managers
    self.manager = ScheduledModifierManager.from_yaml(
  File "/home/experement/.local/lib/python3.10/site-packages/sparseml/pytorch/optim/manager.py", line 286, in from_yaml
    modifiers = Modifier.load_list(yaml_str)
  File "/home/experement/.local/lib/python3.10/site-packages/sparseml/pytorch/sparsification/modifier.py", line 101, in load_list
    return Modifier.load_framework_list(yaml_str, PYTORCH_FRAMEWORK)
  File "/home/experement/.local/lib/python3.10/site-packages/sparseml/optim/modifier.py", line 318, in load_framework_list
    container = yaml.safe_load(yaml_str)
  File "/usr/lib/python3/dist-packages/yaml/__init__.py", line 162, in safe_load
    return load(stream, SafeLoader)
  File "/usr/lib/python3/dist-packages/yaml/__init__.py", line 114, in load
    return loader.get_single_data()
  File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 51, in get_single_data
    return self.construct_document(node)
  File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 60, in construct_document
    for dummy in generator:
  File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 408, in construct_yaml_seq
    data.extend(self.construct_sequence(node))
  File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 129, in construct_sequence
    return [self.construct_object(child, deep=deep)
  File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 129, in <listcomp>
    return [self.construct_object(child, deep=deep)
  File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 100, in construct_object
    data = constructor(self, node)
  File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 427, in construct_undefined
    raise ConstructorError(None, None,
yaml.constructor.ConstructorError: could not determine a constructor for the tag '!pytorch.GMPPruningModifier'
  in "<unicode string>", line 8, column 3:
    - !pytorch.GMPPruningModifier
      ^

Hi @salwaghanim, thank you for raising the issue. For the error posted at the bottom of your ticket, it looks like there is a typo in the recipe; we will get that fixed.

I noticed that the recipe stub at the top is different from the one in the reproduction example: that recipe is for a yolov8m model, whereas the checkpoint looks to be for an s model. That may be the reason you are seeing a missing params error.

@salwaghanim The recipe should be updated within an hour. You may need to clear your cache to get the new one.

@bfineran Hello, thanks for your support. I have tested many yolo models from both yolov5 and yolov8, so the recipe at the top may be from one of the previous experiments. I have found a bug and encountered one additional issue. I will test the recipe now and report back. I will also post the bug and its fix, and I will open a new issue for the additional problem I found.

I've been testing the recipe. I set the batch size to -1 at first, then to 32, then 16, then 8. Every time, for all batch sizes, I got the same error after finishing the first epoch. At batch size 8, GPU memory use at epoch 1 was about 2 GB and 5.7 GB was free. I killed all other processes and removed my display ports from the GPU to keep the free memory stable, but I still got the same error. I am uploading my dataset and weights file to Google Colab to test on a different GPU and will report back ASAP.

sparseml.ultralytics.train   --model "/home/experement/Desktop/projects/detection/yolo8/runs/detect/cars_yolov8_s/weights/best.pt"   --recipe /home/experement/Desktop/projects/detection/yolo8/Recipies/yolov8-s-coco-pruned50_quantized.md  --data "/home/experement/Desktop/projects/detection/datasets/yolo8_version/Aerial Images.v3i.yolov8/data.yaml"   --batch 8   --patience 0
Ultralytics YOLOv8.0.30 🚀 Python-3.10.6 torch-1.12.0+cu102 CUDA:0 (NVIDIA GeForce GTX 1080, 8114MiB)
yolo/engine/trainer: recipe=/home/experement/Desktop/projects/detection/yolo8/Recipies/yolov8-s-coco-pruned50_quantized.md, recipe_args=None, task=detect, mode=train, model=/home/experement/Desktop/projects/detection/yolo8/runs/detect/cars_yolov8_s/weights/best.pt, data=/home/experement/Desktop/projects/detection/datasets/yolo8_version/Aerial Images.v3i.yolov8/data.yaml, epochs=100, patience=0, batch=8, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=None, exist_ok=False, pretrained=False, optimizer=SGD, verbose=False, seed=0, deterministic=True, single_cls=False, image_weights=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, min_memory=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, hide_labels=False, hide_conf=False, vid_stride=1, line_thickness=3, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, fl_gamma=0.0, label_smoothing=0.0, nbs=64.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, v5loader=False, save_dir=runs/detect/train20

                   from  n    params  module                                       arguments                     
  0                  -1  1       928  ultralytics.nn.modules.Conv                  [3, 32, 3, 2]                 
  1                  -1  1     18560  ultralytics.nn.modules.Conv                  [32, 64, 3, 2]                
  2                  -1  1     29056  ultralytics.nn.modules.C2f                   [64, 64, 1, True]             
  3                  -1  1     73984  ultralytics.nn.modules.Conv                  [64, 128, 3, 2]               
  4                  -1  2    197632  ultralytics.nn.modules.C2f                   [128, 128, 2, True]           
  5                  -1  1    295424  ultralytics.nn.modules.Conv                  [128, 256, 3, 2]              
  6                  -1  2    788480  ultralytics.nn.modules.C2f                   [256, 256, 2, True]           
  7                  -1  1   1180672  ultralytics.nn.modules.Conv                  [256, 512, 3, 2]              
  8                  -1  1   1838080  ultralytics.nn.modules.C2f                   [512, 512, 1, True]           
  9                  -1  1    656896  ultralytics.nn.modules.SPPF                  [512, 512, 5]                 
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 11             [-1, 6]  1         0  ultralytics.nn.modules.Concat                [1]                           
 12                  -1  1    591360  ultralytics.nn.modules.C2f                   [768, 256, 1]                 
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 14             [-1, 4]  1         0  ultralytics.nn.modules.Concat                [1]                           
 15                  -1  1    148224  ultralytics.nn.modules.C2f                   [384, 128, 1]                 
 16                  -1  1    147712  ultralytics.nn.modules.Conv                  [128, 128, 3, 2]              
 17            [-1, 12]  1         0  ultralytics.nn.modules.Concat                [1]                           
 18                  -1  1    493056  ultralytics.nn.modules.C2f                   [384, 256, 1]                 
 19                  -1  1    590336  ultralytics.nn.modules.Conv                  [256, 256, 3, 2]              
 20             [-1, 9]  1         0  ultralytics.nn.modules.Concat                [1]                           
 21                  -1  1   1969152  ultralytics.nn.modules.C2f                   [768, 512, 1]                 
 22        [15, 18, 21]  1   2116435  ultralytics.nn.modules.Detect                [1, [128, 256, 512]]          
Model summary: 225 layers, 11135987 parameters, 11135971 gradients, 28.6 GFLOPs

Transferred 355/355 items from pretrained weights
Received torch.nn.Module, not loading from checkpoint
optimizer: SGD(lr=0.01) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias
train: Scanning /home/experement/Desktop/projects/detection/datasets/yolo8_versi
val: Scanning /home/experement/Desktop/projects/detection/datasets/yolo8_version
/home/experement/.local/lib/python3.10/site-packages/sparseml/yolov8/trainers.py:294: UserWarning: Unable to import wandb for logging
  warnings.warn("Unable to import wandb for logging")
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/train20
Starting training for 50 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       1/50      2.17G      1.168      1.006      1.391          8        640: 1
                 Class     Images  Instances      Box(P          R      mAP50  m
Traceback (most recent call last):
  File "/home/experement/.local/bin/sparseml.ultralytics.train", line 8, in <module>
    sys.exit(main())
  File "/usr/lib/python3/dist-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/experement/.local/lib/python3.10/site-packages/sparseml/yolov8/train.py", line 225, in main
    model.train(**kwargs)
  File "/home/experement/.local/lib/python3.10/site-packages/sparseml/yolov8/trainers.py", line 796, in train
    self.trainer.train()
  File "/home/experement/.local/lib/python3.10/site-packages/sparseml/yolov8/trainers.py", line 164, in train
    self._do_train(int(os.getenv("RANK", -1)), world_size)
  File "/home/experement/.local/lib/python3.10/site-packages/ultralytics/yolo/engine/trainer.py", line 344, in _do_train
    self.metrics, self.fitness = self.validate()
  File "/home/experement/.local/lib/python3.10/site-packages/ultralytics/yolo/engine/trainer.py", line 439, in validate
    metrics = self.validator(self)
  File "/home/experement/.local/lib/python3.10/site-packages/sparseml/yolov8/validators.py", line 132, in __call__
    preds = model(batch["img"])
  File "/home/experement/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/experement/.local/lib/python3.10/site-packages/ultralytics/nn/tasks.py", line 198, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
  File "/home/experement/.local/lib/python3.10/site-packages/ultralytics/nn/tasks.py", line 57, in _forward_once
    x = m(x)  # run
  File "/home/experement/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/experement/.local/lib/python3.10/site-packages/ultralytics/nn/modules.py", line 401, in forward
    x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
  File "/home/experement/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/experement/.local/lib/python3.10/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/experement/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/experement/.local/lib/python3.10/site-packages/sparseml/yolov8/modules.py", line 37, in forward
    return self.act(self.bn(self.conv(x)))
  File "/home/experement/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/experement/.local/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 391, in forward
    return F.silu(input, inplace=self.inplace)
  File "/home/experement/.local/lib/python3.10/site-packages/torch/nn/functional.py", line 2048, in silu
    return torch._C._nn.silu(input)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.92 GiB total capacity; 6.85 GiB already allocated; 4.94 MiB free; 7.29 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
(neural_magic) experement@Experement:~$

OK, I tested it on Google Colab and got the same error. P.S. I had to add the following variables to the recipe to make it work:

num_epochs: 50
qat_start_epoch: 0.0
quantization_epochs: 0.0

The bug is clearly related to the way you are handling GPU memory. One thing worth mentioning: if an error occurs and prevents the module from starting the training process, the loaded file is deleted. Once the valid images folder was deleted; another time the recipe file was deleted (I had downloaded the recipe file to add the needed variables). I also found two bugs in sparsifying yolov5 and solved one of them. I will post them tomorrow.

Ultralytics YOLOv8.0.30 🚀 Python-3.10.12 torch-1.13.1+cu117 CUDA:0 (Tesla T4, 15102MiB)
yolo/engine/trainer: recipe=/content/drive/MyDrive/projects/yolov8/YOLOv8/train_sparse_yolo8/yolov8-s-coco-pruned50_quantized.md, recipe_args=None, task=detect, mode=train, model=/content/drive/MyDrive/projects/yolov8/runs/detect/cars_yolov8_s/weights/best.pt, data=/content/drive/MyDrive/projects/datasets/Aerial_Images.v3i.yolov8/data.yaml, epochs=100, patience=0, batch=-1, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=None, exist_ok=False, pretrained=False, optimizer=AdamW, verbose=False, seed=0, deterministic=True, single_cls=False, image_weights=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, min_memory=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, hide_labels=False, hide_conf=False, vid_stride=1, line_thickness=3, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, fl_gamma=0.0, label_smoothing=0.0, nbs=64.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, v5loader=False, save_dir=runs/detect/train6

                   from  n    params  module                                       arguments                     
  0                  -1  1       928  ultralytics.nn.modules.Conv                  [3, 32, 3, 2]                 
  1                  -1  1     18560  ultralytics.nn.modules.Conv                  [32, 64, 3, 2]                
  2                  -1  1     29056  ultralytics.nn.modules.C2f                   [64, 64, 1, True]             
  3                  -1  1     73984  ultralytics.nn.modules.Conv                  [64, 128, 3, 2]               
  4                  -1  2    197632  ultralytics.nn.modules.C2f                   [128, 128, 2, True]           
  5                  -1  1    295424  ultralytics.nn.modules.Conv                  [128, 256, 3, 2]              
  6                  -1  2    788480  ultralytics.nn.modules.C2f                   [256, 256, 2, True]           
  7                  -1  1   1180672  ultralytics.nn.modules.Conv                  [256, 512, 3, 2]              
  8                  -1  1   1838080  ultralytics.nn.modules.C2f                   [512, 512, 1, True]           
  9                  -1  1    656896  ultralytics.nn.modules.SPPF                  [512, 512, 5]                 
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 11             [-1, 6]  1         0  ultralytics.nn.modules.Concat                [1]                           
 12                  -1  1    591360  ultralytics.nn.modules.C2f                   [768, 256, 1]                 
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 14             [-1, 4]  1         0  ultralytics.nn.modules.Concat                [1]                           
 15                  -1  1    148224  ultralytics.nn.modules.C2f                   [384, 128, 1]                 
 16                  -1  1    147712  ultralytics.nn.modules.Conv                  [128, 128, 3, 2]              
 17            [-1, 12]  1         0  ultralytics.nn.modules.Concat                [1]                           
 18                  -1  1    493056  ultralytics.nn.modules.C2f                   [384, 256, 1]                 
 19                  -1  1    590336  ultralytics.nn.modules.Conv                  [256, 256, 3, 2]              
 20             [-1, 9]  1         0  ultralytics.nn.modules.Concat                [1]                           
 21                  -1  1   1969152  ultralytics.nn.modules.C2f                   [768, 512, 1]                 
 22        [15, 18, 21]  1   2116435  ultralytics.nn.modules.Detect                [1, [128, 256, 512]]          
Model summary: 225 layers, 11135987 parameters, 11135971 gradients, 28.6 GFLOPs

Transferred 355/355 items from pretrained weights
Received torch.nn.Module, not loading from checkpoint
AutoBatch: Computing optimal batch size for imgsz=640
AutoBatch: CUDA:0 (Tesla T4) 14.75G total, 0.09G reserved, 0.08G allocated, 14.58G free
      Params      GFLOPs  GPU_mem (GB)  forward (ms) backward (ms)                   input                  output
    11135987       28.65         0.325         63.77         20.06        (1, 3, 640, 640)                    list
    11135987       57.29         0.547         27.53         23.09        (2, 3, 640, 640)                    list
    11135987       114.6         1.007         25.46         23.89        (4, 3, 640, 640)                    list
    11135987       229.2         1.948         32.16         38.69        (8, 3, 640, 640)                    list
    11135987       458.4         3.590         61.48         80.03       (16, 3, 640, 640)                    list
AutoBatch: Using batch-size 46 for CUDA:0 10.34G/14.75G (70%) ✅
optimizer: AdamW(lr=0.01) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.000359375), 63 bias
train: Scanning /content/drive/MyDrive/projects/datasets/Aerial_Images.v3i.yolov8/train/labels.cache... 4997 images, 3473 backgrounds, 0 corrupt: 100% 4997/4997 [00:00<?, ?it/s]
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
val: Scanning /content/drive/MyDrive/projects/datasets/Aerial_Images.v3i.yolov8/valid/labels.cache... 1441 images, 968 backgrounds, 0 corrupt: 100% 1441/1441 [00:00<?, ?it/s]
/usr/local/lib/python3.10/dist-packages/sparseml/yolov8/trainers.py:294: UserWarning: Unable to import wandb for logging
  warnings.warn("Unable to import wandb for logging")
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs/detect/train6
Starting training for 50 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       1/50      11.6G      1.182     0.9192      1.389         37        640: 100% 109/109 [04:37<00:00,  2.55s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95):   0% 0/16 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/sparseml.ultralytics.train", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sparseml/yolov8/train.py", line 225, in main
    model.train(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sparseml/yolov8/trainers.py", line 796, in train
    self.trainer.train()
  File "/usr/local/lib/python3.10/dist-packages/sparseml/yolov8/trainers.py", line 164, in train
    self._do_train(int(os.getenv("RANK", -1)), world_size)
  File "/usr/local/lib/python3.10/dist-packages/ultralytics/yolo/engine/trainer.py", line 344, in _do_train
    self.metrics, self.fitness = self.validate()
  File "/usr/local/lib/python3.10/dist-packages/ultralytics/yolo/engine/trainer.py", line 439, in validate
    metrics = self.validator(self)
  File "/usr/local/lib/python3.10/dist-packages/sparseml/yolov8/validators.py", line 132, in __call__
    preds = model(batch["img"])
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ultralytics/nn/tasks.py", line 198, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
  File "/usr/local/lib/python3.10/dist-packages/ultralytics/nn/tasks.py", line 57, in _forward_once
    x = m(x)  # run
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ultralytics/nn/modules.py", line 346, in forward
    return torch.cat(x, self.d)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 500.00 MiB (GPU 0; 14.75 GiB total capacity; 13.23 GiB already allocated; 72.81 MiB free; 13.57 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Hi @salwaghanim. Thanks for bringing this to our attention. It turns out that the recipe had some errors, as you noted. We sometimes clean up the recipe before publishing the models, and inadvertently introduce errors in the process. Our sparse YOLOv8 models are going through quality control at the moment, so we should clear out these bugs in a few days.

Regarding the memory error that you are seeing, it seems to be related to the fact that the batch size for validation is double the batch size used for training. Validation happens at the end of every training epoch, and the doubling of the batch size can lead to spikes in memory utilization. This is a feature implemented in the ultralytics repo. We're working on overriding this behavior and should push a fix in the coming days. Until then, the only workaround I can recommend is lowering the batch size even further 😞 .
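To make the effect concrete, here is a minimal sketch of the arithmetic, assuming only what is stated above: the validation batch is twice the training batch. The helper name `validation_batch_size` is hypothetical and is not part of sparseml or ultralytics.

```python
def validation_batch_size(train_batch: int, multiplier: int = 2) -> int:
    """Validation batch implied by a training batch.

    The default multiplier of 2 is an assumption taken from the
    explanation above (validation batch = double the training batch).
    """
    if train_batch <= 0:
        raise ValueError("train_batch must be a positive integer")
    return train_batch * multiplier


# Training batch sizes tried in this thread, plus the AutoBatch pick of 46.
for train_batch in (46, 32, 16, 8, 4):
    val_batch = validation_batch_size(train_batch)
    print(f"train batch {train_batch:>2} -> validation batch {val_batch}")
```

Under that assumption, the AutoBatch choice of 46 would validate with 92 images per batch, which is one plausible reason the out-of-memory error appears only once validation starts.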

I also want to point out that this recipe includes quantization towards the end of training (the last 6 epochs). It is common for memory utilization to increase when quantization starts, by up to four times. This happens because we need to disable mixed precision when quantization starts. In the future we will implement changes to automatically reduce the batch size during quantization to avoid memory issues, but that is not yet supported. So my recommendation is to experiment with a reduced training schedule, just to make sure you don't run into memory issues before doing a complete run. You can achieve this by adding an argument such as this one to override the recipe hyper-parameters:

--recipe_args '{"num_epochs":10,"pruning_start_epoch":1,"pruning_end_epoch":3}'
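The value of `--recipe_args` is a JSON object passed as a single shell argument, so the quoting matters. As a rough sketch, the string can be built from Python so the quoting stays correct; the keys are the ones shown above, and `build_recipe_args` is a hypothetical helper, not part of sparseml.

```python
import json
import shlex


def build_recipe_args(**overrides) -> str:
    """Serialize recipe overrides to a compact JSON string."""
    return json.dumps(overrides, separators=(",", ":"))


recipe_args = build_recipe_args(
    num_epochs=10,
    pruning_start_epoch=1,
    pruning_end_epoch=3,
)

# shlex.quote wraps the JSON in single quotes so that the shell
# passes it to the training command as one argument.
print("--recipe_args", shlex.quote(recipe_args))
```

Which keys are accepted depends on the variables the chosen recipe defines, so check the recipe itself before overriding anything else.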

As a reference point, on an RTX A4000 with 16 GB of RAM I need to reduce the batch size to 4 to be able to run the complete recipe.

Finally, I want to bring up that it is possible to train the sparse model directly on your own dataset, without needing to sparsify the model from scratch. This is a process we call "sparse transfer". You can find an example (transferred to VOC) here. A link for sparse transfer followed by quantization will be available shortly.

Hello @salwaghanim, some time has passed since our last comment without hearing back. Note that we corrected the errors as soon as you discovered them; we greatly appreciate your reporting them. I hope the other insight from @anmarques has proven valuable. I'd like to go ahead and close out this thread, but if you would like to continue the conversation, please re-open this issue. Thank you so much! Jeannie / Neural Magic

@anmarques Hi, thanks a lot for the previous explanations. I have a few issues with performing sparse transfer learning for the yolov8s model on my custom dataset. I am using the command below to perform the sparse transfer:

!sparseml.ultralytics.train \
  --model "zoo:cv/detection/yolov8-s/pytorch/ultralytics/coco/pruned55_quant-none" \
  --recipe "zoo:cv/detection/yolov8-s/pytorch/ultralytics/coco/pruned55_quant-none" \
  --data /content/datasets/Sphero-Robot-detection-8/data.yaml \
  --recipe_args '{"num_epochs":10,"pruning_start_epoch":1,"pruning_end_epoch":3}'  \
  --batch 4

So my issues are:
1) The resulting model size is around 87 MB, compared to the original yolov8s size of 21 MB. (Am I doing something wrong here? Since this involves pruning and quantization, I thought the model should be smaller.)

Thank you in advance.

neuralmagic / sparseml

yolov8 Sparse Transfer Learning #1605