neuralmagic / sparseml

Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
Apache License 2.0
2.05k stars 145 forks

`ModuleNotFoundError: No module named 'ultralytics.nn.modules.conv'; 'ultralytics.nn.modules' is not a package` #1621

Closed OttomanZ closed 11 months ago

OttomanZ commented 1 year ago

Describe the bug I'm trying to train a YOLOv8 model using sparseml.ultralytics.train. I've tried both sparseml-nightly[ultralytics] and sparseml[ultralytics], and both give me the same error on Google Colab. When I upgraded to the latest ultralytics version, training with ultralytics itself worked fine, but it breaks the sparseml training. I've tried multiple workarounds, including removing all of the version checks, but so far those have failed as well. I believe there are two references to this bug on the Ultralytics GitHub page dating back only two weeks.

Command I'm using:

!sparseml.ultralytics.train \
  --model "./best.pt" \
  --recipe zoo:cv/detection/yolov8-m/pytorch/ultralytics/coco/pruned75-none.yaml \
  --data "./Torus-Segmentation-2/data.yaml" \
  --batch 8 \
  --patience 0

Expected behavior

Should train without giving an error.

Environment Include all relevant environment information:

  1. Google Colab Notebook

To Reproduce
Exact steps to reproduce the behavior:
!pip install sparseml[ultralytics]
!sparseml.ultralytics.train \
  --model "./best.pt" \
  --recipe zoo:cv/detection/yolov8-m/pytorch/ultralytics/coco/pruned75-none.yaml \
  --data "./Torus-Segmentation-2/data.yaml" \
  --batch 8 \
  --patience 0

This error occurs with sparseml[ultralytics]==1.5, the latest PyPI release as of now.

Full Traceback

  File "/usr/local/bin/sparseml.ultralytics.train", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sparseml/yolov8/train.py", line 224, in main
    model = SparseYOLO(kwargs["model"])
  File "/usr/local/lib/python3.10/dist-packages/sparseml/yolov8/trainers.py", line 548, in __init__
    ckpt = torch.load(model_str)
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 789, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1131, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1124, in find_class
    return super().find_class(mod_name, name)
ModuleNotFoundError: No module named 'ultralytics.nn.modules.conv'; 'ultralytics.nn.modules' is not a package
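
The failure happens while torch.load unpickles best.pt: the checkpoint references ultralytics.nn.modules.conv, a submodule that only exists in newer ultralytics releases where nn.modules became a package, while the ultralytics version pinned by sparseml still ships a single flat nn/modules.py (hence "'ultralytics.nn.modules' is not a package"). A minimal diagnostic sketch, assuming the sparseml-pinned ultralytics is the one installed:

import ultralytics
import ultralytics.nn.modules as m

print("ultralytics version:", ultralytics.__version__)
# Old releases ship nn/modules.py as a flat module (no __path__); newer releases ship
# a package with conv/block/head submodules, which is what this checkpoint expects.
print("nn.modules is a package:", hasattr(m, "__path__"))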


salwaghanim commented 1 year ago

Facing the same issue here. I have tested it on both Colab and my local machine. I also tried removing sparseml and testing with sparseml-nightly 1.6.0.20230608:

pip uninstall sparseml
pip install sparseml-nightly[ultralytics]
sparseml.ultralytics.train \
--model "/home/experement/Desktop/projects/detection/yolo8/runs/detect/cars/weights/best.pt" \
--recipe /home/experement/Desktop/projects/detection/yolo8/Recipies/yolov8-s-coco-pruned65_quantized.md\
--data "/home/experement/Desktop/projects/detection/datasets/yolo8_version/super_cars/data.yaml" \
--batch -1 \
--patience 0 \
--optimizer AdamW
OttomanZ commented 1 year ago

@salwaghanim I faced the same error when I tried to train with the ultralytics module without sparseml. Running pip install --upgrade ultralytics fixes the issue in ultralytics, but since the latest version isn't compatible with sparseml, it's not going to work even if you remove all of the version checks.
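
One possible stop-gap, not verified in this thread and only a sketch: keep the sparseml-pinned ultralytics installed, alias its old flat nn.modules under the package-style names the checkpoint references, load the checkpoint once, and re-save it so the pinned version can read it directly (file names below are illustrative):

import sys
import types

import torch
import ultralytics.nn.modules as flat_modules  # flat module in the pinned ultralytics

# The checkpoint pickles class paths like ultralytics.nn.modules.conv.Conv; map those
# package-style names onto the old flat module so the unpickler can resolve them.
for sub in ("conv", "block", "head", "transformer"):
    alias = types.ModuleType(f"ultralytics.nn.modules.{sub}")
    alias.__dict__.update(flat_modules.__dict__)
    sys.modules[alias.__name__] = alias

ckpt = torch.load("best.pt", map_location="cpu")
torch.save(ckpt, "best_legacy.pt")  # then pass --model best_legacy.pt to sparseml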

OttomanZ commented 1 year ago

@dfneuralmagic Hi, team at Neural Magic! Can you take a look into this and get sparseml.ultralytics.train up and running with the latest ultralytics module?

salwaghanim commented 1 year ago

@OttomanZ Here is a temporary fix I tested on YOLOv5 on Google Colab. If you test it on other recipes or on YOLOv8, please share the details.

!pip uninstall tensorflow -y
!pip install sparseml==1.4.4
!pip install sparseml[dev,torch,torchvision,deepsparse,onnxruntime,transformers,yolov5]
!sparseml.yolov5.train --help #this will download additional libraries
!sparseml.yolov5.train \
  --cfg yolov5m.yaml \
  --weights /content/drive/MyDrive/projects/yolov5/runs/train/super_cars_yolov5m/weights/best.pt\
  --recipe zoo:cv/detection/yolov5-m/pytorch/ultralytics/coco/pruned70_quant-none?recipe_type=transfer_learn \
  --data /content/drive/MyDrive/projects/datasets/cars_moidels/data.yaml \
  --patience 0 \
  --hyp /content/drive/MyDrive/projects/yolov5/runs/train/super_cars_yolov5m/hyp.yaml \
  --optimizer AdamW

To use this with YOLOv8 you need to edit the required libraries being installed; the extra may be yolov8 instead of yolov5 (it was like that early on):

!pip install sparseml[dev,torch,torchvision,deepsparse,onnxruntime,transformers,yolov5]
!sparseml.yolov5.train --help #this will download additional libraries

I noticed you may have Miniconda on your local machine, since installing the newer version of sparseml messed up your other environment. The same thing happened to me, so train the models normally first, then install the libraries and apply the sparsification procedure.
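
If the worry is sparseml's pins clobbering an existing training setup, one way to follow this advice is to keep sparseml in its own environment. A rough sketch using only the standard library (the environment name and extras are just examples):

import subprocess
import venv

# Create an isolated environment so sparseml's pinned torch/ultralytics versions
# do not overwrite the environment used for regular ultralytics training.
venv.create("sparseml-env", with_pip=True)
subprocess.run(
    ["sparseml-env/bin/pip", "install", "sparseml[ultralytics]"],
    check=True,
)
# Afterwards run the CLI from that environment:
#   sparseml-env/bin/sparseml.ultralytics.train --model ... --recipe ... --data ...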

dbogunowicz commented 1 year ago

Thanks for surfacing that, will look into it.

dbogunowicz commented 1 year ago

@OttomanZ or @salwaghanim could you share your weights? So either:

/content/drive/MyDrive/projects/yolov5/runs/train/super_cars_yolov5m/weights/best.pt @OttomanZ or /home/experement/Desktop/projects/detection/yolo8/runs/detect/cars/weights/best.pt @salwaghanim

How do you guys create those files?

salwaghanim commented 1 year ago

@dbogunowicz Hello, here is the detailed procedure.

  1. Install the official YOLO; it's a straightforward process: pip install ultralytics. Note: make sure you have installed CUDA and its drivers before installing ultralytics.
  2. Select the model size appropriate for your needs.
  3. Train your model and repeat the process with different hyperparameters (use batch -1, start from the pretrained COCO weights rather than from scratch, and use the AdamW optimizer; it converges faster than SGD, and both maxed out at the same results when working with 1 to 5 classes). When you reach acceptable results, save that weight file and use it as the base weight for the transfer-learning process. The code below demonstrates the repetitive process; save your settings and results (I use PyTorch Lightning).
from ultralytics import YOLO

model =  YOLO("yolov8s.pt")
# model = YOLO(r'/home/experement/Desktop/projects/detection/yolo8/runs/detect/cars_models_adamw/weights/last.pt')

# model = YOLO(r'/home/experement/Desktop/projects/detection/yolo8/runs/detect/cars_alone_Small_RMSProp3/weights/last.pt')

# data_path = r'/home/experement/Desktop/projects/detection/dataset/Cars_models/data.yaml'
data_path = r'/home/experement/Desktop/projects/detection/datasets/yolo8_version/Toyota_cars_Images.v2i.yolov8/data.yaml'

model.train(data=data_path,optimizer = 'AdamW',batch=-1 ,name="Toyota_cars_Images_v2i",epochs=1000, imgsz=640,save=True,pretrained=True) #current best

# model.train(data=r'/home/experement/Desktop/projects/detection/dataset/yolo8_version/Toyota Images.v3i.yolov8/data.yaml',optimizer = 'RMSProp',batch=-1 ,name="cars_alone_Small_RMSProp",epochs=1000, imgsz=640,save=True,save_period=50,pretrained=True) #train with RMSProp

# model.train(data=r'/home/experement/Desktop/projects/detection/dataset/yolo8_version/Toyota Images.v3i.yolov8/data.yaml',optimizer = 'SGD',batch=-1 ,name="cars_alone_Small_RMSProp",epochs=1000, imgsz=640,save=True,save_period=50,pretrained=True,resume =True) #train with sgd reached the best the same perfomance as ADAMW but it took *3 time

# model.train(data=r'/home/experement/Desktop/projects/detection/dataset/yolo8_version/Toyota Images.v3i.yolov8/data.yaml',optimizer = 'AdamW',batch=-1 ,name="cars_alone_Small",epochs=1000, imgsz=640,save=True,save_period=50,pretrained=True) #current best and least time
#sparseml.ultralytics.train --task=detect --mode=train --model=/home/experement/Desktop/projects/detection/yolo8/preTrained_models/yolov8s.pt --data='/home/experement/Desktop/projects/detection/dataset/yolo8_version/Toyota Images.v3i.yolov8/data.yaml' --epochs=1000 --imgsz=640 --save=True --save_period=50 --optimizer=AdamW --pretrained=True
salwaghanim commented 1 year ago

@dbogunowicz I noticed that you released sparseml 1.5.1. I tested the model on Google Colab and still got the same error. Here is the installation procedure and output:

!pip uninstall tensorflow -y

Found existing installation: tensorflow 2.12.0
Uninstalling tensorflow-2.12.0:
  Successfully uninstalled tensorflow-2.12.0

!pip install sparseml[torch,torchvision,ultralytics]
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting sparseml[torch,torchvision,ultralytics]
  Downloading sparseml-1.5.1-py3-none-any.whl (945 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 945.6/945.6 kB 50.4 MB/s eta 0:00:00
Collecting sparsezoo~=1.5.0 (from sparseml[torch,torchvision,ultralytics])
  Downloading sparsezoo-1.5.1-py3-none-any.whl (131 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 131.7/131.7 kB 12.0 MB/s eta 0:00:00
Collecting setuptools<=59.5.0 (from sparseml[torch,torchvision,ultralytics])
  Downloading setuptools-59.5.0-py3-none-any.whl (952 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 952.4/952.4 kB 74.8 MB/s eta 0:00:00
Collecting jupyter>=1.0.0 (from sparseml[torch,torchvision,ultralytics])
  Downloading jupyter-1.0.0-py2.py3-none-any.whl (2.7 kB)
Requirement already satisfied: ipywidgets>=7.0.0 in /usr/local/lib/python3.10/dist-packages (from sparseml[torch,torchvision,ultralytics]) (7.7.1)
Requirement already satisfied: pyyaml>=5.0.0 in /usr/local/lib/python3.10/dist-packages (from sparseml[torch,torchvision,ultralytics]) (6.0)
Requirement already satisfied: progressbar2>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from sparseml[torch,torchvision,ultralytics]) (4.2.0)
Collecting numpy<=1.21.6,>=1.0.0 (from sparseml[torch,torchvision,ultralytics])
  Downloading numpy-1.21.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.9/15.9 MB 65.5 MB/s eta 0:00:00
Requirement already satisfied: matplotlib>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from sparseml[torch,torchvision,ultralytics]) (3.7.1)
Collecting merge-args>=0.1.0 (from sparseml[torch,torchvision,ultralytics])
  Downloading merge_args-0.1.5-py2.py3-none-any.whl (6.0 kB)
Collecting onnx<=1.12.0,>=1.5.0 (from sparseml[torch,torchvision,ultralytics])
  Downloading onnx-1.12.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.1/13.1 MB 56.6 MB/s eta 0:00:00
Requirement already satisfied: pandas>=0.25.0 in /usr/local/lib/python3.10/dist-packages (from sparseml[torch,torchvision,ultralytics]) (1.5.3)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from sparseml[torch,torchvision,ultralytics]) (23.1)
Requirement already satisfied: psutil>=5.0.0 in /usr/local/lib/python3.10/dist-packages (from sparseml[torch,torchvision,ultralytics]) (5.9.5)
Requirement already satisfied: pydantic>=1.5.0 in /usr/local/lib/python3.10/dist-packages (from sparseml[torch,torchvision,ultralytics]) (1.10.7)
Requirement already satisfied: requests>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from sparseml[torch,torchvision,ultralytics]) (2.27.1)
Requirement already satisfied: scikit-image>=0.15.0 in /usr/local/lib/python3.10/dist-packages (from sparseml[torch,torchvision,ultralytics]) (0.19.3)
Requirement already satisfied: scikit-learn>=0.24.2 in /usr/local/lib/python3.10/dist-packages (from sparseml[torch,torchvision,ultralytics]) (1.2.2)
Requirement already satisfied: scipy>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from sparseml[torch,torchvision,ultralytics]) (1.10.1)
Requirement already satisfied: tqdm>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from sparseml[torch,torchvision,ultralytics]) (4.65.0)
Collecting toposort>=1.0 (from sparseml[torch,torchvision,ultralytics])
  Downloading toposort-1.10-py3-none-any.whl (8.5 kB)
Collecting GPUtil>=1.4.0 (from sparseml[torch,torchvision,ultralytics])
  Downloading GPUtil-1.4.0.tar.gz (5.5 kB)
  Preparing metadata (setup.py) ... done
Collecting protobuf<=3.20.1,>=3.12.2 (from sparseml[torch,torchvision,ultralytics])
  Downloading protobuf-3.20.1-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 73.8 MB/s eta 0:00:00
Requirement already satisfied: click!=8.0.0,>=7.1.2 in /usr/local/lib/python3.10/dist-packages (from sparseml[torch,torchvision,ultralytics]) (8.1.3)
Collecting ultralytics==8.0.30 (from sparseml[torch,torchvision,ultralytics])
  Downloading ultralytics-8.0.30-py3-none-any.whl (272 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 272.9/272.9 kB 34.0 MB/s eta 0:00:00
Collecting torch<1.14,>=1.7.0 (from sparseml[torch,torchvision,ultralytics])
  Downloading torch-1.13.1-cp310-cp310-manylinux1_x86_64.whl (887.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 887.5/887.5 MB 1.0 MB/s eta 0:00:00
Collecting gputils (from sparseml[torch,torchvision,ultralytics])
  Downloading gputils-1.0.6-py3-none-any.whl (3.8 kB)
Collecting torchvision<0.15,>=0.3.0 (from sparseml[torch,torchvision,ultralytics])
  Downloading torchvision-0.14.1-cp310-cp310-manylinux1_x86_64.whl (24.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24.2/24.2 MB 69.1 MB/s eta 0:00:00
Collecting opencv-python<=4.6.0.66 (from sparseml[torch,torchvision,ultralytics])
  Downloading opencv_python-4.6.0.66-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (60.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.9/60.9 MB 9.3 MB/s eta 0:00:00
Requirement already satisfied: Pillow>=7.1.2 in /usr/local/lib/python3.10/dist-packages (from ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (8.4.0)
Requirement already satisfied: tensorboard>=2.4.1 in /usr/local/lib/python3.10/dist-packages (from ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (2.12.2)
Requirement already satisfied: seaborn>=0.11.0 in /usr/local/lib/python3.10/dist-packages (from ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (0.12.2)
Requirement already satisfied: ipython in /usr/local/lib/python3.10/dist-packages (from ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (7.34.0)
Collecting thop>=0.1.1 (from ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics])
  Downloading thop-0.1.1.post2209072238-py3-none-any.whl (15 kB)
Collecting sentry-sdk (from ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics])
  Downloading sentry_sdk-1.25.1-py2.py3-none-any.whl (206 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 206.7/206.7 kB 16.8 MB/s eta 0:00:00
Requirement already satisfied: ipykernel>=4.5.1 in /usr/local/lib/python3.10/dist-packages (from ipywidgets>=7.0.0->sparseml[torch,torchvision,ultralytics]) (5.5.6)
Requirement already satisfied: ipython-genutils~=0.2.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets>=7.0.0->sparseml[torch,torchvision,ultralytics]) (0.2.0)
Requirement already satisfied: traitlets>=4.3.1 in /usr/local/lib/python3.10/dist-packages (from ipywidgets>=7.0.0->sparseml[torch,torchvision,ultralytics]) (5.7.1)
Requirement already satisfied: widgetsnbextension~=3.6.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets>=7.0.0->sparseml[torch,torchvision,ultralytics]) (3.6.4)
Requirement already satisfied: jupyterlab-widgets>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets>=7.0.0->sparseml[torch,torchvision,ultralytics]) (3.0.7)
Requirement already satisfied: notebook in /usr/local/lib/python3.10/dist-packages (from jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (6.4.8)
Collecting qtconsole (from jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics])
  Downloading qtconsole-5.4.3-py3-none-any.whl (121 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.9/121.9 kB 15.7 MB/s eta 0:00:00
Requirement already satisfied: jupyter-console in /usr/local/lib/python3.10/dist-packages (from jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (6.1.0)
Requirement already satisfied: nbconvert in /usr/local/lib/python3.10/dist-packages (from jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (6.5.4)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.0.0->sparseml[torch,torchvision,ultralytics]) (1.0.7)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.0.0->sparseml[torch,torchvision,ultralytics]) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.0.0->sparseml[torch,torchvision,ultralytics]) (4.39.3)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.0.0->sparseml[torch,torchvision,ultralytics]) (1.4.4)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.0.0->sparseml[torch,torchvision,ultralytics]) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.0.0->sparseml[torch,torchvision,ultralytics]) (2.8.2)
Requirement already satisfied: typing-extensions>=3.6.2.1 in /usr/local/lib/python3.10/dist-packages (from onnx<=1.12.0,>=1.5.0->sparseml[torch,torchvision,ultralytics]) (4.5.0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas>=0.25.0->sparseml[torch,torchvision,ultralytics]) (2022.7.1)
Requirement already satisfied: python-utils>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from progressbar2>=3.0.0->sparseml[torch,torchvision,ultralytics]) (3.5.2)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.0.0->sparseml[torch,torchvision,ultralytics]) (1.26.15)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.0.0->sparseml[torch,torchvision,ultralytics]) (2022.12.7)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests>=2.0.0->sparseml[torch,torchvision,ultralytics]) (2.0.12)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.0.0->sparseml[torch,torchvision,ultralytics]) (3.4)
Requirement already satisfied: networkx>=2.2 in /usr/local/lib/python3.10/dist-packages (from scikit-image>=0.15.0->sparseml[torch,torchvision,ultralytics]) (3.1)
Requirement already satisfied: imageio>=2.4.1 in /usr/local/lib/python3.10/dist-packages (from scikit-image>=0.15.0->sparseml[torch,torchvision,ultralytics]) (2.25.1)
Requirement already satisfied: tifffile>=2019.7.26 in /usr/local/lib/python3.10/dist-packages (from scikit-image>=0.15.0->sparseml[torch,torchvision,ultralytics]) (2023.4.12)
Requirement already satisfied: PyWavelets>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from scikit-image>=0.15.0->sparseml[torch,torchvision,ultralytics]) (1.4.1)
Requirement already satisfied: joblib>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from scikit-learn>=0.24.2->sparseml[torch,torchvision,ultralytics]) (1.2.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn>=0.24.2->sparseml[torch,torchvision,ultralytics]) (3.1.0)
Collecting py-machineid>=0.3.0 (from sparsezoo~=1.5.0->sparseml[torch,torchvision,ultralytics])
  Downloading py_machineid-0.3.0-py3-none-any.whl (4.0 kB)
Collecting geocoder>=1.38.0 (from sparsezoo~=1.5.0->sparseml[torch,torchvision,ultralytics])
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.6/98.6 kB 14.3 MB/s eta 0:00:00
Collecting nvidia-cuda-runtime-cu11==11.7.99 (from torch<1.14,>=1.7.0->sparseml[torch,torchvision,ultralytics])
  Downloading nvidia_cuda_runtime_cu11-11.7.99-py3-none-manylinux1_x86_64.whl (849 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 849.3/849.3 kB 72.6 MB/s eta 0:00:00
Collecting nvidia-cudnn-cu11==8.5.0.96 (from torch<1.14,>=1.7.0->sparseml[torch,torchvision,ultralytics])
  Downloading nvidia_cudnn_cu11-8.5.0.96-2-py3-none-manylinux1_x86_64.whl (557.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 557.1/557.1 MB 3.1 MB/s eta 0:00:00
Collecting nvidia-cublas-cu11==11.10.3.66 (from torch<1.14,>=1.7.0->sparseml[torch,torchvision,ultralytics])
  Downloading nvidia_cublas_cu11-11.10.3.66-py3-none-manylinux1_x86_64.whl (317.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 317.1/317.1 MB 4.9 MB/s eta 0:00:00
Collecting nvidia-cuda-nvrtc-cu11==11.7.99 (from torch<1.14,>=1.7.0->sparseml[torch,torchvision,ultralytics])
  Downloading nvidia_cuda_nvrtc_cu11-11.7.99-2-py3-none-manylinux1_x86_64.whl (21.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.0/21.0 MB 83.8 MB/s eta 0:00:00
Requirement already satisfied: wheel in /usr/local/lib/python3.10/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch<1.14,>=1.7.0->sparseml[torch,torchvision,ultralytics]) (0.40.0)
Requirement already satisfied: future in /usr/local/lib/python3.10/dist-packages (from geocoder>=1.38.0->sparsezoo~=1.5.0->sparseml[torch,torchvision,ultralytics]) (0.18.3)
Collecting ratelim (from geocoder>=1.38.0->sparsezoo~=1.5.0->sparseml[torch,torchvision,ultralytics])
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Requirement already satisfied: six in /usr/local/lib/python3.10/dist-packages (from geocoder>=1.38.0->sparsezoo~=1.5.0->sparseml[torch,torchvision,ultralytics]) (1.16.0)
Requirement already satisfied: jupyter-client in /usr/local/lib/python3.10/dist-packages (from ipykernel>=4.5.1->ipywidgets>=7.0.0->sparseml[torch,torchvision,ultralytics]) (6.1.12)
Requirement already satisfied: tornado>=4.2 in /usr/local/lib/python3.10/dist-packages (from ipykernel>=4.5.1->ipywidgets>=7.0.0->sparseml[torch,torchvision,ultralytics]) (6.3.1)
Collecting jedi>=0.16 (from ipython->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics])
  Downloading jedi-0.18.2-py2.py3-none-any.whl (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 58.3 MB/s eta 0:00:00
Requirement already satisfied: decorator in /usr/local/lib/python3.10/dist-packages (from ipython->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (4.4.2)
Requirement already satisfied: pickleshare in /usr/local/lib/python3.10/dist-packages (from ipython->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (0.7.5)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from ipython->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (3.0.38)
Requirement already satisfied: pygments in /usr/local/lib/python3.10/dist-packages (from ipython->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (2.14.0)
Requirement already satisfied: backcall in /usr/local/lib/python3.10/dist-packages (from ipython->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (0.2.0)
Requirement already satisfied: matplotlib-inline in /usr/local/lib/python3.10/dist-packages (from ipython->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (0.1.6)
Requirement already satisfied: pexpect>4.3 in /usr/local/lib/python3.10/dist-packages (from ipython->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (4.8.0)
Collecting winregistry (from py-machineid>=0.3.0->sparsezoo~=1.5.0->sparseml[torch,torchvision,ultralytics])
  Downloading winregistry-1.1.1-py3-none-any.whl (5.8 kB)
Requirement already satisfied: absl-py>=0.4 in /usr/local/lib/python3.10/dist-packages (from tensorboard>=2.4.1->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (1.4.0)
Requirement already satisfied: grpcio>=1.48.2 in /usr/local/lib/python3.10/dist-packages (from tensorboard>=2.4.1->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (1.54.0)
Requirement already satisfied: google-auth<3,>=1.6.3 in /usr/local/lib/python3.10/dist-packages (from tensorboard>=2.4.1->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (2.17.3)
Requirement already satisfied: google-auth-oauthlib<1.1,>=0.5 in /usr/local/lib/python3.10/dist-packages (from tensorboard>=2.4.1->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (1.0.0)
Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.10/dist-packages (from tensorboard>=2.4.1->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (3.4.3)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from tensorboard>=2.4.1->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (0.7.0)
Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /usr/local/lib/python3.10/dist-packages (from tensorboard>=2.4.1->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (1.8.1)
Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from tensorboard>=2.4.1->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (2.3.0)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from notebook->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (3.1.2)
Requirement already satisfied: pyzmq>=17 in /usr/local/lib/python3.10/dist-packages (from notebook->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (23.2.1)
Requirement already satisfied: argon2-cffi in /usr/local/lib/python3.10/dist-packages (from notebook->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (21.3.0)
Requirement already satisfied: jupyter-core>=4.6.1 in /usr/local/lib/python3.10/dist-packages (from notebook->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (5.3.0)
Requirement already satisfied: nbformat in /usr/local/lib/python3.10/dist-packages (from notebook->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (5.8.0)
Requirement already satisfied: nest-asyncio>=1.5 in /usr/local/lib/python3.10/dist-packages (from notebook->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (1.5.6)
Requirement already satisfied: Send2Trash>=1.8.0 in /usr/local/lib/python3.10/dist-packages (from notebook->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (1.8.0)
Requirement already satisfied: terminado>=0.8.3 in /usr/local/lib/python3.10/dist-packages (from notebook->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (0.17.1)
Requirement already satisfied: prometheus-client in /usr/local/lib/python3.10/dist-packages (from notebook->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (0.16.0)
Requirement already satisfied: lxml in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (4.9.2)
Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (4.11.2)
Requirement already satisfied: bleach in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (6.0.0)
Requirement already satisfied: defusedxml in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (0.7.1)
Requirement already satisfied: entrypoints>=0.2.2 in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (0.4)
Requirement already satisfied: jupyterlab-pygments in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (0.2.2)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (2.1.2)
Requirement already satisfied: mistune<2,>=0.8.1 in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (0.8.4)
Requirement already satisfied: nbclient>=0.5.0 in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (0.7.4)
Requirement already satisfied: pandocfilters>=1.4.1 in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (1.5.0)
Requirement already satisfied: tinycss2 in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (1.2.1)
Collecting qtpy>=2.0.1 (from qtconsole->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics])
  Downloading QtPy-2.3.1-py3-none-any.whl (84 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 84.9/84.9 kB 11.7 MB/s eta 0:00:00
Requirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard>=2.4.1->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (5.3.0)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard>=2.4.1->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (0.3.0)
Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard>=2.4.1->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (4.9)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from google-auth-oauthlib<1.1,>=0.5->tensorboard>=2.4.1->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (1.3.1)
Requirement already satisfied: parso<0.9.0,>=0.8.0 in /usr/local/lib/python3.10/dist-packages (from jedi>=0.16->ipython->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (0.8.3)
Requirement already satisfied: platformdirs>=2.5 in /usr/local/lib/python3.10/dist-packages (from jupyter-core>=4.6.1->notebook->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (3.3.0)
Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.10/dist-packages (from nbformat->notebook->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (2.16.3)
Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.10/dist-packages (from nbformat->notebook->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (4.3.3)
Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.10/dist-packages (from pexpect>4.3->ipython->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (0.7.0)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.10/dist-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (0.2.6)
Requirement already satisfied: argon2-cffi-bindings in /usr/local/lib/python3.10/dist-packages (from argon2-cffi->notebook->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (21.2.0)
Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.10/dist-packages (from beautifulsoup4->nbconvert->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (2.4.1)
Requirement already satisfied: webencodings in /usr/local/lib/python3.10/dist-packages (from bleach->nbconvert->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (0.5.1)
Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=2.6->nbformat->notebook->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (23.1.0)
Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=2.6->nbformat->notebook->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (0.19.3)
Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /usr/local/lib/python3.10/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard>=2.4.1->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (0.5.0)
Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<1.1,>=0.5->tensorboard>=2.4.1->ultralytics==8.0.30->sparseml[torch,torchvision,ultralytics]) (3.2.2)
Requirement already satisfied: cffi>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from argon2-cffi-bindings->argon2-cffi->notebook->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (1.15.1)
Requirement already satisfied: pycparser in /usr/local/lib/python3.10/dist-packages (from cffi>=1.0.1->argon2-cffi-bindings->argon2-cffi->notebook->jupyter>=1.0.0->sparseml[torch,torchvision,ultralytics]) (2.21)
Building wheels for collected packages: GPUtil
  Building wheel for GPUtil (setup.py) ... done
  Created wheel for GPUtil: filename=GPUtil-1.4.0-py3-none-any.whl size=7393 sha256=df8ad9a3a744ef76876c71a98fd882348a6ec649aa9a01cd443df448016a2229
  Stored in directory: /root/.cache/pip/wheels/a9/8a/bd/81082387151853ab8b6b3ef33426e98f5cbfebc3c397a9d4d0
Successfully built GPUtil
Installing collected packages: toposort, merge-args, GPUtil, winregistry, setuptools, sentry-sdk, ratelim, qtpy, protobuf, nvidia-cuda-nvrtc-cu11, numpy, jedi, py-machineid, opencv-python, onnx, nvidia-cuda-runtime-cu11, nvidia-cublas-cu11, geocoder, sparsezoo, nvidia-cudnn-cu11, gputils, torch, qtconsole, torchvision, thop, ultralytics, jupyter, sparseml
  Attempting uninstall: setuptools
    Found existing installation: setuptools 67.7.2
    Uninstalling setuptools-67.7.2:
      Successfully uninstalled setuptools-67.7.2
  Attempting uninstall: protobuf
    Found existing installation: protobuf 3.20.3
    Uninstalling protobuf-3.20.3:
      Successfully uninstalled protobuf-3.20.3
  Attempting uninstall: numpy
    Found existing installation: numpy 1.22.4
    Uninstalling numpy-1.22.4:
      Successfully uninstalled numpy-1.22.4
  Attempting uninstall: opencv-python
    Found existing installation: opencv-python 4.7.0.72
    Uninstalling opencv-python-4.7.0.72:
      Successfully uninstalled opencv-python-4.7.0.72
  Attempting uninstall: torch
    Found existing installation: torch 2.0.1+cu118
    Uninstalling torch-2.0.1+cu118:
      Successfully uninstalled torch-2.0.1+cu118
  Attempting uninstall: torchvision
    Found existing installation: torchvision 0.15.2+cu118
    Uninstalling torchvision-0.15.2+cu118:
      Successfully uninstalled torchvision-0.15.2+cu118
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
dopamine-rl 4.0.6 requires tensorflow>=2.2.0, which is not installed.
arviz 0.15.1 requires setuptools>=60.0.0, but you have setuptools 59.5.0 which is incompatible.
cvxpy 1.3.1 requires setuptools>65.5.1, but you have setuptools 59.5.0 which is incompatible.
google-api-core 2.11.0 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.20.1 which is incompatible.
google-cloud-bigquery 3.9.0 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.20.1 which is incompatible.
google-cloud-bigquery-storage 2.19.1 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.20.1 which is incompatible.
google-cloud-datastore 2.15.1 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.20.1 which is incompatible.
google-cloud-firestore 2.11.0 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.20.1 which is incompatible.
google-cloud-language 2.9.1 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.20.1 which is incompatible.
google-cloud-translate 3.11.1 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.20.1 which is incompatible.
googleapis-common-protos 1.59.0 requires protobuf!=3.20.0,!=3.20.1,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.20.1 which is incompatible.
tensorflow-metadata 1.13.1 requires protobuf<5,>=3.20.3, but you have protobuf 3.20.1 which is incompatible.
torchaudio 2.0.2+cu118 requires torch==2.0.1, but you have torch 1.13.1 which is incompatible.
torchdata 0.6.1 requires torch==2.0.1, but you have torch 1.13.1 which is incompatible.
torchtext 0.15.2 requires torch==2.0.1, but you have torch 1.13.1 which is incompatible.
Successfully installed GPUtil-1.4.0 geocoder-1.38.1 gputils-1.0.6 jedi-0.18.2 jupyter-1.0.0 merge-args-0.1.5 numpy-1.21.6 nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 onnx-1.12.0 opencv-python-4.6.0.66 protobuf-3.20.1 py-machineid-0.3.0 qtconsole-5.4.3 qtpy-2.3.1 ratelim-0.1.6 sentry-sdk-1.25.1 setuptools-59.5.0 sparseml-1.5.1 sparsezoo-1.5.1 thop-0.1.1.post2209072238 toposort-1.10 torch-1.13.1 torchvision-0.14.1 ultralytics-8.0.30 winregistry-1.1.1
WARNING: The following packages were previously imported in this runtime:
  [_distutils_hack,google,numpy,pkg_resources,setuptools]
You must restart the runtime in order to use newly installed versions.

%cd /content/drive/MyDrive/projects/yolov8

!sparseml.ultralytics.train \
  --model "/content/drive/MyDrive/projects/yolov8/runs/detect/cars_alone_Small/weights/best.pt" \
  --recipe zoo:cv/detection/yolov8-s/pytorch/ultralytics/coco/pruned50_quant-none\
  --data "/content/drive/MyDrive/projects/datasets/Toyota_cars_Images.v3i.yolov8/data.yaml" \
  --batch -1 \
  --patience 0 \
  --optimizer AdamW

output message

/content/drive/MyDrive/projects/yolov8
Ultralytics YOLOv8.0.30 🚀 Python-3.10.12 torch-1.13.1+cu117 CUDA:0 (Tesla T4, 15102MiB)
yolo/engine/trainer: recipe=zoo:cv/detection/yolov8-s/pytorch/ultralytics/coco/pruned50_quant-none, recipe_args=None, datasets_dir=None, task=detect, mode=train, model=/content/drive/MyDrive/projects/yolov8/runs/detect/cars_alone_Small/weights/best.pt, data=/content/drive/MyDrive/projects/datasets/Toyota_Images.v3i.yolov8/data.yaml, epochs=100, patience=0, batch=-1, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=None, exist_ok=False, pretrained=False, optimizer=AdamW, verbose=False, seed=0, deterministic=True, single_cls=False, image_weights=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, min_memory=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, hide_labels=False, hide_conf=False, vid_stride=1, line_thickness=3, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, fl_gamma=0.0, label_smoothing=0.0, nbs=64.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, v5loader=False, save_dir=runs/detect/train
Downloading https://ultralytics.com/assets/Arial.ttf to /root/.config/Ultralytics/Arial.ttf...
100% 755k/755k [00:00<00:00, 92.6MB/s]

                   from  n    params  module                                       arguments                     
  0                  -1  1       928  ultralytics.nn.modules.Conv                  [3, 32, 3, 2]                 
  1                  -1  1     18560  ultralytics.nn.modules.Conv                  [32, 64, 3, 2]                
  2                  -1  1     29056  ultralytics.nn.modules.C2f                   [64, 64, 1, True]             
  3                  -1  1     73984  ultralytics.nn.modules.Conv                  [64, 128, 3, 2]               
  4                  -1  2    197632  ultralytics.nn.modules.C2f                   [128, 128, 2, True]           
  5                  -1  1    295424  ultralytics.nn.modules.Conv                  [128, 256, 3, 2]              
  6                  -1  2    788480  ultralytics.nn.modules.C2f                   [256, 256, 2, True]           
  7                  -1  1   1180672  ultralytics.nn.modules.Conv                  [256, 512, 3, 2]              
  8                  -1  1   1838080  ultralytics.nn.modules.C2f                   [512, 512, 1, True]           
  9                  -1  1    656896  ultralytics.nn.modules.SPPF                  [512, 512, 5]                 
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 11             [-1, 6]  1         0  ultralytics.nn.modules.Concat                [1]                           
 12                  -1  1    591360  ultralytics.nn.modules.C2f                   [768, 256, 1]                 
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 14             [-1, 4]  1         0  ultralytics.nn.modules.Concat                [1]                           
 15                  -1  1    148224  ultralytics.nn.modules.C2f                   [384, 128, 1]                 
 16                  -1  1    147712  ultralytics.nn.modules.Conv                  [128, 128, 3, 2]              
 17            [-1, 12]  1         0  ultralytics.nn.modules.Concat                [1]                           
 18                  -1  1    493056  ultralytics.nn.modules.C2f                   [384, 256, 1]                 
 19                  -1  1    590336  ultralytics.nn.modules.Conv                  [256, 256, 3, 2]              
 20             [-1, 9]  1         0  ultralytics.nn.modules.Concat                [1]                           
 21                  -1  1   1969152  ultralytics.nn.modules.C2f                   [768, 512, 1]                 
 22        [15, 18, 21]  1   2116435  ultralytics.nn.modules.Detect                [1, [128, 256, 512]]          
Model summary: 225 layers, 11135987 parameters, 11135971 gradients, 28.6 GFLOPs

Transferred 355/355 items from pretrained weights
Received torch.nn.Module, not loading from checkpoint
downloading...: 100% 12.1k/12.1k [00:00<00:00, 6.42MB/s]
AutoBatch: Computing optimal batch size for imgsz=640
AutoBatch: CUDA:0 (Tesla T4) 14.75G total, 0.09G reserved, 0.08G allocated, 14.58G free
      Params      GFLOPs  GPU_mem (GB)  forward (ms) backward (ms)                   input                  output
    11135987       28.65         0.325         134.8         24.01        (1, 3, 640, 640)                    list
    11135987       57.29         0.547         24.12         19.28        (2, 3, 640, 640)                    list
    11135987       114.6         1.007         24.19         23.46        (4, 3, 640, 640)                    list
    11135987       229.2         1.948         31.62         39.09        (8, 3, 640, 640)                    list
    11135987       458.4         3.590         60.86         79.45       (16, 3, 640, 640)                    list
AutoBatch: Using batch-size 46 for CUDA:0 10.34G/14.75G (70%) ✅
optimizer: AdamW(lr=0.01) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.000359375), 63 bias
train: Scanning /content/drive/MyDrive/projects/datasets/Toyota_cars_Images.v3i.yolov8/train/labels.cache... 4997 images, 3473 backgrounds, 0 corrupt: 100% 4997/4997 [00:00<?, ?it/s]
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
val: Scanning /content/drive/MyDrive/projects/datasets/Toyota_cars_Images.v3i.yolov8/valid/labels.cache... 1441 images, 968 backgrounds, 0 corrupt: 100% 1441/1441 [00:00<?, ?it/s]
/usr/local/lib/python3.10/dist-packages/sparseml/yolov8/trainers.py:294: UserWarning: Unable to import wandb for logging
  warnings.warn("Unable to import wandb for logging")
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs/detect/train
Starting training for 56 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       1/56      11.1G      1.177      0.913      1.385         37        640: 100% 109/109 [06:35<00:00,  3.63s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95):   0% 0/16 [01:24<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/sparseml.ultralytics.train", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sparseml/yolov8/train.py", line 225, in main
    model.train(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sparseml/yolov8/trainers.py", line 796, in train
    self.trainer.train()
  File "/usr/local/lib/python3.10/dist-packages/sparseml/yolov8/trainers.py", line 164, in train
    self._do_train(int(os.getenv("RANK", -1)), world_size)
  File "/usr/local/lib/python3.10/dist-packages/ultralytics/yolo/engine/trainer.py", line 344, in _do_train
    self.metrics, self.fitness = self.validate()
  File "/usr/local/lib/python3.10/dist-packages/ultralytics/yolo/engine/trainer.py", line 439, in validate
    metrics = self.validator(self)
  File "/usr/local/lib/python3.10/dist-packages/sparseml/yolov8/validators.py", line 132, in __call__
    preds = model(batch["img"])
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ultralytics/nn/tasks.py", line 198, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
  File "/usr/local/lib/python3.10/dist-packages/ultralytics/nn/tasks.py", line 57, in _forward_once
    x = m(x)  # run
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ultralytics/nn/modules.py", line 346, in forward
    return torch.cat(x, self.d)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 500.00 MiB (GPU 0; 14.75 GiB total capacity; 13.19 GiB already allocated; 110.81 MiB free; 13.53 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
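
Not a fix for the packaging problem, but for the OOM above: AutoBatch picked batch size 46 and validation then ran out of memory, so a fixed smaller --batch (instead of --batch -1) is the simplest mitigation; the allocator hint from the error message can also be set before launching training. A small sketch (the 128 MiB split size is just an example):

import os

# Allocator hint from the error message; in Colab this is inherited by later ! commands.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # value is an assumption

# Then rerun with a fixed, smaller batch instead of AutoBatch, e.g.:
#   !sparseml.ultralytics.train --model ... --recipe ... --data ... --batch 16 --patience 0
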
dbogunowicz commented 1 year ago

We have decided with the team that it is high time we upgraded our ultralytics dependency. The overarching goal is to support more YOLO tasks, but as a by-product we should also solve incompatibility problems such as the one discussed in this issue.

So I'd ask you to be patient; we already have the appropriate work planned in our immediate roadmap.

I will not close this issue for now; let's do that once all the aforementioned problems go away.

kopyl commented 1 year ago

@dbogunowicz when are you going to fix it?

dsikka commented 1 year ago

@kopyl We're working on integrating the ultralytics update; its release is pending passing all of our internal testing. Thanks.

dbogunowicz commented 1 year ago

@kopyl I encourage you to install the current nightly version of deepsparse. We have updated our support for ultralytics. This should make your error go away!

kopyl commented 1 year ago

@dsikka @dbogunowicz thanks

salwaghanim commented 1 year ago

@dbogunowicz Hello, thanks for your hard work. The earlier reported issue was fixed; however, another error occurred. This error appeared earlier with other models too, and I never managed to solve it. I tried to prune a custom-trained model, but it failed during (I think) the quantization step. Here is the procedure and the results:

I'm using Google Colab for this process.

!pip install sparseml-nightly[dev,torch,torchvision,onnxruntime,deepsparse,transformers,ultralytics]
!pip uninstall tensorflow -y

!pip list | grep sparseml output: sparseml-nightly 1.6.0.20230720

%cd /content/drive/MyDrive/projects/yolov8

#https://sparsezoo.neuralmagic.com/models/yolov8-s-coco-pruned50_quantized?hardware=deepsparse-c6i.12xlarge&comparison=yolov8-s-coco-base

!sparseml.ultralytics.train \
  --model "/content/drive/MyDrive/projects/yolov8/runs/detect/cars_alone_Small/weights/best.pt" \
  --recipe zoo:cv/detection/yolov8-s/pytorch/ultralytics/coco/pruned50_quant-none\
  --data "/content/drive/MyDrive/projects/datasets/Aerial_Images.v3i.yolov8/data.yaml" \
  --batch -1 \
  --patience 0 \
  --optimizer AdamW

output

/content/drive/MyDrive/projects/yolov8
Ultralytics YOLOv8.0.124 🚀 Python-3.10.6 torch-2.0.0+cu117 CUDA:0 (Tesla T4, 15102MiB)
yolo/engine/trainer: recipe=zoo:cv/detection/yolov8-s/pytorch/ultralytics/coco/pruned50_quant-none, recipe_args=None, datasets_dir=None, task=detect, mode=train, model=/content/drive/MyDrive/projects/yolov8/runs/detect/cars_alone_Small/weights/best.pt, data=/content/drive/MyDrive/projects/datasets/Aerial_Images.v3i.yolov8/data.yaml, epochs=100, patience=0, batch=-1, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=None, exist_ok=False, pretrained=False, optimizer=AdamW, verbose=False, seed=0, deterministic=True, single_cls=False, image_weights=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, min_memory=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, vid_stride=1, line_width=3, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=17, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, fl_gamma=0.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, v5loader=False, save_dir=runs/detect/train3
Downloading https://ultralytics.com/assets/Arial.ttf to /root/.config/Ultralytics/Arial.ttf...
100% 755k/755k [00:00<00:00, 135MB/s]

                   from  n    params  module                                       arguments                     
  0                  -1  1       928  ultralytics.nn.modules.conv.Conv             [3, 32, 3, 2]                 
  1                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]                
  2                  -1  1     29056  ultralytics.nn.modules.block.C2f             [64, 64, 1, True]             
  3                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]               
  4                  -1  2    197632  ultralytics.nn.modules.block.C2f             [128, 128, 2, True]           
  5                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]              
  6                  -1  2    788480  ultralytics.nn.modules.block.C2f             [256, 256, 2, True]           
  7                  -1  1   1180672  ultralytics.nn.modules.conv.Conv             [256, 512, 3, 2]              
  8                  -1  1   1838080  ultralytics.nn.modules.block.C2f             [512, 512, 1, True]           
  9                  -1  1    656896  ultralytics.nn.modules.block.SPPF            [512, 512, 5]                 
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 12                  -1  1    591360  ultralytics.nn.modules.block.C2f             [768, 256, 1]                 
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 15                  -1  1    148224  ultralytics.nn.modules.block.C2f             [384, 128, 1]                 
 16                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]              
 17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 18                  -1  1    493056  ultralytics.nn.modules.block.C2f             [384, 256, 1]                 
 19                  -1  1    590336  ultralytics.nn.modules.conv.Conv             [256, 256, 3, 2]              
 20             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 21                  -1  1   1969152  ultralytics.nn.modules.block.C2f             [768, 512, 1]                 
 22        [15, 18, 21]  1   2116435  ultralytics.nn.modules.head.Detect           [1, [128, 256, 512]]          
Model summary: 225 layers, 11135987 parameters, 11135971 gradients

Transferred 355/355 items from pretrained weights
TensorBoard: Start with 'tensorboard --logdir runs/detect/train3', view at http://localhost:6006/
Received torch.nn.Module, not loading from checkpoint
downloading...: 100% 12.1k/12.1k [00:00<00:00, 4.46MB/s]
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
Downloading https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt to yolov8n.pt...
100% 6.23M/6.23M [00:00<00:00, 91.8MB/s]
AMP: checks passed ✅
AutoBatch: Computing optimal batch size for imgsz=640
AutoBatch: CUDA:0 (Tesla T4) 14.75G total, 0.12G reserved, 0.12G allocated, 14.50G free
      Params      GFLOPs  GPU_mem (GB)  forward (ms) backward (ms)                   input                  output
    11135987           0         0.413           171         278.7        (1, 3, 640, 640)                    list
    11135987           0         0.654         59.27         72.07        (2, 3, 640, 640)                    list
    11135987           0         1.210         63.67         81.21        (4, 3, 640, 640)                    list
    11135987           0         2.271          71.1         96.18        (8, 3, 640, 640)                    list
    11135987           0         3.938         93.25         143.4       (16, 3, 640, 640)                    list
AutoBatch: Using batch-size 40 for CUDA:0 9.92G/14.75G (67%) ✅
train: Scanning /content/drive/MyDrive/projects/datasets/Aerial_Images.v3i.yolov8/train/labels... 4997 images, 3473 backgrounds, 0 corrupt: 100% 4997/4997 [22:44<00:00,  3.66it/s]
train: New cache created: /content/drive/MyDrive/projects/datasets/Aerial_Images.v3i.yolov8/train/labels.cache
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
val: Scanning /content/drive/MyDrive/projects/datasets/Aerial_Images.v3i.yolov8/valid/labels... 1441 images, 968 backgrounds, 0 corrupt: 100% 1441/1441 [06:56<00:00,  3.46it/s]
val: New cache created: /content/drive/MyDrive/projects/datasets/Aerial_Images.v3i.yolov8/valid/labels.cache
Plotting labels to runs/detect/train3/labels.jpg... 
optimizer: AdamW(lr=0.01, momentum=0.937) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.000625), 63 bias(decay=0.0)
val: Scanning /content/drive/MyDrive/projects/datasets/Aerial_Images.v3i.yolov8/valid/labels.cache... 1441 images, 968 backgrounds, 0 corrupt: 100% 1441/1441 [00:00<?, ?it/s]
/usr/local/lib/python3.10/dist-packages/sparseml/yolov8/trainers.py:320: UserWarning: Unable to import wandb for logging
  warnings.warn("Unable to import wandb for logging")
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs/detect/train3
Starting training for 56 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       1/56      9.03G      1.198     0.9328      1.405         45        640: 100% 125/125 [02:42<00:00,  1.30s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:38<00:00,  3.73it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       2/56      9.02G      1.196     0.9327      1.401         36        640: 100% 125/125 [02:52<00:00,  1.38s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:40<00:00,  3.55it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       3/56      9.02G      1.175     0.8872      1.381         49        640: 100% 125/125 [02:50<00:00,  1.36s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:40<00:00,  3.54it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       4/56      9.02G      1.158     0.8677      1.369         20        640: 100% 125/125 [02:42<00:00,  1.30s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:46<00:00,  3.13it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       5/56      9.03G      1.141     0.8527      1.355         32        640: 100% 125/125 [02:43<00:00,  1.31s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:38<00:00,  3.79it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       6/56      9.07G      1.137     0.8482      1.349         30        640: 100% 125/125 [02:45<00:00,  1.33s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:33<00:00,  4.30it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       7/56       9.1G      1.117     0.8371      1.351         32        640: 100% 125/125 [02:38<00:00,  1.27s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:37<00:00,  3.87it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       8/56      9.15G      1.107     0.8109      1.325         40        640: 100% 125/125 [02:40<00:00,  1.28s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:36<00:00,  4.00it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       9/56      9.17G      1.128     0.8405      1.356         45        640: 100% 125/125 [02:40<00:00,  1.29s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:42<00:00,  3.42it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      10/56      9.07G      1.128     0.8338       1.34         38        640: 100% 125/125 [02:33<00:00,  1.23s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:38<00:00,  3.76it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      11/56      9.19G      1.115     0.8095       1.34         54        640: 100% 125/125 [02:40<00:00,  1.29s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:35<00:00,  4.09it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      12/56      9.14G      1.101      0.809      1.329         41        640: 100% 125/125 [02:41<00:00,  1.29s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:33<00:00,  4.32it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      13/56      9.18G      1.104      0.804      1.323         43        640: 100% 125/125 [02:39<00:00,  1.28s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:43<00:00,  3.32it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      14/56       9.1G      1.097     0.7908      1.333         53        640: 100% 125/125 [02:42<00:00,  1.30s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:37<00:00,  3.86it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      15/56      9.06G      1.098     0.7934      1.328         38        640: 100% 125/125 [02:41<00:00,  1.29s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:35<00:00,  4.10it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      16/56       9.1G      1.067     0.7984      1.311         49        640: 100% 125/125 [02:38<00:00,  1.27s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:32<00:00,  4.43it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      17/56      9.14G      1.095     0.7907       1.33         40        640: 100% 125/125 [02:39<00:00,  1.28s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:35<00:00,  4.13it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      18/56      9.15G      1.099     0.7821      1.326         33        640: 100% 125/125 [02:34<00:00,  1.24s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:38<00:00,  3.80it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      19/56      9.16G      1.058     0.7613      1.303         51        640: 100% 125/125 [02:35<00:00,  1.24s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:32<00:00,  4.53it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      20/56       9.1G      1.058     0.7681      1.303         35        640: 100% 125/125 [02:40<00:00,  1.28s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:34<00:00,  4.15it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      21/56      9.11G      1.042     0.7607      1.301         41        640: 100% 125/125 [02:36<00:00,  1.25s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:37<00:00,  3.83it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      22/56      9.11G       1.05     0.7497        1.3         38        640: 100% 125/125 [02:31<00:00,  1.21s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:39<00:00,  3.69it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      23/56      9.23G      1.049     0.7457      1.303         42        640: 100% 125/125 [02:32<00:00,  1.22s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:35<00:00,  4.10it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      24/56      9.09G       1.04     0.7351      1.286         29        640: 100% 125/125 [02:41<00:00,  1.29s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:34<00:00,  4.24it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      25/56      9.15G      1.031     0.7324      1.291         38        640: 100% 125/125 [02:37<00:00,  1.26s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:36<00:00,  4.02it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      26/56      9.15G      1.035     0.7314      1.279         42        640: 100% 125/125 [02:34<00:00,  1.24s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:36<00:00,  3.92it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      27/56       9.2G       1.01     0.7113      1.274         38        640: 100% 125/125 [02:29<00:00,  1.20s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:38<00:00,  3.81it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      28/56      9.23G       1.03     0.7264      1.288         46        640: 100% 125/125 [02:31<00:00,  1.21s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:34<00:00,  4.19it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      29/56      9.11G      1.001     0.7005      1.268         30        640: 100% 125/125 [02:34<00:00,  1.24s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:34<00:00,  4.24it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      30/56      9.14G      1.015     0.7184      1.285         26        640: 100% 125/125 [02:38<00:00,  1.27s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:31<00:00,  4.54it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      31/56      9.18G     0.9891     0.6826      1.263         28        640: 100% 125/125 [02:31<00:00,  1.21s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:35<00:00,  4.09it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      32/56      9.22G      1.003     0.7023      1.269         72        640: 100% 125/125 [02:30<00:00,  1.20s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:31<00:00,  4.62it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      33/56      9.12G     0.9851     0.6886      1.265         35        640: 100% 125/125 [02:35<00:00,  1.25s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:31<00:00,  4.57it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      34/56      9.12G     0.9909     0.6815      1.265         28        640: 100% 125/125 [02:35<00:00,  1.24s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:31<00:00,  4.54it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      35/56      9.13G     0.9874     0.6953      1.264         42        640: 100% 125/125 [02:35<00:00,  1.25s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:32<00:00,  4.43it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      36/56      9.15G     0.9732     0.6794      1.254         31        640: 100% 125/125 [02:31<00:00,  1.21s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:37<00:00,  3.91it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      37/56      9.19G     0.9754     0.6826      1.254         34        640: 100% 125/125 [02:31<00:00,  1.21s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:34<00:00,  4.15it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      38/56      9.09G     0.9524     0.6706      1.246         33        640: 100% 125/125 [02:34<00:00,  1.24s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:34<00:00,  4.22it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      39/56      9.12G     0.9432      0.669      1.232         49        640: 100% 125/125 [02:28<00:00,  1.19s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:35<00:00,  4.09it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      40/56      9.14G     0.9538      0.656      1.241         43        640: 100% 125/125 [02:32<00:00,  1.22s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:38<00:00,  3.77it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      41/56      9.13G     0.9356     0.6582      1.233         37        640: 100% 125/125 [02:33<00:00,  1.23s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:36<00:00,  3.96it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      42/56      9.19G     0.9409     0.6562      1.237         48        640: 100% 125/125 [02:25<00:00,  1.16s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:40<00:00,  3.54it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      43/56      9.18G     0.9437      0.655      1.233         44        640: 100% 125/125 [02:27<00:00,  1.18s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:34<00:00,  4.25it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      44/56      9.17G     0.9439      0.657      1.238         35        640: 100% 125/125 [02:28<00:00,  1.18s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:35<00:00,  4.08it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      45/56      9.19G     0.9251     0.6401      1.228         62        640: 100% 125/125 [02:30<00:00,  1.20s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:34<00:00,  4.17it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      46/56      9.18G     0.9268     0.6301      1.228         55        640: 100% 125/125 [02:31<00:00,  1.21s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:34<00:00,  4.19it/s]
Closing dataloader mosaic
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      47/56      9.22G     0.8546     0.5353      1.236         12        640: 100% 125/125 [02:28<00:00,  1.18s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:33<00:00,  4.28it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      48/56      9.15G     0.8286     0.4879       1.21         16        640: 100% 125/125 [02:18<00:00,  1.11s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:36<00:00,  3.95it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      49/56      9.18G     0.8166     0.4758        1.2         36        640: 100% 125/125 [02:19<00:00,  1.11s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:35<00:00,  4.13it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      50/56      9.18G     0.8099     0.4689        1.2         22        640: 100% 125/125 [02:25<00:00,  1.16s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 145/145 [00:31<00:00,  4.59it/s]

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
  0% 0/125 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/sparseml.ultralytics.train", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sparseml/yolov8/train.py", line 227, in main
    model.train(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sparseml/yolov8/trainers.py", line 858, in train
    self.trainer.train()
  File "/usr/local/lib/python3.10/dist-packages/sparseml/yolov8/trainers.py", line 179, in train
    self._do_train(world_size)
  File "/usr/local/lib/python3.10/dist-packages/ultralytics/yolo/engine/trainer.py", line 343, in _do_train
    self.optimizer_step()
  File "/usr/local/lib/python3.10/dist-packages/sparseml/yolov8/trainers.py", line 434, in optimizer_step
    super().optimizer_step()
  File "/usr/local/lib/python3.10/dist-packages/ultralytics/yolo/engine/trainer.py", line 462, in optimizer_step
    self.scaler.step(self.optimizer)
  File "/usr/local/lib/python3.10/dist-packages/sparseml/pytorch/optim/manager.py", line 173, in step
    return self._perform_wrapped_step(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sparseml/pytorch/optim/manager.py", line 210, in _perform_wrapped_step
    self._wrapped_manager.update(
  File "/usr/local/lib/python3.10/dist-packages/sparseml/pytorch/optim/manager.py", line 591, in update
    mod.scheduled_update(module, optimizer, epoch, steps_per_epoch)
  File "/usr/local/lib/python3.10/dist-packages/sparseml/pytorch/sparsification/modifier.py", line 620, in scheduled_update
    self.update(module, optimizer, epoch=epoch, steps_per_epoch=steps_per_epoch)
  File "/usr/local/lib/python3.10/dist-packages/sparseml/pytorch/sparsification/quantization/modifier_quantization.py", line 419, in update
    self._check_quantization_update(module, epoch, steps_per_epoch)
  File "/usr/local/lib/python3.10/dist-packages/sparseml/pytorch/sparsification/quantization/modifier_quantization.py", line 467, in _check_quantization_update
    self._enable_module_qat(module)
  File "/usr/local/lib/python3.10/dist-packages/sparseml/pytorch/sparsification/quantization/modifier_quantization.py", line 498, in _enable_module_qat
    set_quantization_schemes(
  File "/usr/local/lib/python3.10/dist-packages/sparseml/pytorch/sparsification/quantization/quantize.py", line 142, in set_quantization_schemes
    _validate_set_module_schemes(model, scheme_overrides, ignore)
  File "/usr/local/lib/python3.10/dist-packages/sparseml/pytorch/sparsification/quantization/quantize.py", line 447, in _validate_set_module_schemes
    raise ValueError(
ValueError: scheme_overrides contains submodule names or module types that do not match to any submodules in the model. unmatched values: ['model.12.cv1.act', 'model.12.cv2.act', 'model.12.m.0.cv1.conv', 'model.12.m.0.cv2.act', 'model.15.cv1.act', 'model.15.cv2.act', 'model.15.m.0.cv1.conv', 'model.15.m.0.cv2.act', 'model.16.act', 'model.16.conv', 'model.18.cv1.act', 'model.18.m.0.cv1.conv', 'model.18.m.0.cv2.act', 'model.19.act', 'model.2.cv1.act', 'model.2.m.0.add_input_0', 'model.2.m.0.cv1.conv', 'model.21.cv1.act', 'model.21.m.0.cv1.conv', 'model.21.m.0.cv2.act', 'model.22.cv2.0.0.conv', 'model.22.cv3.0.0.conv', 'model.4.cv1.act', 'model.4.cv2.act', 'model.4.m.0.add_input_0', 'model.4.m.0.cv1.conv', 'model.5.conv', 'model.6.cv1.act', 'model.6.cv2.act', 'model.6.m.0.add_input_0', 'model.6.m.0.cv1.conv', 'model.7.conv', 'model.8.cv1.act', 'model.8.cv2.act', 'model.8.m.0.add_input_0', 'model.8.m.0.cv1.conv', 'model.9.cv1.act', 'model.9.cv2.act']
salwaghanim commented 1 year ago

Any new update? @dsikka @dbogunowicz

dsikka commented 1 year ago

@salwaghanim sorry for the delay - will take a look at this error this week

salwaghanim commented 1 year ago

@dsikka Thanks! I think the problem is in the recipes: some of them seem to rely on an older PyTorch version or an older ultralytics version. I am testing several recipes and will open a new issue soon. Here is an example: the recipe zoo:cv/detection/yolov5-m/pytorch/ultralytics/coco/pruned55_quant-none-vnni fails with the error - !pytorch.NotCurrentlySupported. Another recipe (I am not sure which one) produced: AssertionError: min nan should be less than max nan

dsikka commented 1 year ago

Hi @salwaghanim - do you mind sharing a printout of the model layers of the model you're applying the recipe to (/content/drive/MyDrive/projects/yolov8/runs/detect/cars_alone_Small/weights/best.pt)?

The recipe you listed seems to work with the yolov8s.pt model from our supported ultralytics version, so we just want to confirm the model structure being used.

Thanks!

salwaghanim commented 1 year ago

Hey @dsikka, sorry for the late reply. Yes, the recipe works well with the original yolov8s, but it does not work with a model I fine-tuned from yolov8s on a specific dataset. Using the official COCO weights is not good enough: even when I only need a class that already exists in COCO, say cars, the published weights are not specialized for cars, and by further training the model on one or two classes we can push the precision higher than the published weights reach. I applied these steps to YOLOv5 successfully for a while, until some updates broke the training procedure I was using; that model worked, and I pushed precision from 61% to over 90%. The theory behind this is simple: when you fine-tune the original weights for a small number of classes, the weights for the desired classes become very strong, so even with aggressive sparsification the model still detects the appropriate class. There is a problem with the new ultralytics version, and there are also issues with some recipes that expect an older PyTorch version. For now I will provide any information needed to fix the sparsification process for YOLOv8 with these recipes, and I will point out any additional issues I find later on.

dsikka commented 1 year ago

Hi @salwaghanim

Do you mind providing a printout of the model layers of the model that you fine-tuned and then tried applying the recipe to? This will help provide further insight into why the recipe is causing issues. Thanks!
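For what it's worth, a minimal sketch that would print those layers (assuming the checkpoint actually loads under the installed ultralytics version; the path below is the one mentioned above and is just a placeholder):

from ultralytics import YOLO

# Placeholder path: substitute the fine-tuned checkpoint in question.
ckpt_path = "/content/drive/MyDrive/projects/yolov8/runs/detect/cars_alone_Small/weights/best.pt"

# YOLO wraps the underlying torch.nn.Module in .model; named_modules()
# lists every submodule name, which can then be compared against the
# scheme_overrides / ignore entries in the recipe.
model = YOLO(ckpt_path)
for name, module in model.model.named_modules():
    print(name, type(module).__name__)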

jeanniefinks commented 1 year ago

(friendly ping to @salwaghanim - thanks!) Jeannie / Neural Magic

jeanniefinks commented 11 months ago

Hi @salwaghanim, some time has gone by and we have not heard back. I am going to go ahead and close out this issue, but if you're able to continue the conversation, we'd be happy to do so. Thank you! Jeannie / Neural Magic

salwaghanim commented 10 months ago

Hello @jeanniefinks, I hope you are in good health. Sorry for taking so long to reply; I was on maternity leave, and then my Sara got really sick due to a birth complication. I have just come back to work and have been testing recipes for YOLOv8 transfer-learning sparsification. I trained four models (nano, small, medium, and large), tried every available recipe for them, and only one worked. Here are detailed notes on the process.

Working recipes for YOLOv8:
For the large model:
recipe: "zoo:yolov8-l-coco-pruned80"
url: https://sparsezoo.neuralmagic.com/models/yolov8-l-coco-pruned80?hardware=deepsparse-c6i.12xlarge&comparison=yolov8-l-coco-base&tab=1

example command:
sparseml.ultralytics.train \
  --model "/media/experement/Experiment/yolov8_models_cars/runs/detect/Arial_car_L2/weights/best.pt" \
  --recipe "zoo:yolov8-l-coco-pruned80" \
  --data "/home/experement/Desktop/projects/detection/datasets/Arial_car.v1i.yolov8/data.yaml" \
  --batch 4 \
  --optimizer AdamW

For YOLOv8 medium: none
For YOLOv8 small: none
For YOLOv8 nano: none

Errors:
Recipes that don't work:

recipe: zoo:yolov8-l-coco-pruned85_quantized
url: https://sparsezoo.neuralmagic.com/models/yolov8-l-coco-pruned85_quantized?hardware=deepsparse-c6i.12xlarge&comparison=yolov8-l-coco-base
error: the run tries to allocate more memory than is available.

recipes: zoo:yolov8-s-coco-pruned70_quantized, zoo:yolov8-s-coco-pruned70, zoo:yolov8-s-coco-pruned65_quantized, zoo:yolov8-s-coco-pruned50
urls: https://sparsezoo.neuralmagic.com/models/yolov8-s-coco-pruned70_quantized?hardware=deepsparse-c6i.12xlarge&comparison=yolov8-s-coco-base&tab=1 , https://sparsezoo.neuralmagic.com/models/yolov8-s-coco-pruned70?comparison=yolov8-s-coco-base&tab=1 , https://sparsezoo.neuralmagic.com/models/yolov8-s-coco-pruned65_quantized?comparison=yolov8-s-coco-base&tab=0 , https://sparsezoo.neuralmagic.com/models/yolov8-s-coco-pruned50?comparison=yolov8-s-coco-base&tab=0
error: ModuleNotFoundError: No module named 'ultralytics.nn.modules.conv'; 'ultralytics.nn.modules' is not a package

The same error occurs with every YOLOv8 nano recipe and also with every YOLOv8 medium recipe; a diagnostic sketch follows below.
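As a diagnostic for that ModuleNotFoundError, it may help to compare the installed ultralytics version and module layout against what sparseml declares for its ultralytics extra. A minimal sketch under those assumptions (not an official tool):

import importlib
from importlib.metadata import requires

import ultralytics

# Version actually installed in the environment.
print("installed ultralytics:", ultralytics.__version__)

# Print the ultralytics requirement that sparseml declares (if any),
# to compare against the installed version.
for req in requires("sparseml") or []:
    if "ultralytics" in req:
        print("sparseml declares:", req)

# Newer ultralytics releases make ultralytics.nn.modules a package with a
# conv submodule; checkpoints pickled under that layout cannot be loaded
# when an older single-module layout is installed, which matches the error.
try:
    importlib.import_module("ultralytics.nn.modules.conv")
    print("nn.modules is a package (newer layout)")
except ModuleNotFoundError:
    print("nn.modules is a single module (older layout)")

If the layouts differ, re-training (or re-saving) the checkpoint under the ultralytics version that sparseml pins is one possible workaround.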

My environment: Ubuntu 22.04.3 LTS, CUDA 11.8, PyTorch 2.0.

I installed sparseml using pip install sparseml[ultralytics], and I also tried installing all of its extras with pip install sparseml[dev,torch,torchvision,deepsparse,onnxruntime,transformers,ultralytics]. The procedure I use is to train a custom YOLOv8 model with ultralytics, then install sparseml and apply the sparsification recipes to my best.pt weights file. As mentioned earlier in this thread, I use this approach to get the smallest model possible while keeping precision above 80% (a sketch of the workflow is below).
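A minimal sketch of that two-step procedure, with placeholder paths and hyperparameters, assuming the fine-tuning step is run under the ultralytics version that sparseml pins so the saved checkpoint stays loadable:

from ultralytics import YOLO

# Step 1: fine-tune a pretrained YOLOv8 model on the custom dataset
# (placeholder data path; epochs/imgsz/batch are illustrative).
model = YOLO("yolov8l.pt")
model.train(data="Arial_car.v1i.yolov8/data.yaml", epochs=100, imgsz=640, batch=4)

# Step 2: apply a sparsification recipe to the resulting best.pt with the
# sparseml CLI, as in the working example earlier in this thread:
#   sparseml.ultralytics.train \
#     --model "runs/detect/train/weights/best.pt" \
#     --recipe "zoo:yolov8-l-coco-pruned80" \
#     --data "Arial_car.v1i.yolov8/data.yaml" \
#     --batch 4 \
#     --optimizer AdamW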