neuralmagic / sparseml

Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
Apache License 2.0

yolov5 sparsifying model error after update sparseml-nightly 1.5.0.20230521 #1603

Closed salwaghanim closed 1 year ago

salwaghanim commented 1 year ago

Describe the bug
Model is not training due to a PyTorch problem.

Expected behavior
The model should train normally.

Environment
I have tested the following in two separate environments; both had the same issue. Environment one: Google Colab. Environment two:

  1. OS [e.g. Ubuntu 18.04]: Ubuntu 22.04
  2. Python version [e.g. 3.7]: 3.8.8
  3. SparseML version or commit hash [e.g. 0.1.0, f7245c8]: sparseml-nightly 1.5.0.20230521
  4. ML framework version(s) [e.g. torch 1.7.1]: 1.12.0+cu102
  5. Other Python package versions [e.g. SparseZoo, DeepSparse, numpy, ONNX]:
  6. Other relevant environment information [e.g. hardware, CUDA version]:

To Reproduce

!pip uninstall tensorflow -y 
!pip install sparseml-nightly[dev,torchvision,deepsparse,onnxruntime,transformers,yolov5]
!sparseml.yolov5.train --cfg "/content/drive/MyDrive/projects/yolov5/models/yolov5m.yaml" --weights "/content/drive/MyDrive/projects/yolov5/runs/train/cars_Meduim9/weights/best.pt" --recipe "zoo:cv/detection/yolov5-m6/pytorch/ultralytics/coco/pruned60_quant-none-vnni" --data "/content/drive/MyDrive/projects/datasets/Car_Models.v4i.yolov5pytorch/data.yaml" --patience 0

Errors

train: weights=/content/drive/MyDrive/projects/yolov5/runs/train/cars_Meduim9/weights/best.pt, cfg=/content/drive/MyDrive/projects/yolov5/models/yolov5m.yaml, teacher_weights=, data=/content/drive/MyDrive/projects/datasets/Car_Models.v4i.yolov5pytorch/data.yaml, data_path=, hyp=../usr/local/lib/python3.10/dist-packages/yolov5/data/hyps/hyp.scratch-low.yaml, epochs=300, batch_size=16, gradient_accum_steps=-1, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=/content/yolov5_runs/train, log_dir=None, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=0, freeze=[0], save_period=-1, seed=0, local_rank=-1, recipe=zoo:cv/detection/yolov5-m6/pytorch/ultralytics/coco/pruned60_quant-none-vnni, recipe_args=None, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: skipping check (not a git repository), for updates see https://github.com/ultralytics/yolov5
YOLOv5 🚀 2023-6-2 Python-3.10.11 torch-1.12.0+cu102 CUDA:0 (Tesla T4, 15102MiB)

hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
ClearML: run 'pip install clearml' to automatically track, visualize and remotely train YOLOv5 🚀 in ClearML
Comet: run 'pip install comet_ml' to automatically track and visualize YOLOv5 🚀 runs in Comet
TensorBoard: Start with 'tensorboard --logdir /content/yolov5_runs/train', view at http://localhost:6006/
Overriding model.yaml nc=80 with nc=10

             from  n    params  module                                  arguments                     

0 -1 1 5280 yolov5.models.common.Conv [3, 48, 6, 2, 2]
1 -1 1 41664 yolov5.models.common.Conv [48, 96, 3, 2]
2 -1 2 65280 yolov5.models.common.C3 [96, 96, 2]
3 -1 1 166272 yolov5.models.common.Conv [96, 192, 3, 2]
4 -1 4 444672 yolov5.models.common.C3 [192, 192, 4]
5 -1 1 664320 yolov5.models.common.Conv [192, 384, 3, 2]
6 -1 6 2512896 yolov5.models.common.C3 [384, 384, 6]
7 -1 1 2655744 yolov5.models.common.Conv [384, 768, 3, 2]
8 -1 2 4134912 yolov5.models.common.C3 [768, 768, 2]
9 -1 1 1476864 yolov5.models.common.SPPF [768, 768, 5]
10 -1 1 295680 yolov5.models.common.Conv [768, 384, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 yolov5.models.common.Concat [1]
13 -1 2 1182720 yolov5.models.common.C3 [768, 384, 2, False]
14 -1 1 74112 yolov5.models.common.Conv [384, 192, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 yolov5.models.common.Concat [1]
17 -1 2 296448 yolov5.models.common.C3 [384, 192, 2, False]
18 -1 1 332160 yolov5.models.common.Conv [192, 192, 3, 2]
19 [-1, 14] 1 0 yolov5.models.common.Concat [1]
20 -1 2 1035264 yolov5.models.common.C3 [384, 384, 2, False]
21 -1 1 1327872 yolov5.models.common.Conv [384, 384, 3, 2]
22 [-1, 10] 1 0 yolov5.models.common.Concat [1]
23 -1 2 4134912 yolov5.models.common.C3 [768, 768, 2, False]
24 [17, 20, 23] 1 60615 yolov5.models.yolo.Detect [10, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [192, 384, 768]]

YOLOv5m summary: 291 layers, 20907687 parameters, 20907687 gradients, 48.3 GFLOPs

Traceback (most recent call last):
  File "/usr/local/bin/sparseml.yolov5.train", line 8, in <module>
    sys.exit(train())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sparseml/yolov5/scripts.py", line 41, in train
    train_run(**vars(opt))
  File "/usr/local/lib/python3.10/dist-packages/yolov5/train.py", line 732, in run
    main(opt)
  File "/usr/local/lib/python3.10/dist-packages/yolov5/train.py", line 632, in main
    train(opt.hyp, opt, device, callbacks)
  File "/usr/local/lib/python3.10/dist-packages/yolov5/train.py", line 130, in train
    sparsification_manager = maybe_create_sparsification_manager(model, ckpt=ckpt, train_recipe=opt.recipe, recipe_args=opt.recipe_args, device=device, resumed=opt.resume)
  File "/usr/local/lib/python3.10/dist-packages/yolov5/utils/neuralmagic/sparsification_manager.py", line 589, in maybe_create_sparsification_manager
    sparsification_manager = SparsificationManager(
  File "/usr/local/lib/python3.10/dist-packages/yolov5/utils/neuralmagic/sparsification_manager.py", line 79, in __init__
    ScheduledModifierManager.from_yaml(
  File "/usr/local/lib/python3.10/dist-packages/sparseml/pytorch/optim/manager.py", line 285, in from_yaml
    modifiers = Modifier.load_list(yaml_str)
  File "/usr/local/lib/python3.10/dist-packages/sparseml/pytorch/sparsification/modifier.py", line 101, in load_list
    return Modifier.load_framework_list(yaml_str, PYTORCH_FRAMEWORK)
  File "/usr/local/lib/python3.10/dist-packages/sparseml/optim/modifier.py", line 318, in load_framework_list
    container = yaml.safe_load(yaml_str)
  File "/usr/local/lib/python3.10/dist-packages/yaml/__init__.py", line 125, in safe_load
    return load(stream, SafeLoader)
  File "/usr/local/lib/python3.10/dist-packages/yaml/__init__.py", line 81, in load
    return loader.get_single_data()
  File "/usr/local/lib/python3.10/dist-packages/yaml/constructor.py", line 51, in get_single_data
    return self.construct_document(node)
  File "/usr/local/lib/python3.10/dist-packages/yaml/constructor.py", line 60, in construct_document
    for dummy in generator:
  File "/usr/local/lib/python3.10/dist-packages/yaml/constructor.py", line 408, in construct_yaml_seq
    data.extend(self.construct_sequence(node))
  File "/usr/local/lib/python3.10/dist-packages/yaml/constructor.py", line 129, in construct_sequence
    return [self.construct_object(child, deep=deep)
  File "/usr/local/lib/python3.10/dist-packages/yaml/constructor.py", line 129, in <listcomp>
    return [self.construct_object(child, deep=deep)
  File "/usr/local/lib/python3.10/dist-packages/yaml/constructor.py", line 100, in construct_object
    data = constructor(self, node)
  File "/usr/local/lib/python3.10/dist-packages/yaml/constructor.py", line 427, in construct_undefined
    raise ConstructorError(None, None,
yaml.constructor.ConstructorError: could not determine a constructor for the tag '!pytorch.NotCurrentlySupported'
  in "<unicode string>", line 2, column 3:

Additional context
I have also tested creating a virtual env as described in issue #1587. The result was the same (same error).

salwaghanim commented 1 year ago

By the way, if you could provide working environment requirements for YOLOv8 detection, that would be even more helpful than an answer to this issue. Also, do you have an estimated time for the beta release of the DeepSparse engine for ARM devices?

salwaghanim commented 1 year ago

I downgraded to sparseml 1.4.4 and still got the same issue. I also downgraded PyYAML and still got the same issue.

!pip install sparseml==1.4.4
!pip install sparseml[dev,torchvision,deepsparse,onnxruntime,transformers,yolov5,torch]
!pip install PyYAML==5.4.1
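
Since the report lists Python 3.8.8 while the training log prints Python 3.10.11, it may be worth confirming which versions the running interpreter actually sees after these installs. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the list of packages queried are illustrative assumptions, not part of SparseML.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this interpreter's environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    # Distribution names below are assumptions taken from the pip commands in this thread.
    report = installed_versions(["sparseml", "sparseml-nightly", "PyYAML", "torch"])
    for name, version in report.items():
        print(name, version or "not installed")
```

Running this in the same notebook or virtual env that launches `sparseml.yolov5.train` shows whether a downgrade really took effect there.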

Below are code snippets from the PyYAML constructor. I selected the functions along the full trace of the error and marked the lines the traceback points to.

    def get_single_data(self):
        # Ensure that the stream contains a single document and construct it.
        node = self.get_single_node()
        if node is not None:
            return self.construct_document(node) ##here line 51
        return None

    def construct_document(self, node):
        data = self.construct_object(node)
        while self.state_generators:
            state_generators = self.state_generators
            self.state_generators = []
            for generator in state_generators:
                for dummy in generator: ##here line 60
                    pass
        self.constructed_objects = {}
        self.recursive_objects = {}
        self.deep_construct = False
        return data

    def construct_object(self, node, deep=False):
        if node in self.constructed_objects:
            return self.constructed_objects[node]
        if deep:
            old_deep = self.deep_construct
            self.deep_construct = True
        if node in self.recursive_objects:
            raise ConstructorError(None, None,
                    "found unconstructable recursive node", node.start_mark)
        self.recursive_objects[node] = None
        constructor = None
        tag_suffix = None
        if node.tag in self.yaml_constructors:
            constructor = self.yaml_constructors[node.tag]
        else:
            for tag_prefix in self.yaml_multi_constructors:
                if tag_prefix is not None and node.tag.startswith(tag_prefix):
                    tag_suffix = node.tag[len(tag_prefix):]
                    constructor = self.yaml_multi_constructors[tag_prefix]
                    break
            else:
                if None in self.yaml_multi_constructors:
                    tag_suffix = node.tag
                    constructor = self.yaml_multi_constructors[None]
                elif None in self.yaml_constructors:
                    constructor = self.yaml_constructors[None]
                elif isinstance(node, ScalarNode):
                    constructor = self.__class__.construct_scalar
                elif isinstance(node, SequenceNode):
                    constructor = self.__class__.construct_sequence
                elif isinstance(node, MappingNode):
                    constructor = self.__class__.construct_mapping
        if tag_suffix is None:
            data = constructor(self, node) ##here line 100
        else:
            data = constructor(self, tag_suffix, node)
        if isinstance(data, types.GeneratorType):
            generator = data
            data = next(generator)
            if self.deep_construct:
                for dummy in generator:
                    pass
            else:
                self.state_generators.append(generator)
        self.constructed_objects[node] = data
        del self.recursive_objects[node]
        if deep:
            self.deep_construct = old_deep
        return data

    def construct_sequence(self, node, deep=False):
        if not isinstance(node, SequenceNode):
            raise ConstructorError(None, None,
                    "expected a sequence node, but found %s" % node.id,
                    node.start_mark)
        return [self.construct_object(child, deep=deep) ##here line 129
                for child in node.value]

    def construct_undefined(self, node):
        raise ConstructorError(None, None,
                "could not determine a constructor for the tag %r" % node.tag, ##here line 428 could not determine a constructor for the tag '!pytorch.NotCurrentlySupported' 
                node.start_mark)
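
To make the lookup order in `construct_object` easier to follow, here is a simplified, standalone model of it. It is a sketch and not PyYAML's code: `resolve_constructor`, the registry contents, and the tag `!pytorch.SomeKnownModifier` are hypothetical, and the node-type fallbacks are omitted. It assumes, as the traceback suggests, that the safe loader has a catch-all registered under `None` that raises for unknown tags.

```python
def resolve_constructor(tag, constructors, multi_constructors):
    """Simplified model of the constructor lookup order in construct_object."""
    # 1. An exact match on the tag wins.
    if tag in constructors:
        return constructors[tag]
    # 2. Otherwise try prefix-based multi-constructors.
    for prefix, constructor in multi_constructors.items():
        if prefix is not None and tag.startswith(prefix):
            return constructor
    # 3. Then the catch-all entries registered under None.
    if None in multi_constructors:
        return multi_constructors[None]
    if None in constructors:
        return constructors[None]
    # The real code falls back on the node type here; omitted in this sketch.
    return None


def construct_undefined(tag):
    """Stand-in for the catch-all that rejects unknown tags."""
    raise ValueError(f"could not determine a constructor for the tag {tag!r}")


# Hypothetical registry: one known tag plus the catch-all under None.
constructors = {
    "!pytorch.SomeKnownModifier": lambda tag: "modifier instance",
    None: construct_undefined,
}

handler = resolve_constructor("!pytorch.NotCurrentlySupported", constructors, {})
```

Because nothing is registered for `!pytorch.NotCurrentlySupported`, the lookup ends at the catch-all, which matches the `construct_undefined` frame at the bottom of the traceback. In other words, the YAML was read successfully; it is the tag inside it that no constructor recognizes.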

I think the code is not able to load the recipe. I downloaded the recipe from SparseZoo; it is attached here:

<!--
Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
   http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

---
version: 1.1.0

modifiers:
  - !NotCurrentlySupported
---

# YOLOv5m6 one-shot pruned and finetuned -- COCO

The model was pruned using a one-shot algorithm and the recipe for reproducing it is not currently available.

## Evaluation

This model achieves 66.85 mAP@0.5 on the COCO dataset. The following command can be used to validate accuracy.

```bash
sparseml.yolov5.validation \
  --weights "zoo:cv/detection/yolov5-m6/pytorch/ultralytics/coco/pruned75-none" \
  --data coco.yaml \
  --imgsz 1280 \
  --iou-thres 0.65
```

And finally, here is the stub of the recipe:
`zoo:cv/detection/yolov5-m6/pytorch/ultralytics/coco/pruned75-none`

salwaghanim commented 1 year ago

I confirmed that the problem is a missing recipe file, so I will close this issue. If you want to sparsify a YOLOv5 model, install using the following commands. This works on Google Colab and on a local machine; on a local machine, install CUDA first and then run the commands.

!pip install sparseml==1.4.4
!pip install sparseml[dev,torchvision,deepsparse,onnxruntime,transformers,yolov5,torch]
!sparseml.yolov5.train --help #this will download additional libraries

The last command reinstalls OpenCV and other dependencies that were removed by the pip package manager while it was installing the requirements of sparseml. For the best results on a custom dataset, train your YOLO model on your data before starting the sparsification process. This goes for both YOLOv5 and YOLOv8. PS: train your model in the original framework (I used Ultralytics); after training, you can sparsify the best.pt weights file. Make sure to select a recipe that has a .md file, e.g. zoo:cv/detection/yolov5-m6/pytorch/ultralytics/coco/pruned75-none. Best of luck.
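
The recipe attached earlier in this thread declares a single `!NotCurrentlySupported` modifier, and the error message names the same tag with a `pytorch.` prefix. A quick textual check on a downloaded recipe file, before starting a long training run, could catch such a placeholder. This is only a sketch: `is_placeholder_recipe` is a hypothetical helper, not a SparseML function, and it assumes the placeholder always appears as a list item with that tag name.

```python
import re

# Matches a list item such as "- !NotCurrentlySupported", with an optional
# framework prefix like "pytorch." in front of the tag name.
PLACEHOLDER_TAG = re.compile(
    r"^\s*-\s*!(?:\w+\.)?NotCurrentlySupported\b", re.MULTILINE
)


def is_placeholder_recipe(recipe_text):
    """Return True if the recipe text contains the placeholder modifier tag."""
    return PLACEHOLDER_TAG.search(recipe_text) is not None


# Front matter copied from the recipe attached in this issue.
attached_recipe = """---
version: 1.1.0

modifiers:
  - !NotCurrentlySupported
---
"""

print(is_placeholder_recipe(attached_recipe))
```

If the check reports a placeholder, that stub has no usable training recipe and a different one should be chosen.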