ultralytics / ultralytics

Ultralytics YOLO11 🚀
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Freezing layer 'model.22.dfl.conv.weight' #5052

Closed: oceanzhf closed this issue 11 months ago

oceanzhf commented 1 year ago

Search before asking

Question

I haven't configured any layer freezing for training, but when YOLOv8 prints the model parameters before training it shows Freezing layer 'model.22.dfl.conv.weight'

Additional

No response

EangGeen commented 1 year ago

I ran into this problem too. My YOLOv5 environment can use the GPU normally, and in v8 torch.cuda.is_available() is True, so the GPU should be usable as well, but the code I pulled down doesn't seem to work. torch=2.0.1+cu118, torchvision=0.15.2, CUDA 11.8, and I also installed the 11.x build of cuDNN.

Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
Traceback (most recent call last):
  File "C:\Users\Acer\.conda\envs\yolov8.6\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Acer\.conda\envs\yolov8.6\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Acer\.conda\envs\yolov8.6\Scripts\yolo.exe\__main__.py", line 7, in <module>
  File "C:\Users\Acer\.conda\envs\yolov8.6\lib\site-packages\ultralytics\cfg\__init__.py", line 445, in entrypoint
    getattr(model, mode)(**overrides)  # default args from model
  File "C:\Users\Acer\.conda\envs\yolov8.6\lib\site-packages\ultralytics\engine\model.py", line 337, in train
    self.trainer.train()
  File "C:\Users\Acer\.conda\envs\yolov8.6\lib\site-packages\ultralytics\engine\trainer.py", line 195, in train
    self._do_train(world_size)
  File "C:\Users\Acer\.conda\envs\yolov8.6\lib\site-packages\ultralytics\engine\trainer.py", line 293, in _do_train
    self._setup_train(world_size)
  File "C:\Users\Acer\.conda\envs\yolov8.6\lib\site-packages\ultralytics\engine\trainer.py", line 239, in _setup_train
    self.amp = torch.tensor(check_amp(self.model), device=self.device)
  File "C:\Users\Acer\.conda\envs\yolov8.6\lib\site-packages\ultralytics\utils\checks.py", line 546, in check_amp
    assert amp_allclose(YOLO('yolov8n.pt'), im)
  File "C:\Users\Acer\.conda\envs\yolov8.6\lib\site-packages\ultralytics\utils\checks.py", line 534, in amp_allclose
    a = m(im, device=device, verbose=False)[0].boxes.data  # FP32 inference
  File "C:\Users\Acer\.conda\envs\yolov8.6\lib\site-packages\ultralytics\engine\model.py", line 96, in __call__
    return self.predict(source, stream, **kwargs)
  File "C:\Users\Acer\.conda\envs\yolov8.6\lib\site-packages\ultralytics\engine\model.py", line 236, in predict
    return self.predictor.predict_cli(source=source) if is_cli else self.predictor(source=source, stream=stream)
  File "C:\Users\Acer\.conda\envs\yolov8.6\lib\site-packages\ultralytics\engine\predictor.py", line 194, in __call__
    return list(self.stream_inference(source, model, *args, **kwargs))  # merge list of Result into one
  File "C:\Users\Acer\.conda\envs\yolov8.6\lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "C:\Users\Acer\.conda\envs\yolov8.6\lib\site-packages\ultralytics\engine\predictor.py", line 257, in stream_inference
    self.results = self.postprocess(preds, im, im0s)
  File "C:\Users\Acer\.conda\envs\yolov8.6\lib\site-packages\ultralytics\models\yolo\detect\predict.py", line 25, in postprocess
    preds = ops.non_max_suppression(preds,
  File "C:\Users\Acer\.conda\envs\yolov8.6\lib\site-packages\ultralytics\utils\ops.py", line 242, in non_max_suppression
    i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
  File "C:\Users\Acer\.conda\envs\yolov8.6\lib\site-packages\torchvision\ops\boxes.py", line 41, in nms
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
  File "C:\Users\Acer\.conda\envs\yolov8.6\lib\site-packages\torch\_ops.py", line 502, in __call__
    return self._op(*args, **kwargs or {})
NotImplementedError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torchvision::nms' is only available for these backends: [CPU, QuantizedCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].

rusvagzur commented 1 year ago

also interested

glenn-jocher commented 1 year ago

@rusvagzur hello,

Thanks for your interest and for reaching out.

The message "Freezing layer 'model.22.dfl.conv.weight'" is a standard information prompt and does not necessarily indicate an issue. The Ultralytics architecture employs a fine-tuning strategy where some layers may be "frozen" (i.e., their weights are not updated) during the initial stages of training to preserve the pre-trained weights. If you're not intending to freeze any layers and you're seeing this message, it could simply be a part of the model's default training procedure.

Regarding the error you're seeing during Automatic Mixed Precision (AMP) checks, it seems to be stemming from a compatibility issue between the versions of Torch, Torchvision and CUDA you have installed. The error indicates that the NMS operation, which comes from torchvision, can't be run on your current CUDA backend. The versions you have installed may not be compatible, causing this operation to fail.

Here's what you can check: first, that your installed torch and torchvision builds were compiled for the same CUDA version (for example, both +cu118); second, that your torchvision release matches your torch release according to the official PyTorch compatibility matrix; and third, that everything satisfies the ultralytics requirements.

Be sure to match your software to the requirements and update as necessary. Let us know if this resolves your issue or if you need further assistance. Happy coding!
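For example, a quick sanity check of the install might look like this (a minimal sketch; the versions in the comments are just examples):

```python
import torch
import torchvision

print(torch.__version__, torchvision.__version__)   # e.g. 2.0.1+cu118 and 0.15.2+cu118
print(torch.version.cuda, torch.cuda.is_available())

# Reproduce the failing op directly: if torchvision was installed without CUDA
# support, this raises the same NotImplementedError for 'torchvision::nms'.
boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0]], device="cuda")
scores = torch.tensor([0.9, 0.8], device="cuda")
print(torchvision.ops.nms(boxes, scores, 0.5))
```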

Best regards, Glenn Jocher

github-actions[bot] commented 11 months ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

JasseurHadded1 commented 9 months ago

But it's frozen even when training from scratch! In that case, won't the weights of this layer just stay random?

glenn-jocher commented 9 months ago

@JasseurHadded1 hello,

When training from scratch, it's unusual for layers to be frozen unless specified in the model configuration. If a layer is frozen unintentionally, it could indeed retain its initial random weights throughout training. Please ensure that your model configuration does not include commands to freeze layers if you wish to train all layers from scratch. If the issue persists, it might be a good idea to review the training script and configuration files for any unintended settings.

JasseurHadded1 commented 9 months ago

Yes, but it's frozen in all cases (it's independent of the config parameters).

You can see it here,

https://github.com/ultralytics/ultralytics/blob/c26d4a222fa812c77fcba3cf904165f97f559a0f/ultralytics/engine/trainer.py#L223C15-L223C15

glenn-jocher commented 9 months ago

@JasseurHadded1,

Thank you for pointing this out. The line you're referring to is indeed part of the model's setup process. If the layer is being frozen regardless of the configuration, it could be an oversight or a default behavior that we need to address. We appreciate your vigilance and will review the code to ensure it aligns with the intended functionality. If a change is necessary, we will update the repository accordingly. Your feedback is valuable to the continuous improvement of YOLOv8.

Buckler89 commented 8 months ago

I'm currently attempting to train a YOLO model from scratch and have encountered log outputs similar to what was described in the aforementioned issue. Could you please confirm if this is the expected behavior or if there are any specific steps I should follow to address it?

Thank you for your assistance and looking forward to your response.

glenn-jocher commented 8 months ago

@Buckler89 hello,

When training a YOLO model from scratch, you should not typically see layers being frozen unless specified. The log output you're seeing may be part of a default setting that we need to investigate. For now, please ensure that your configuration does not explicitly freeze any layers. We will review the behavior and make necessary updates to the repository. Thank you for bringing this to our attention.

GoldFeniks commented 7 months ago

Hello! It looks like the DFL layer in this implementation uses a convolution layer to perform its operation. The weights of that layer are initialized statically (since it essentially computes an integral of sorts) and aren't supposed to be updated during training, so freezing the DFL layer appears to be by design.
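For reference, a rough sketch of such a fixed-weight DFL module (paraphrased from the idea described above; the exact Ultralytics implementation may differ between versions): a 1x1 convolution whose weights are set to the bin indices 0..c1-1 and excluded from gradient updates, so it computes the expectation of a softmax distribution over the bins.

```python
import torch
import torch.nn as nn


class DFL(nn.Module):
    """Sketch of a Distribution Focal Loss decoding module: a 1x1 conv with
    fixed weights 0..c1-1 that turns a per-side softmax distribution into its
    expected value (a discrete 'integral' over the bins)."""

    def __init__(self, c1: int = 16):
        super().__init__()
        self.conv = nn.Conv2d(c1, 1, 1, bias=False).requires_grad_(False)
        self.conv.weight.data[:] = torch.arange(c1, dtype=torch.float).view(1, c1, 1, 1)
        self.c1 = c1

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, a = x.shape  # batch, 4 * c1 channels, anchors
        x = x.view(b, 4, self.c1, a).transpose(2, 1).softmax(1)
        return self.conv(x).view(b, 4, a)
```

Because those conv weights are constants by construction, the trainer reports them as frozen, which is what produces the "Freezing layer 'model.22.dfl.conv.weight'" message even when no freeze option is set.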

glenn-jocher commented 7 months ago

@GoldFeniks hello!

Yes, you're absolutely right! The DFL layer uses a convolution layer, and its weights are indeed initialized in a specific way to perform its intended operation effectively. These weights are meant to remain constant (frozen) during training, aligning with the design and purpose of the DFL layer. This approach helps in maintaining the integrity of the DFL operation throughout the training process. If you have any more questions or need further clarification, feel free to ask. Happy coding! 😊

Wooho-Moon commented 6 months ago

Hi, guys! I have a quick question about the DFL layer. DFL stands for "Distribution Focal Loss", right? But I don't know exactly what DFL means. Could you explain it?

glenn-jocher commented 6 months ago

@Wooho-Moon hello!

Absolutely, happy to explain! 😊 DFL stands for Distribution Focal Loss, which is an advanced loss function designed to handle the imbalance between the foreground and background classes in object detection tasks, especially in sparse and cluttered scenes. It does this by focusing more on hard-to-classify examples and less on easy ones, effectively improving the model's overall accuracy and robustness.

Instead of directly predicting the class probabilities, the DFL layer predicts a distribution over the classes. This approach allows the model to capture more detailed information about each prediction, especially in uncertain or ambiguous cases.

Hope this clarifies what DFL is all about! If you have any more questions, feel free to ask. Happy coding! 🚀

Wooho-Moon commented 6 months ago

First, thanks for the reply. However, I don't fully understand, since the DFL layer is used together with CIoU in YOLOv8. To my understanding, DFL is used when the box is regressed, but according to your explanation, DFL is used to solve the class imbalance problem. Those are totally different functions, and that's why I'm confused. I'd like to know about the relationship between DFL and box regression. Have a nice day :)

glenn-jocher commented 6 months ago

@Wooho-Moon hello again!

Thanks for following up, and I understand where the confusion might be coming from. Let me clarify it for you 😊.

In YOLOv8, the DFL layer indeed plays a crucial role in the box regression process, particularly when used alongside CIoU loss which focuses on improving the localization by calculating the similarity between the predicted and ground truth boxes.

The use of DFL in this context is slightly different from its traditional application for class imbalance. Here, it helps in enhancing the precision of bounding box predictions by encoding the distribution of possible box locations. This method allows the model to understand and optimize the location of objects in a more nuanced manner, leveraging the distribution of predictions rather than single-point estimates.

So, in essence, while DFL can address class imbalance, within YOLOv8, it also contributes significantly to refining the bounding box regression through distribution-based predictions. This dual application showcases the versatility of DFL within the model.
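As a rough illustration (assuming a bin count of 16, which matches YOLOv8's default reg_max), the distribution-to-offset step is just a softmax over the bins followed by an expectation, which is exactly what the frozen DFL convolution discussed earlier in this thread computes:

```python
import torch

reg_max = 16                                       # bins per box side (assumed default)
num_anchors = 100                                  # arbitrary number of anchor points
logits = torch.randn(2, 4, reg_max, num_anchors)   # batch, 4 sides (l, t, r, b), bins, anchors
probs = logits.softmax(dim=2)                      # discrete distribution over the bins
bins = torch.arange(reg_max, dtype=torch.float32)
offsets = (probs * bins.view(1, 1, reg_max, 1)).sum(dim=2)  # expected distance per side
print(offsets.shape)                               # torch.Size([2, 4, 100])
```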

Hope this clears things up a bit more! If you have any more queries or need further details, feel free to reach out. Enjoy experimenting with YOLOv8! 🌟