ultralytics / ultralytics

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

RT-DETR #11687

FanglinLiu1 opened this issue 1 week ago

FanglinLiu1 commented 1 week ago

Search before asking

Question

1. Thanks for the author's reply. I really want to know why the hyperparameter settings do not follow the original authors'; for example, batch is set to 4 and workers is set to 4.

I would also like to know why the following two settings do not match the PyTorch version of RT-DETR: the max_norm argument of torch.nn.utils.clip_grad_norm_ in ultralytics/engine/trainer.py is not 0.1,

and self.args.nbs in the _setup_train function of ultralytics/engine/trainer.py is not equal to self.batch_size.

2. Do you provide standard TXT annotation files for the VOC and COCO datasets?

3. Do you provide data augmentation scripts?

4. Are your hyperparameter settings optimal?

Additional

No response

glenn-jocher commented 1 week ago

@FanglinLiu1 hello!

Thanks for reaching out with these thoughtful questions about the settings and implementations in YOLOv8, specifically concerning RT-DETR.

  1. Hyperparameter settings: Our defaults intentionally differ from the original authors' settings; they are tuned based on empirical results for better performance and implementation compatibility within this framework. Choices such as batch=4 and workers=4, along with other adjustments, typically come from extensive testing aimed at balancing training speed and system resource utilization.

    Differences from the PyTorch version of RT-DETR: Parameters such as max_norm for gradient clipping are adjusted to fit the specific demands and architectural differences within our framework. Similarly, self.args.nbs is not equal to self.batch_size because nbs acts as a nominal batch size used to scale training dynamically (see the sketch after this list), which improves training stability and performance.

  2. Annotation files for VOC and COCO: The Ultralytics implementation handles and converts annotations automatically during dataset preparation, so separate TXT files aren't generally shipped. You can, however, generate them yourself during preprocessing if needed (see the conversion sketch after this list).

  3. Data augmentation: Yes, the framework includes extensive data augmentation integrated directly into the training pipeline, including but not limited to image resizing, random cropping, color-space transformations, and flipping. These are controlled through training hyperparameters (see the example after this list).

  4. Hyperparameter optimization: The provided hyperparameters are a solid baseline that performs well across a wide range of datasets. We encourage users to fine-tune these settings for their specific tasks and datasets to get the best results (a tuning sketch is included after this list).
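To make point 1 concrete, here is a minimal, self-contained sketch (not the actual Ultralytics trainer code) of how a nominal batch size (nbs) can drive gradient accumulation and how torch.nn.utils.clip_grad_norm_ applies a max_norm. The values nbs=64 and max_norm=10.0 are illustrative assumptions; please check ultralytics/engine/trainer.py for the values actually used.

```python
import torch
from torch import nn

# Illustrative values only -- check ultralytics/engine/trainer.py for the real ones.
nbs = 64          # nominal batch size targeted by gradient accumulation
batch_size = 4    # actual per-iteration batch size
max_norm = 10.0   # gradient clipping threshold (the reference RT-DETR uses 0.1)

# Accumulate gradients over several iterations so the effective batch approximates nbs.
accumulate = max(round(nbs / batch_size), 1)

model = nn.Linear(10, 2)  # stand-in for the detection model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(100):
    x, y = torch.randn(batch_size, 10), torch.randn(batch_size, 2)
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()  # gradients keep accumulating across iterations

    if (step + 1) % accumulate == 0:
        # Clip the accumulated gradients, then take a single optimizer step.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)
        optimizer.step()
        optimizer.zero_grad()
```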
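For point 2, YOLO-style TXT labels contain one line per object in the form "class x_center y_center width height", with coordinates normalized to [0, 1]. Below is a hedged sketch of converting a COCO-style box ([x_min, y_min, width, height] in pixels) into that format; coco_box_to_yolo_line is a hypothetical helper name used only for illustration.

```python
def coco_box_to_yolo_line(cls_id, box, img_w, img_h):
    """Convert one COCO-style box [x_min, y_min, w, h] (pixels) to a YOLO TXT label line.

    YOLO TXT format: "<class> <x_center> <y_center> <width> <height>", all normalized to [0, 1].
    """
    x_min, y_min, w, h = box
    x_center = (x_min + w / 2) / img_w
    y_center = (y_min + h / 2) / img_h
    return f"{cls_id} {x_center:.6f} {y_center:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# Example: a 100x50 box at (200, 150) in a 640x480 image, class 0.
print(coco_box_to_yolo_line(0, [200, 150, 100, 50], 640, 480))
# -> "0 0.390625 0.364583 0.156250 0.104167"
```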
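For point 3, the augmentations are configured through training arguments rather than a separate script. A short, hedged example follows; the argument names assume a recent ultralytics release, so consult the documentation for your installed version.

```python
from ultralytics import RTDETR

model = RTDETR("rtdetr-l.pt")

# Augmentation is driven by training hyperparameters (the values below are illustrative).
model.train(
    data="coco8.yaml",                      # small sample dataset used in the docs
    epochs=10,
    imgsz=640,
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,      # color-space jitter
    degrees=0.0, translate=0.1, scale=0.5,  # geometric augmentation
    fliplr=0.5,                             # horizontal flip probability
    mosaic=1.0,                             # mosaic augmentation
)
```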
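For point 4, if you want to search for better hyperparameters on your own data, the built-in tuner is a reasonable starting point. This is a minimal sketch; model.tune and its arguments assume a recent ultralytics release, and the run counts here are illustrative.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # shown with a small model for speed; the same interface should apply to RTDETR

# Evolve hyperparameters over a series of short training runs (illustrative settings).
model.tune(data="coco8.yaml", epochs=10, iterations=50, optimizer="AdamW", plots=False)
```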

For detailed implementation adjustments or more specific guidance, you might consider exploring the source code or engaging further in our community discussions. Hope this helps! 😊

Best regards!