remaro-network / KD-YOLOX-ViT

This repository holds the code for the Python implementation of YOLOX-ViT. Furthermore, it has the implementation of the Knowledge Distillation (KD) method, evaluation metrics of the object detector and the side-scan sonar image dataset for underwater wall detection.
Apache License 2.0

FPN logit #3

Open Zzang-yeah opened 3 weeks ago

Zzang-yeah commented 3 weeks ago

When and where is the FPN logit stored? Whether I run online or offline, tools/demo.py just runs and generates an image, but no .npy file is created, so I can't proceed with student training.

Zzang-yeah commented 2 weeks ago

I fixed the above problem: in yolo_head.py, on line 316, you need to add -f and the exp file to yolox_command. But I have another problem: the processes don't seem to run in order with multi-GPU. The .npy file has not been saved yet, and I keep getting a file-not-found error when loading it. I think I need to modify the code to synchronize the processes.
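For anyone hitting the same error, here is a minimal sketch of the fix, assuming yolox_command is an argument list handed to a subprocess in yolo_head.py; the entry point, file paths, and the -c flag below are illustrative, not copied from the repo:

```python
import subprocess

# Hypothetical paths for illustration only
teacher_exp_file = "exps/example/custom/yolox_l_teacher.py"
teacher_ckpt = "weights/teacher_best_ckpt.pth"

yolox_command = [
    "python", "tools/Teacher_Inference.py",  # assumed teacher-inference entry point
    "-f", teacher_exp_file,                  # add -f and the exp file, as described above
    "-c", teacher_ckpt,                      # teacher checkpoint (illustrative)
]
subprocess.run(yolox_command, check=True)    # runs the teacher and dumps the FPN logits as .npy
```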

martinaub commented 2 weeks ago

Hi @Zzang-yeah, thank you for your interest in our repo. To save the images and FPN logits, you should make sure the parameters in both the student and teacher files are correct. For instance, to save the teacher FPN logits, you should set the self.KD parameter to true in the teacher file. In addition, for online KD, you should set self.KD_online and self.KD to true in the student file. In the case of online KD, the FPN logits and images are saved at every epoch and deleted once the KD loss for that epoch has been computed, so that training does not require too much disk space. About multi-GPU training I cannot say, since I am only using a single GPU. Let me know if it works with the proper parameters :)
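For reference, a minimal sketch of how these flags might look, assuming the experiment files follow the standard YOLOX Exp class layout; the attribute names self.KD and self.KD_online come from the comment above, everything else is illustrative:

```python
from yolox.exp import Exp as BaseExp

# Teacher experiment file (sketch): enable saving of the FPN logits.
class TeacherExp(BaseExp):
    def __init__(self):
        super().__init__()
        self.KD = True          # save the teacher FPN logits (and images)

# Student experiment file (sketch): enable online knowledge distillation.
class StudentExp(BaseExp):
    def __init__(self):
        super().__init__()
        self.KD = True          # use the KD loss during training
        self.KD_online = True   # regenerate teacher logits every epoch (online KD)
```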

Zzang-yeah commented 2 weeks ago

When training with multi-GPU, online learning didn't seem to work because the order between processes was messed up: it tried to load the .npy file before it had been created, which caused a file-not-found error and training stopped. I have now switched to offline training and it is working fine.
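One way to avoid that race, sketched below under the assumption that the repo uses torch.distributed for multi-GPU training: let only rank 0 write the .npy file and make every rank wait on a barrier before loading it (the function name and path are illustrative):

```python
import numpy as np
import torch.distributed as dist

def save_and_load_teacher_logits(fpn_logits, path="teacher_fpn_logits.npy"):
    """Sketch: write the teacher logits on rank 0 only, then synchronize
    all ranks before anyone tries to read the file back."""
    rank = dist.get_rank() if dist.is_initialized() else 0

    if rank == 0:
        np.save(path, fpn_logits)   # only one process writes the file

    if dist.is_initialized():
        dist.barrier()              # everyone waits until the file exists

    return np.load(path)            # now safe to load on every rank
```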

I have one question. My understanding is that in online learning the teacher model runs on the augmented data and saves the logits at every iteration to KD-train the student model, whereas in offline learning the student model is KD-trained by running Teacher_Inference.py with the teacher model once, so augmentation and logit saving happen only once. Doesn't this make a difference between online and offline learning? I ask because my guess is that online learning, with KD at every iteration, would perform better, but I don't remember it being mentioned in the paper.

martinaub commented 2 weeks ago

Thank you for raising this concern; I thought it would have been obvious, but maybe not. Because of computational power limitations, we introduced offline KD, which drastically reduces the time of KD training. Offline KD means that the model does not rely on the online data augmentation provided by the original YOLOX model; instead, it relies only on the pre-defined dataset. To highlight the difference between the two training settings (with and without online data augmentation), we report in our paper the no-Aug models, i.e., the metrics obtained without data augmentation. By comparing the corresponding models (e.g., YOLOX-L with YOLOX-L-noAug), you will see a big difference in the object detection metrics, with the L model being much better than L-noAug. This result highlights the utility of online data augmentation during training and suggests that online KD would perform better than offline KD. However, online data augmentation is applied randomly while training, so when launching the KD method we do not know in advance what the augmented dataset will look like, which forces us to run the teacher inference at every iteration on each augmented sample.
In addition, you can still train the teacher with online data augmentation, so that the teacher model transfers better knowledge to the student model.
Thus, offline KD does not perform as well as online KD; however, as shown in the result metrics in the paper, the model is still improved. If you have access to multiple GPUs, it should be faster for you to run the online KD.
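As an illustration of what the offline variant boils down to, here is a minimal sketch of a KD term computed from saved FPN logits, assuming the teacher logits were dumped to per-image .npy files by Teacher_Inference.py and that an MSE between teacher and student FPN outputs is used; the directory layout and loss choice are assumptions, not the repo's exact code:

```python
import numpy as np
import torch
import torch.nn.functional as F

def offline_kd_loss(student_fpn_outputs, logit_dir, image_ids, weight=1.0):
    """Sketch of an offline KD term: compare the student's FPN outputs with
    teacher FPN logits saved once to .npy files (one file per image and level)."""
    loss = 0.0
    for level, student_feat in enumerate(student_fpn_outputs):
        teacher_feats = [
            np.load(f"{logit_dir}/{img_id}_fpn{level}.npy") for img_id in image_ids
        ]
        teacher_feat = torch.from_numpy(np.stack(teacher_feats)).to(student_feat.device)
        loss = loss + F.mse_loss(student_feat, teacher_feat)
    return weight * loss
```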

Zzang-yeah commented 5 days ago

When comparing nano models trained with KD to nano models trained without it, we found that the performance was not significantly better. Does the statement that the model with KD performed better mean that the false positives (FP) were improved? In my experiments, the AP was higher for the nano model without KD than for the model with KD.