hoiliu-0801 / DNTR

A DeNoising FPN with Transformer R-CNN for Tiny Object Detection
Apache License 2.0

DNFPN for YOLO #15

Open zzzrenn opened 1 week ago

zzzrenn commented 1 week ago

Could you please share the details for implementing DNFPN for YOLO? Specifically the following:

It would be great if you could shed some light on my doubts or share the code/pseudo code for yolo.

Thank you very much.

hoiliu-0801 commented 1 week ago

Our YOLO+DN-FPN implementation is based on the Ultralytics YOLO repository. Due to licensing restrictions, we are unable to upload the code.

  1. Yes.
  2. A consistent geometric and semantic encoder is used across all layers.
  3. The coefficient can be adjusted according to your YOLO model; we use a value of 0.1 for YOLOv8.
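A minimal sketch of how points 2 and 3 could look in code, assuming a PyTorch setup. All names here (`SharedEncoder`, `total_loss`) are illustrative; the authors' actual implementation is not public.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """One encoder reused for every pyramid level (not one per level)."""
    def __init__(self, channels: int = 256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.proj(feat)

def total_loss(yolo_loss: torch.Tensor,
               dnfpn_loss: torch.Tensor,
               coef: float = 0.1) -> torch.Tensor:
    # 0.1 is the coefficient mentioned for YOLOv8; it may need
    # retuning for other YOLO variants
    return yolo_loss + coef * dnfpn_loss
```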
zzzrenn commented 1 week ago

Thank you for your clarification. To summarize, the implementation for YOLO is as follows:

  1. x_b = [8x downsampled backbone features (layer 4), 16x backbone features (layer 6)]; x = [8x neck top-down features (layer 12), 16x neck top-down features (layer 15)]
  2. Use the same self.channel_transfer to project both the backbone and the neck top-down features to the same channel dimension (256)
  3. Use the same geometric and semantic encoder for all levels to compute the contrastive loss
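In code, my understanding of the pipeline is roughly the following. The `ChannelTransfer` module and the channel widths are assumptions for a YOLOv8-s-like model, not the actual implementation:

```python
import torch
import torch.nn as nn

class ChannelTransfer(nn.Module):
    """1x1 conv projecting a feature map to a common 256-channel space."""
    def __init__(self, in_ch: int, out_ch: int = 256):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(x)

# one projector per stride, shared between backbone and neck features
transfer_8x = ChannelTransfer(128)   # assumed width at 8x stride
transfer_16x = ChannelTransfer(256)  # assumed width at 16x stride
encoder = nn.Conv2d(256, 256, 1)     # stand-in for the shared encoder

def encode_pairs(b8, b16, n8, n16):
    # project backbone (b*) and neck top-down (n*) features, then run
    # every level through the SAME encoder before the contrastive loss
    feats = [transfer_8x(b8), transfer_16x(b16),
             transfer_8x(n8), transfer_16x(n16)]
    return [encoder(f) for f in feats]
```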

Could you please confirm this?

Thank you very much.

zzzrenn commented 1 week ago

@hoiliu-0801 Unfortunately, I could not get any positive results when integrating the DNFPN loss into YOLO. Could you please clarify the following:

It would be really helpful if you could clarify these points. Thank you very much for your time.

hoiliu-0801 commented 1 day ago

Yes, we use a batch size of 2 in our settings, but it can be increased if better GPUs are available to handle the additional computational demands.

To be honest, I don't recall the exact numbers for YOLO. However, I do remember that the YOLO model converged faster than DNTR in our experiments—possibly around 24 epochs for pretraining and 12 for fine-tuning. I'll make an effort to reimplement the YOLO code in the future to confirm this. Apologies for any inconvenience caused.

We divide by the total number of positive and negative queries for normalization. Is this the type of normalization you were referring to? Based on our experiments, I believe we did not multiply the losses by the batch size.
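As a sketch of the normalization described here: sum the per-query contrastive losses, then divide by the total count of positive and negative queries, with no batch-size multiplier. The function name and loss tensors are placeholders, not the actual DN-FPN loss terms:

```python
import torch

def normalize_contrastive(pos_losses: torch.Tensor,
                          neg_losses: torch.Tensor) -> torch.Tensor:
    # sum the per-query losses over positives and negatives
    total = pos_losses.sum() + neg_losses.sum()
    # divide by the total query count (no batch-size factor)
    n_queries = pos_losses.numel() + neg_losses.numel()
    return total / max(n_queries, 1)  # guard against empty query sets
```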