hoiliu-0801 / DNTR

A DeNoising FPN with Transformer R-CNN for Tiny Object Detection
Apache License 2.0

DNFPN for YOLO #15

Open zzzrenn opened 1 week ago

zzzrenn commented 1 week ago

Could you please share the details for implementing DNFPN for YOLO? Specifically the following:

It would be great if you could shed some light on my doubts or share the code/pseudo code for yolo.

Thank you very much.

hoiliu-0801 commented 1 week ago

Our YOLO+DN-FPN implementation is based on the Ultralytics YOLO repository. Due to licensing restrictions, we are unable to upload the code.

  1. Yes.
  2. A consistent geometric and semantic encoder is used across all layers.
  3. The coefficient can be adjusted according to your YOLO model; we use a value of 0.1 for YOLOv8.
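A minimal sketch of how points 2 and 3 could look in code, assuming a PyTorch setup. All names here (`SharedEncoder`, `total_loss`) are illustrative; the authors' actual implementation is not public.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """One encoder reused for every pyramid level (not one per level)."""
    def __init__(self, channels: int = 256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.proj(feat)

def total_loss(yolo_loss: torch.Tensor,
               dnfpn_loss: torch.Tensor,
               coef: float = 0.1) -> torch.Tensor:
    # 0.1 is the coefficient mentioned for YOLOv8; it may need
    # retuning for other YOLO variants
    return yolo_loss + coef * dnfpn_loss
```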
zzzrenn commented 1 week ago

Thank you for your clarification. To summarize, the implementation for YOLO is as follows:

  1. x_b = [8x downsampled backbone features (layer 4), 16x backbone features (layer 6)]; x = [8x neck top-down features (layer 12), 16x neck top-down features (layer 15)]
  2. Use the same self.channel_transfer to project both the backbone and the neck top-down features to the same channel dimension (256)
  3. Use the same geometric and semantic encoder for all levels to compute the contrastive loss
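In code, my understanding of the pipeline is roughly the following. The `ChannelTransfer` module and the channel widths are assumptions for a YOLOv8-s-like model, not the actual implementation:

```python
import torch
import torch.nn as nn

class ChannelTransfer(nn.Module):
    """1x1 conv projecting a feature map to a common 256-channel space."""
    def __init__(self, in_ch: int, out_ch: int = 256):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(x)

# one projector per stride, shared between backbone and neck features
transfer_8x = ChannelTransfer(128)   # assumed width at 8x stride
transfer_16x = ChannelTransfer(256)  # assumed width at 16x stride
encoder = nn.Conv2d(256, 256, 1)     # stand-in for the shared encoder

def encode_pairs(b8, b16, n8, n16):
    # project backbone (b*) and neck top-down (n*) features, then run
    # every level through the SAME encoder before the contrastive loss
    feats = [transfer_8x(b8), transfer_16x(b16),
             transfer_8x(n8), transfer_16x(n16)]
    return [encoder(f) for f in feats]
```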

Could you please confirm this?

Thank you very much.

zzzrenn commented 1 week ago

@hoiliu-0801 Unfortunately, I could not get any positive results when integrating the DNFPN loss into YOLO. Could you please clarify the following:

It would be really helpful if you could clarify these points. Thank you very much for your time.

hoiliu-0801 commented 1 day ago

Yes, we use a batch size of 2 in our settings, but it can be increased if better GPUs are available to handle the additional computational demands.

To be honest, I don't recall the exact numbers for YOLO. However, I do remember that the YOLO model converged faster than DNTR in our experiments—possibly around 24 epochs for pretraining and 12 for fine-tuning. I'll make an effort to reimplement the YOLO code in the future to confirm this. Apologies for any inconvenience caused.

We divide by the total number of positive and negative queries for normalization. Is this the type of normalization you were referring to? Based on our experiments, I believe we did not multiply the losses by the batch size.
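As a sketch of the normalization described here: sum the per-query contrastive losses, then divide by the total count of positive and negative queries, with no batch-size multiplier. The function name and loss tensors are placeholders, not the actual DN-FPN loss terms:

```python
import torch

def normalize_contrastive(pos_losses: torch.Tensor,
                          neg_losses: torch.Tensor) -> torch.Tensor:
    # sum the per-query losses over positives and negatives
    total = pos_losses.sum() + neg_losses.sum()
    # divide by the total query count (no batch-size factor)
    n_queries = pos_losses.numel() + neg_losses.numel()
    return total / max(n_queries, 1)  # guard against empty query sets
```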