pieterblok / boxal

Active learning for object detection in Detectron2
Apache License 2.0

discrepancy between Detectron2 and Boxal #4

Open pkhateri opened 1 year ago

pkhateri commented 1 year ago

Hello Pieter, I have another inquiry that would require your expertise.

Recently, I conducted an experiment comparing the performance of two programs: the original Detectron2 (without AL) and BoxAL. For BoxAL, I set the initial training set to include all the annotated data and loops=0. In short, I tried to make all parameters identical between the two programs. Despite this, I observed a significant discrepancy between the results: the results from Detectron2 turned out to be significantly better than those from BoxAL.

I would greatly appreciate your insights on the potential reasons behind this discrepancy. To be clearer, I have included a summary of the parameters used and the corresponding loss outputs below.

Dataset sizes:

- Train set: 360
- Validation set: 45
- Test set: 45

```yaml
use_initial_train_dir: True # only for Boxal
network_config: "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"
pretrained_weights: "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"
classes: ['damaged']
transfer_learning_on_previous_models: True
learning_rate: 0.0005
warmup_iterations: 1000
train_iterations_base: 10000
train_iterations_step_size: 1000
step_image_number: 500
eval_period: 100
checkpoint_period: -1
weight_decay: 0.0001
learning_policy: 'steps_with_decay'
step_ratios: [0.5, 0.8]
gamma: 0.1
train_batch_size: 2
roi_heads_batch_size_per_img: 128
confidence_threshold: 0.5
nms_threshold: 0.05
strategy: 'uncertainty' # only for Boxal
mode: 'mean' # only for Boxal
initial_datasize: 360 # only for Boxal
pool_size: 1 # only for Boxal
loops: 0 # only for Boxal
dropout_probability: 0.25 # only for Boxal
mcd_iterations: 10 # only for Boxal
iou_thres: 0.95 # only for Boxal
incremental_learning: False # only for Boxal
sampling_percentage_per_subset: 10 # only for Boxal
```

Detectron2: [image: loss_detectron]
BoxAL: [image: loss_boxal]

pieterblok commented 1 year ago

That's a good catch! Indeed, this behavior is abnormal.

Are you 100% sure both training methods had the same learning rate?

BoxAL includes an automatic evaluation method during training that saves the best weights, but that would not explain these loss curves.

By the way: how did you output the loss? As far as I remember, Detectron2 doesn't output the validation loss in its default version.
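
(For context, a common way to get a validation loss out of Detectron2 is a custom hook along these lines; this is a minimal sketch with illustrative names, not the actual code used in either program:)

```python
import torch
from detectron2.engine import HookBase
from detectron2.data import build_detection_train_loader

class ValidationLossHook(HookBase):
    """Periodically computes the loss on the validation set."""

    def __init__(self, cfg, eval_period):
        self._period = eval_period
        # A train-style loader over the validation set: Detectron2 models
        # only return a loss dict when they are in training mode.
        val_cfg = cfg.clone()
        val_cfg.DATASETS.TRAIN = cfg.DATASETS.TEST
        self._loader = iter(build_detection_train_loader(val_cfg))

    def after_step(self):
        next_iter = self.trainer.iter + 1
        if self._period > 0 and next_iter % self._period == 0:
            data = next(self._loader)
            with torch.no_grad():
                loss_dict = self.trainer.model(data)
                total_loss = sum(loss_dict.values())
            self.trainer.storage.put_scalar("validation_loss", total_loss.item())
```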

At the moment, I don't have a clue what's happening...

pkhateri commented 1 year ago
  1. Yes, I set the same learning rate for both programs. I hope it is not being overridden somewhere deeper in the code.

  2. That's right. I added a few classes to both programs to handle the validation loss output.

  3. The goal of this test was to check whether both programs behave similarly with the same settings. There must still be some differences in their implementation, even when BoxAL runs a single loop. For example, BoxAL has a dropout probability and Detectron2 does not. When I decreased the dropout probability to almost zero, the training loss for BoxAL got closer to that of Detectron2. See below: [image: training loss curves]

  4. If you could give me some hints about the modifications implemented in your code and any parameters worth tweaking, that would be very helpful. For example, in your MaskAL paper you mention data augmentation. Is this extra augmentation compared to the default augmentation in the original Detectron2 implementation?

  5. Is there a way to turn off the automatic evaluation method that saves the best weights during training? Or is it possible to deactivate it by commenting those lines out?

pkhateri commented 1 year ago

Here is an example of running BoxAL with the same parameters, except for the number of loops (5), the number of initial training images (20) and the pool_size (68): [image: loss curves]

And the AP plots for the same setting: [image: AP plots]

And the AP plots for BoxAL with the earlier mentioned settings: [image: AP plots]

The AP plots for Detectron2 with the earlier mentioned settings: [image: AP plots]

I was also wondering whether the evaluation values are computed on the same dataset for both programs, i.e. whether both results in the metrics files are based on the validation dataset.
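
(For reference, in Detectron2 the periodic evaluation runs on whatever is registered under `cfg.DATASETS.TEST`, so that is the place to verify; a minimal sketch, where the dataset names are placeholders:)

```python
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.DATASETS.TRAIN = ("my_train_set",)
cfg.DATASETS.TEST = ("my_val_set",)  # the AP values in metrics.json refer to this set
cfg.TEST.EVAL_PERIOD = 100           # matches the eval_period above
```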

pkhateri commented 1 year ago

@pieterblok Any comment on this? I would really appreciate your help in solving the issue. If you don't have the time to go into the details, any hints on where to look for the problem would be incredibly helpful.

pieterblok commented 1 year ago

> @pieterblok Any comment on this? I would really appreciate your help in solving the issue. If you don't have the time to go into the details, any hints on where to look for the problem would be incredibly helpful.

I have to dive deeper into this, because at the moment I don't have a clue. I'm relocating right now, so I don't have a computer or the time to do this.

As a general answer: there are many things I have changed. Remember that BoxAL was forked from MaskAL, and if you look at the commits there, there have been a lot of small tweaks and changes. Adding dropout layers is definitely something that could have caused differences: https://github.com/pieterblok/boxal/blob/ed4b8915547b8ec7ea7e33d0b947f1136277071a/active_learning/strategies/dropout.py#L176
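
(For illustration, a minimal sketch of the Monte-Carlo dropout mechanism that `dropout_probability` and `mcd_iterations` control; the function names here are illustrative, not BoxAL's actual code:)

```python
import torch

def enable_dropout(model):
    # Keep the model in eval mode overall, but switch the dropout layers
    # back to train mode so repeated forward passes are stochastic.
    for module in model.modules():
        if isinstance(module, torch.nn.Dropout):
            module.train()

def mc_dropout_predictions(model, inputs, mcd_iterations=10):
    # inputs: a list of Detectron2-style input dicts.
    model.eval()
    enable_dropout(model)
    with torch.no_grad():
        # One prediction per stochastic forward pass; the spread across
        # passes is what the uncertainty sampling strategy works from.
        return [model(inputs) for _ in range(mcd_iterations)]
```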

I have used the default data augmentation of Detectron2 v0.4, so that should be similar. For the weights, the easiest way is to put a checkpoint period in the yaml file, for example 500 (this means that a weights file is stored every 500 iterations): https://github.com/pieterblok/boxal/blob/ed4b8915547b8ec7ea7e33d0b947f1136277071a/boxal.yaml#L33
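
For example, in boxal.yaml:

```yaml
# store a weights file every 500 iterations instead of keeping only the best one
checkpoint_period: 500
```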

By looking at your validation loss, you can select another weights file stored in your directory.

That's all I can currently think of.

pkhateri commented 1 year ago

Hi Pieter, Thanks very much for making time despite your current situation.

  1. I disabled the dropout layers by using FastRCNNConvFCHead, StandardROIHeads and Res5ROIHeads instead of FastRCNNConvFCHeadDropout, StandardROIHeadsDropout and Res5ROIHeadsDropout (see the config sketch at the end of this comment). No change was observed in the results compared to when I set the dropout probability to zero.

  2. I also tried an older version of Detectron2 (dating back to March 2021, when you forked MaskAL from Detectron2). I thought that maybe I was getting better results for Detectron2 because I am using a newer version, but the old version works as well as the new one.

  3. I have always used the default value for checkpoint_period, i.e. checkpoint_period: -1. According to MISC_SETTINGS.md, this setting disables storing weights during training.
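
(For completeness, the swap in point 1 expressed in Detectron2 config terms; a sketch only, where the Dropout-suffixed names are the custom classes registered by BoxAL and the others are the standard Detectron2 heads:)

```python
from detectron2.config import get_cfg

cfg = get_cfg()
# Dropout variants as used by BoxAL:
#   cfg.MODEL.ROI_HEADS.NAME = "StandardROIHeadsDropout"
#   cfg.MODEL.ROI_BOX_HEAD.NAME = "FastRCNNConvFCHeadDropout"
# Standard dropout-free Detectron2 heads:
cfg.MODEL.ROI_HEADS.NAME = "StandardROIHeads"
cfg.MODEL.ROI_BOX_HEAD.NAME = "FastRCNNConvFCHead"
```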