megvii-research / FSCE

Apache License 2.0

Unable to reproduce the FSCE result of pascal voc split1 shot3 #10

Open Retiina opened 3 years ago

Retiina commented 3 years ago

Dear author, I am trying to reproduce the results in your paper. When I run the PASCAL VOC split1 3-shot config, https://github.com/MegviiDetection/FSCE/blob/main/configs/PASCAL_VOC/split1/3shot_CL_IoU.yml, I get the following result:

[03/24 23:23:04 fsdet.evaluation.pascal_voc_evaluation]: Evaluating voc_2007_test_all1 using 2007 metric. Note that results do not use the official Matlab API.
[03/24 23:23:26 fsdet.evaluation.pascal_voc_evaluation]: Evaluate per-class mAP50:
|  aeroplane  |  bicycle  |  boat  |  bottle  |  car   |  cat   |  chair  |  diningtable  |  dog   |  horse  |  person  |  pottedplant  |  sheep  |  train  |  tvmonitor  |  bird  |  bus   |  cow   |  motorbike  |  sofa  |
|:-----------:|:---------:|:------:|:--------:|:------:|:------:|:-------:|:-------------:|:------:|:-------:|:--------:|:-------------:|:-------:|:-------:|:-----------:|:------:|:------:|:------:|:-----------:|:------:|
|   85.558    |  79.340   | 60.997 |  68.985  | 87.107 | 86.311 | 61.992  |    72.129     | 81.463 | 82.839  |  81.700  |    49.353     | 69.390  | 81.538  |   76.892    | 32.771 | 57.398 | 48.721 |   55.794    | 53.660 |
[03/24 23:23:26 fsdet.evaluation.pascal_voc_evaluation]: Evaluate overall bbox:
|   AP   |  AP50  |  AP75  |  bAP   |  bAP50  |  bAP75  |  nAP   |  nAP50  |  nAP75  |
|:------:|:------:|:------:|:------:|:-------:|:-------:|:------:|:-------:|:-------:|
| 39.492 | 68.697 | 40.156 | 43.895 | 75.040  | 45.338  | 26.281 | 49.669  | 24.610  |
[03/24 23:23:26 fsdet.engine.defaults]: Evaluation results for voc_2007_test_all1 in csv format:
[03/24 23:23:26 fsdet.evaluation.testing]: copypaste: Task: bbox
[03/24 23:23:26 fsdet.evaluation.testing]: copypaste: AP,AP50,AP75,bAP,bAP50,bAP75,nAP,nAP50,nAP75
[03/24 23:23:26 fsdet.evaluation.testing]: copypaste: 39.4919,68.6968,40.1564,43.8955,75.0396,45.3383,26.2812,49.6686,24.6105
[03/24 23:23:26 fsdet.utils.events]:  eta: 0:00:00  iter: 7999  total_loss: 0.3889  loss_cls: 0.09116  loss_box_reg: 0.08679  loss_contrast: 0.1952  loss_rpn_cls: 0.002133  loss_rpn_loc: 0.003993  time: 0.4631  data_time: 0.0475  lr: 0.00025  max_mem: 2057M
[03/24 23:23:26 fsdet.engine.hooks]: Overall training speed: 7996 iterations in 1:01:44 (0.4632 s / it)
[03/24 23:23:26 fsdet.engine.hooks]: Total training time: 1:46:03 (0:44:19 on hooks)

However, the paper reports 51.4, which is about 1.7 points higher than my nAP50 of 49.669:

[Screenshot of the results table from the paper]
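For anyone comparing their own run against the paper, here is a minimal sketch that reads the "copypaste:" lines from the log above and computes the gap. The helper name `parse_copypaste` is made up for illustration and is not part of FsDet or FSCE; the 51.4 figure is the one quoted from the paper in this comment.

```python
PAPER_NAP50 = 51.4  # value quoted from the paper in this thread

# The three "copypaste" lines from the training log posted above.
LOG = """\
[03/24 23:23:26 fsdet.evaluation.testing]: copypaste: Task: bbox
[03/24 23:23:26 fsdet.evaluation.testing]: copypaste: AP,AP50,AP75,bAP,bAP50,bAP75,nAP,nAP50,nAP75
[03/24 23:23:26 fsdet.evaluation.testing]: copypaste: 39.4919,68.6968,40.1564,43.8955,75.0396,45.3383,26.2812,49.6686,24.6105
"""


def parse_copypaste(log_text):
    """Return {metric: value} from the last header/value pair of 'copypaste:' lines.

    Hypothetical helper: it only assumes the log layout shown in this thread.
    """
    payloads = [
        line.split("copypaste: ", 1)[1].strip()
        for line in log_text.splitlines()
        if "copypaste: " in line
    ]
    # Drop the "Task: bbox" marker so only the header and value rows remain.
    payloads = [p for p in payloads if not p.startswith("Task:")]
    names = payloads[-2].split(",")
    values = payloads[-1].split(",")
    return dict(zip(names, map(float, values)))


metrics = parse_copypaste(LOG)
gap = PAPER_NAP50 - metrics["nAP50"]
print(f"nAP50 = {metrics['nAP50']:.2f}, gap to paper = {gap:.2f} points")
```

The gap is measured on nAP50 (novel classes at IoU 0.5), which is the column the 51.4 figure is being compared with here.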

Could you check whether the settings in the config are exactly the same as those used in the paper? In case I did something wrong, this is my training command:

python tools/train_net.py --config-file configs/PASCAL_VOC/split1/3shot_CL_IoU.yml --num-gpus 8 MODEL.WEIGHTS models/model_reset_surgery.pth

Thanks in advance!

bsun0802 commented 3 years ago

Which base training checkpoint (model_final.pth) are you using? Did you train it yourself or download it from FsDet?

Retiina commented 3 years ago

I downloaded it from FsDet: http://dl.yf.io/fs-det/models/voc/split1/base_model/model_final.pth

Chauncy-Cai commented 3 years ago

[image] Maybe this YAML config will help.

With so few instances, and with the early-stopping strategy used to prevent overfitting, unstable results are normal.

yhcao6 commented 3 years ago

Sir, thanks for your reply. I will try this config and will inform you when I get the result.

bsun0802 commented 3 years ago

@Retiina @yhcao6

Hi guys, are you able to reproduce the nAP for 10-shot PASCAL VOC split1?

yhcao6 commented 3 years ago

@bsun0802 I have not tried 10-shot yet. I still can't reproduce the mAP of FSCE for 1-shot and 3-shot, but I can reproduce the mAP of the improved TFA.

bsun0802 commented 3 years ago

Those shots are unstable and can have large variance between runs.

Please check whether 5-shot and 10-shot can be reproduced. Another thread reports that they cannot; if so, we need to find out what is wrong.

Thanks.

yhcao6 commented 3 years ago

Sure, I will try it now

Cuzny commented 3 years ago

What if I only use 1 GPU? Will this affect the result? In addition, the config file suggests that the backbone is trained: [image]. However, during training, the log shows: [image]

yhcao6 commented 3 years ago

@bsun0802, this is my result for FSCE split1 10-shot; it is still lower than in the paper:

[03/29 16:10:55 fsdet.evaluation.pascal_voc_evaluation]: Evaluating voc_2007_test_all1 using 2007 metric. Note that results do not use the official Matlab API.
[03/29 16:11:13 fsdet.evaluation.pascal_voc_evaluation]: Evaluate per-class mAP50:
|  aeroplane  |  bicycle  |  boat  |  bottle  |  car   |  cat   |  chair  |  diningtable  |  dog   |  horse  |  person  |  pottedplant  |  sheep  |  train  |  tvmonitor  |  bird  |  bus   |  cow   |  motorbike  |  sofa  |
|:-----------:|:---------:|:------:|:--------:|:------:|:------:|:-------:|:-------------:|:------:|:-------:|:--------:|:-------------:|:-------:|:-------:|:-----------:|:------:|:------:|:------:|:-----------:|:------:|
|   85.873    |  85.576   | 66.554 |  67.776  | 87.936 | 88.366 | 63.925  |    64.852     | 85.325 | 85.267  |  78.948  |    49.353     | 76.907  | 85.293  |   77.127    | 41.074 | 75.418 | 68.892 |   68.620    | 54.369 |
[03/29 16:11:13 fsdet.evaluation.pascal_voc_evaluation]: Evaluate overall bbox:
|   AP   |  AP50  |  AP75  |  bAP   |  bAP50  |  bAP75  |  nAP   |  nAP50  |  nAP75  |
|:------:|:------:|:------:|:------:|:-------:|:-------:|:------:|:-------:|:-------:|
| 45.616 | 72.873 | 48.901 | 48.464 | 76.605  | 51.869  | 37.073 | 61.674  | 40.000  |
[03/29 16:11:13 fsdet.engine.defaults]: Evaluation results for voc_2007_test_all1 in csv format:
[03/29 16:11:13 fsdet.evaluation.testing]: copypaste: Task: bbox
[03/29 16:11:13 fsdet.evaluation.testing]: copypaste: AP,AP50,AP75,bAP,bAP50,bAP75,nAP,nAP50,nAP75
[03/29 16:11:13 fsdet.evaluation.testing]: copypaste: 45.6161,72.8726,48.9014,48.4638,76.6053,51.8685,37.0729,61.6745,40.0001
[03/29 16:11:13 fsdet.utils.events]:  eta: 0:00:00  iter: 14999  total_loss: 0.4749  loss_cls: 0.04555  loss_box_reg: 0.04276  loss_contrast: 0.3756  loss_rpn_cls: 0.002355  loss_rpn_loc: 0.004225  time: 0.4634  data_time: 0.0396  lr: 0.00025  max_mem: 2058M
[03/29 16:11:14 fsdet.engine.hooks]: Overall training speed: 14996 iterations in 1:55:51 (0.4635 s / it)
[03/29 16:11:14 fsdet.engine.hooks]: Total training time: 3:16:01 (1:20:10 on hooks)

bsun0802 commented 3 years ago

> What if I only use 1 GPU? Will this affect the result? In addition, the config file suggests that the backbone is trained: [image]. However, during training, the log shows: [image]

1. I don't think 1 GPU can reproduce the same results. All experiments were performed on 8 GPUs.
2. The ResNet layers are frozen; the FPN lateral and top-down convs are fine-tuned.

bsun0802 commented 3 years ago

@yhcao6 This is the final checkpoint. Did you check the best checkpoint?

yhcao6 commented 3 years ago

@bsun0802 I checked: the best nAP50 is 62.346.
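A minimal sketch of how the best evaluation could be picked out of a training log, assuming the log contains several periodic evaluations in the "copypaste:" layout shown earlier in this thread. The function `best_evaluation` is a hypothetical helper, not an FsDet or FSCE API.

```python
import re

# Header row, e.g. "copypaste: AP,AP50,AP75,...".
_HEADER = re.compile(r"copypaste: (AP(?:,\w+)+)\s*$")
# Value row: two or more comma-separated numbers after "copypaste: ".
_VALUES = re.compile(r"copypaste: (-?\d+(?:\.\d+)?(?:,-?\d+(?:\.\d+)?)+)\s*$")


def best_evaluation(log_text, metric="nAP50"):
    """Return the metrics dict of the evaluation with the highest `metric`.

    Returns None if the log contains no evaluation rows.
    """
    names = None
    best = None
    for line in log_text.splitlines():
        header = _HEADER.search(line)
        if header:
            names = header.group(1).split(",")
            continue
        values = _VALUES.search(line)
        if values and names:
            row = dict(zip(names, map(float, values.group(1).split(","))))
            if best is None or row[metric] > best[metric]:
                best = row
    return best
```

To use it, read the run's log file into a string and call `best_evaluation(text)`. The final-checkpoint number in the log above is 61.6745, so a best value of 62.346 means an earlier evaluation scored higher than the last one.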

bsun0802 commented 3 years ago

@yhcao6 That seems odd. I would say anything above 62.6 should be easy to reach. We won't have time to look into it until this weekend.

yhcao6 commented 3 years ago

Thanks for taking the time to check it.

Chauncy-Cai commented 3 years ago

> @yhcao6 Seems odd. I would say above 62.6 should be easy to reach.

[image] This is my rerun from today. It does reach 62.5+ without any changes.

Since the few-shot task is not stable and the numbers reported in the paper are the best results over multiple runs, I think a slight difference is normal.

Cuzny commented 3 years ago

> [image] This is my rerun from today. It does reach 62.5+ without any changes.
>
> Since the few-shot task is not stable and the numbers reported in the paper are the best results over multiple runs, I think a slight difference is normal.

Is this the result on seed 0? Thanks for your reply.

yhcao6 commented 3 years ago

@Chauncy-Cai Thanks for your reply. One possible cause is the randomness of the surgery step. If it is convenient, could you upload your model_reset_surgery.pth?

Chen-Song commented 3 years ago

> Yes, just as in TFA, seed 0 is actually manually sampled. Thus, it always has the best result.

What does seed0 mean? In http://dl.yf.io/fs-det/datasets/vocsplit/, there is no seed0 folder.

Chen-Song commented 3 years ago

[image] [image]

In Table 1, the 10-shot performance is 61.4, while in Table 2 the result is 63.4 and the average over 10 random seeds is 59.7. These results are confusing.

Chen-Song commented 3 years ago

[image]

I used your base model and ran 'Stage 2: Fine-tune for novel data' with 4 GPUs, but the results are much lower than those reported. I used the txt files from the seed1 folder.

Chauncy-Cai commented 3 years ago

> What does seed0 mean? In http://dl.yf.io/fs-det/datasets/vocsplit/, there is no seed0 folder.

OK, I should have been more precise: I meant "http://dl.yf.io/fs-det/datasets/vocsplit/*.txt", not "seed0".

> [image] [image]
>
> In Table 1, the 10-shot performance is 61.4, while in Table 2 the result is 63.4 and the average over 10 random seeds is 59.7. These results are confusing.

All of our experiments, except the average over 10 random seeds in Table 2, are based on "http://dl.yf.io/fs-det/datasets/vocsplit/*.txt".

> [image]
>
> I used your base model and ran 'Stage 2: Fine-tune for novel data' with 4 GPUs, but the results are much lower than those reported. I used the txt files from the seed1 folder.

First, we obtained our results with 8 GPUs for both training and fine-tuning, so we don't know the performance with 4 GPUs. Moreover, nAP depends heavily on which fine-tuning data you choose.

yunh-w commented 3 years ago

@Chauncy-Cai Can you tell me how to get the 59.7 averaged over 10 random seeds? Do I just run the source code 10 times, or do I need to edit this line (meta_pascal_voc.py, line 69)? split_dir = os.path.join(split_dir, "seed{}".format(1))

[image]

Chauncy-Cai commented 3 years ago

Simply train with the data in the seed1 to seed10 folders under "http://dl.yf.io/fs-det/datasets/vocsplit/". You can change the train and test datasets directly in the YAML config. For instance, change (coco_trainval_all_30shot) to (coco_trainval_all_30shot_seed1) to use the seed1 files.
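The renaming described above can be sketched as follows. This assumes only the `_seedN` suffix pattern from the example in this comment; `seeded_names` and `summarize` are hypothetical helpers, and the per-seed nAP50 values have to come from your own ten runs.

```python
from statistics import mean, stdev


def seeded_names(base_name, seeds=range(1, 11)):
    """Append the "_seedN" suffix shown above to a dataset name, for seeds 1-10."""
    return [f"{base_name}_seed{seed}" for seed in seeds]


def summarize(nap50_by_seed):
    """Summarize per-seed nAP50 values (a dict mapping seed -> nAP50)."""
    values = list(nap50_by_seed.values())
    return {
        "mean": mean(values),
        # stdev needs at least two samples.
        "std": stdev(values) if len(values) > 1 else 0.0,
        "best": max(values),
        "worst": min(values),
    }


# Dataset names to put into the YAML config, one fine-tuning run per name.
for name in seeded_names("coco_trainval_all_30shot"):
    print(name)
```

After the ten fine-tuning runs, passing the collected nAP50 values to `summarize` gives the figure that is comparable to an average over 10 random seeds, as opposed to the best single run.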

yunh-w commented 3 years ago

@Chauncy-Cai Thanks for your reply!

yuyiwings commented 3 years ago

Recently, I trained a model on 8 GPUs using the original split1 10-shot config and the base model downloaded from FsDet. But why can't I reproduce the result averaged over 10 random seeds? I only get a bAP50 of 71.6 and an nAP50 of 57.5. I used the final checkpoint, not the best checkpoint. Does the final model need to combine the base-model classifier with the fine-tuned classifier? [image] Could you provide a trained model that reaches the expected result?

kike-0304 commented 2 years ago

How can I download all the txt files at once from http://dl.yf.io/fs-det/datasets/vocsplit/? Do I need to copy them manually?
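One possible approach, as a rough sketch only: build the file URLs and fetch them in a loop. The file name pattern `box_{shot}shot_{cls}_train.txt`, the shot list, and the `seedN/` subfolder layout are assumptions that were not confirmed in this thread, so check them against the directory listing before relying on this. `split_file_urls` and `download_all` are hypothetical helpers.

```python
import os
import urllib.request

BASE_URL = "http://dl.yf.io/fs-det/datasets/vocsplit/"

# The 20 PASCAL VOC classes, as listed in the evaluation tables in this thread.
CLASSES = [
    "aeroplane", "bicycle", "boat", "bottle", "car", "cat", "chair",
    "diningtable", "dog", "horse", "person", "pottedplant", "sheep",
    "train", "tvmonitor", "bird", "bus", "cow", "motorbike", "sofa",
]
SHOTS = [1, 2, 3, 5, 10]  # assumed shot settings


def split_file_urls(seed=None):
    """Build the txt URLs for the default split (seed=None) or for a seedN folder."""
    prefix = BASE_URL if seed is None else f"{BASE_URL}seed{seed}/"
    # Assumed naming pattern: box_{shot}shot_{cls}_train.txt
    return [
        f"{prefix}box_{shot}shot_{cls}_train.txt"
        for shot in SHOTS
        for cls in CLASSES
    ]


def download_all(dest="datasets/vocsplit", seed=None):
    """Download every split file into `dest` (or `dest/seedN`)."""
    target = dest if seed is None else os.path.join(dest, f"seed{seed}")
    os.makedirs(target, exist_ok=True)
    for url in split_file_urls(seed):
        filename = url.rsplit("/", 1)[1]
        urllib.request.urlretrieve(url, os.path.join(target, filename))
```

Calling `download_all()` would fetch the default split files and `download_all(seed=1)` those of the seed1 folder. A recursive download tool pointed at the same URL is another option if the server exposes a directory listing.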

kike-0304 commented 2 years ago

> > What if I only use 1 GPU? Will this affect the result? In addition, the config file suggests that the backbone is trained: [image]. However, during training, the log shows: [image]
>
> 1. I don't think 1 GPU can reproduce the same results. All experiments were performed on 8 GPUs.
> 2. The ResNet layers are frozen; the FPN lateral and top-down convs are fine-tuned.

Why can't 1 GPU reproduce the same results? Can I get the same results with the same batch size and lr?
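For context, a sketch of the arithmetic behind this question. In detectron2-style configs the batch size (SOLVER.IMS_PER_BATCH) is the total across all GPUs, so keeping it fixed on 1 GPU means that GPU must hold the whole batch. If memory forces a smaller total batch, a common heuristic (the linear scaling rule) is to scale the learning rate by the same factor. Nobody in this thread confirmed that either option recovers the reported numbers, and the values in the example are illustrative, not taken from the FSCE configs.

```python
def per_gpu_batch(total_batch, num_gpus):
    """Images each GPU processes per iteration when the total batch is split evenly."""
    if total_batch % num_gpus != 0:
        raise ValueError("total batch size must be divisible by the number of GPUs")
    return total_batch // num_gpus


def linear_scaled_lr(base_lr, base_total_batch, new_total_batch):
    """Linear scaling rule: change the learning rate in proportion to the batch size."""
    return base_lr * new_total_batch / base_total_batch


# Illustrative numbers only (hypothetical, not read from the FSCE YAML files).
total_batch, base_lr = 16, 0.001

print(per_gpu_batch(total_batch, 8))  # per-GPU load on 8 GPUs
print(per_gpu_batch(total_batch, 1))  # per-GPU load when everything runs on 1 GPU
print(linear_scaled_lr(base_lr, total_batch, 4))  # lr if the total batch drops to 4
```

Even with the same total batch size and learning rate, run-to-run variance on few-shot data can still produce different numbers, as noted earlier in this thread.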

qjh666888 commented 2 months ago

> Why can't 1 GPU reproduce the same results? Can I get the same results with the same batch size and lr?

May I ask whether you managed to reproduce the results with one GPU?
