muzairkhattak / multimodal-prompt-learning

[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".
https://muzairkhattak.github.io/multimodal-prompt-learning/
MIT License

The hyper-parameter setting issue when reproducing CoOp #69

Closed Kim-DKyu closed 1 month ago

Kim-DKyu commented 1 month ago

First, thank you for sharing your wonderful work as an open source. I appreciate your work.

I attempted to reproduce the base-to-novel performance of the CoOp model on the ImageNet dataset, as reported in the MaPLe paper, using scripts/coop/main.sh, but I did not achieve the reported results (CoOp base: 76.47% / novel: 67.88%).


Even when following the CoCoOp setting described in the paper, I was unable to reproduce CoOp's base-to-novel performance.

The hyperparameter settings I used when training CoOp are as follows: (1) configs/trainers/CoOp/vit_b16.yaml

DATALOADER:
  TRAIN_X:
    BATCH_SIZE: 4
  TEST:
    BATCH_SIZE: 100
  NUM_WORKERS: 8

INPUT:
  SIZE: (224, 224)
  INTERPOLATION: "bicubic"
  PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
  PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
  TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]

OPTIM:
  NAME: "sgd"
  LR: 0.0035
  MAX_EPOCH: 10 #200
  LR_SCHEDULER: "cosine"
  WARMUP_EPOCH: 1
  WARMUP_TYPE: "constant"
  WARMUP_CONS_LR: 1e-5

TRAIN:
  PRINT_FREQ: 5

MODEL:
  BACKBONE:
    NAME: "ViT-B/16"

(2) scripts/coop/main.sh (The execution command)

bash scripts/coop/main.sh imagenet vit_b16 middle 16 16 False
(DATASET) (config file) (class token position) (number of context tokens) (number of shots) (class-specific context)

I'd like to know the hyperparameter settings for reproducing the CoOp base-novel performance as reported in the paper. Your response would be greatly appreciated. Thank you.

muzairkhattak commented 1 month ago

Dear @Kim-DKyu,

Thank you for showing interest in MaPLe!

Regarding your question, kindly note that we report the official CoOp results as given in the CoCoOp paper. To clarify, we did not train any model other than MaPLe for Table 3 of our paper; the result values for both CoOp and CoCoOp are taken from the CoCoOp paper.

For the CoOp results reported in Table 2, we train CoOp for 10 epochs, append the context tokens at the end (not in the middle), and use a context length of 4 with a learning rate of 0.0035.
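For concreteness, assuming the same main.sh interface shown in the question above, that Table 2 setting would correspond to a command along these lines (a sketch, not an official script; the 10 epochs and 0.0035 learning rate come from the config file, i.e. MAX_EPOCH and LR in configs/trainers/CoOp/vit_b16.yaml):

bash scripts/coop/main.sh imagenet vit_b16 end 4 16 False
(DATASET) (config file) (class token position = end) (context tokens = 4) (shots = 16) (class-specific context = False)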

If you want to reproduce the CoOp results of Table 3, we recommend following the training details provided in the official CoCoOp paper (https://arxiv.org/abs/2203.05557) and repository (https://github.com/KaiyangZhou/CoOp).

I hope this is helpful. Kind regards!

Sincerely, Muhammad Uzair

Kim-DKyu commented 1 month ago

Thank you for your response. Have a great day!