About the detailed setting of Table 1

Hi @sIncerass @bronyayang

Thanks for the great work. I tried to reproduce Table 1 but could not get exactly the same results, so I would like to confirm the setup with you.

Here is what I did:

Downloaded and extracted the pretrained prompts you shared with me.
Ran 1-shot single-task adaptation with MCoOp on top of the pretrained prompts by pointing --model-dir at them, e.g.:
    bash scripts/mvlpt/main_single_elevater_cut.sh CoOp vit_b16 16 1 1 resisc45_clip
    bash scripts/mvlpt/main_single_elevater_cut.sh CoOp vit_b16 16 1 2 resisc45_clip
    bash scripts/mvlpt/main_single_elevater_cut.sh CoOp vit_b16 16 1 3 resisc45_clip
I got accuracies of 60.38 on MNIST and 65.86 on RESISC45 (averaged over 3 runs), but according to Table 1 these should be 65.06 and 67.39.
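Averaging over the three seeds can be automated by pulling the final accuracy out of each run's log. The sketch below assumes each log contains a summary line of the form "* accuracy: XX.XX%", as printed by Dassl's Classification evaluator; the regex and the helper names (final_accuracy, average_over_seeds) are illustrative, so check them against your own log files.

```python
import re
from statistics import mean

# Assumption: the evaluator prints a summary line such as "* accuracy: 65.86%"
# after each evaluation; adjust the pattern if your logs look different.
ACC_PATTERN = re.compile(r"\* accuracy: (\d+(?:\.\d+)?)%")


def final_accuracy(log_text: str) -> float:
    """Return the last accuracy reported in a log (usually the final test result)."""
    matches = ACC_PATTERN.findall(log_text)
    if not matches:
        raise ValueError("no '* accuracy: XX%' line found in the log")
    return float(matches[-1])


def average_over_seeds(log_texts) -> float:
    """Average the final accuracy over several runs (e.g. seeds 1, 2 and 3)."""
    return mean(final_accuracy(text) for text in log_texts)


# Hypothetical usage (the log file name depends on your Dassl version):
# logs = [open(f"output/resisc45_clip/.../seed{s}/log.txt").read() for s in (1, 2, 3)]
# print(f"{average_over_seeds(logs):.2f}")
```

Taking the last match matters because validation runs during training print the same kind of line.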

Could you please help me check which part of my setup is inconsistent with yours?

Here is one of my log files:

***************
** Arguments **
***************
act_ckpt: 1
backbone: 
config_file: configs/trainers/MVLPT/vit_b16.yaml
cut_contextlen: False
dataset: resisc45_clip
dataset_config_file: 
dataset_coop: False
eval_only: False
head: 
load_epoch: None
model_dir: ./output/ImageNet,Caltech101,Food101,StanfordCars,OxfordPets,OxfordFlowers,FGVCAircraft,SUN397,DescribableTextures,EuroSAT,UCF101/CoOp/vit_b16_1shots/nctx16_csc_ctp/
multi_task: False
multi_task_evalkey: average
multi_task_label_pertask: False
no_train: False
opts: ['TRAINER.MVLPT.VPT.N_CTX', '0', 'TRAINER.MVLPT.COOP.N_CTX', '16', 'TRAINER.MVLPT.COOP.CLASS_TOKEN_POSITION', 'middle', 'TRAINER.MVLPT.COOP.CSC', 'False', 'TEST.NO_TEST', 'False', 'TEST.FINAL_MODEL', 'best_val', 'TRAINER.CUT_CONTEXTLEN', 'True']
output_dir: ./output/resisc45_clip/CoOp/vit_b16_1shots/nctx16_csc_ctp/seed3
resume: 
root: ../CoOpData
seed: 3
shots: 1
source_domains: None
target_domains: None
trainer: MVLPT
transforms: None
************
** Config **
************
DATALOADER:
  K_TRANSFORMS: 1
  NUM_WORKERS: 8
  RETURN_IMG0: False
  TEST:
    BATCH_SIZE: 100
    SAMPLER: SequentialSampler
  TRAIN_U:
    BATCH_SIZE: 32
    N_DOMAIN: 0
    N_INS: 16
    SAME_AS_X: True
    SAMPLER: RandomSampler
  TRAIN_X:
    BATCH_SIZE: 32
    N_DOMAIN: 0
    N_INS: 16
    SAMPLER: RandomSampler
DATASET:
  ALL_AS_UNLABELED: False
  CENTER_CROP: False
  CIFAR_C_LEVEL: 1
  CIFAR_C_TYPE: 
  COOP: False
  DATASET: resisc45_clip
  MULTITASK: False
  MULTITASK_EVALKEY: average
  MULTITASK_LABEL_PERTASK: False
  NAME: 
  NUM_LABELED: -1
  NUM_SAMPLES_PER_CLASS: 1
  NUM_SHOTS: 1
  RANDOM_SEED_SAMPLING: 3
  ROOT: ../CoOpData
  SOURCE_DOMAINS: ()
  STL10_FOLD: -1
  SUBSAMPLE_CLASSES: all
  TARGET_DOMAINS: ()
  TEST_SET: val
  TRAIN_SET: train
  VAL_PERCENT: 0.1
  VAL_SET: 
INPUT:
  COLORJITTER_B: 0.4
  COLORJITTER_C: 0.4
  COLORJITTER_H: 0.1
  COLORJITTER_S: 0.4
  CROP_PADDING: 4
  CUTOUT_LEN: 16
  CUTOUT_N: 1
  GB_K: 21
  GB_P: 0.5
  GN_MEAN: 0.0
  GN_STD: 0.15
  INTERPOLATION: bicubic
  NO_TRANSFORM: False
  PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
  PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
  RANDAUGMENT_M: 10
  RANDAUGMENT_N: 2
  RGS_P: 0.2
  RRCROP_SCALE: (0.08, 1.0)
  SIZE: (224, 224)
  TRANSFORMS: ('random_resized_crop', 'random_flip', 'normalize')
MODEL:
  BACKBONE:
    NAME: ViT-B/16
    PRETRAINED: True
  HEAD:
    ACTIVATION: relu
    BN: True
    DROPOUT: 0.0
    HIDDEN_LAYERS: ()
    NAME: 
  INIT_WEIGHTS: 
OPTIM:
  ADAM_BETA1: 0.9
  ADAM_BETA2: 0.999
  BASE_LR_MULT: 0.1
  GAMMA: 0.1
  LR: 0.002
  LR_SCHEDULER: cosine
  MAX_EPOCH: 200
  MOMENTUM: 0.9
  NAME: sgd
  NEW_LAYERS: ()
  RMSPROP_ALPHA: 0.99
  SGD_DAMPNING: 0
  SGD_NESTEROV: False
  STAGED_LR: False
  STEPSIZE: (-1,)
  WARMUP_CONS_LR: 1e-05
  WARMUP_EPOCH: 1
  WARMUP_MIN_LR: 1e-05
  WARMUP_RECOUNT: True
  WARMUP_TYPE: constant
  WEIGHT_DECAY: 0.0005
OUTPUT_DIR: ./output/resisc45_clip/CoOp/vit_b16_1shots/nctx16_csc_ctp/seed3
RESUME: 
SEED: 3
TEST:
  COMPUTE_CMAT: False
  EVALUATOR: Classification
  FINAL_MODEL: best_val
  NO_TEST: False
  PER_CLASS_RESULT: False
  SPLIT: test
TRAIN:
  CHECKPOINT_FREQ: 0
  COUNT_ITER: train_x
  PRINT_FREQ: 5
TRAINER:
  ACT_CKPT: 1
  CDAC:
    CLASS_LR_MULTI: 10
    P_THRESH: 0.95
    RAMPUP_COEF: 30
    RAMPUP_ITRS: 1000
    STRONG_TRANSFORMS: ()
    TOPK_MATCH: 5
  COCOOP:
    CTX_INIT: 
    N_CTX: 16
    PREC: fp16
  COOP:
    CLASS_TOKEN_POSITION: end
    CSC: False
    CTX_INIT: 
    N_CTX: 16
    PREC: fp16
  CROSSGRAD:
    ALPHA_D: 0.5
    ALPHA_F: 0.5
    EPS_D: 1.0
    EPS_F: 1.0
  CUT_CONTEXTLEN: True
  DAEL:
    CONF_THRE: 0.95
    STRONG_TRANSFORMS: ()
    WEIGHT_U: 0.5
  DAELDG:
    CONF_THRE: 0.95
    STRONG_TRANSFORMS: ()
    WEIGHT_U: 0.5
  DDAIG:
    ALPHA: 0.5
    CLAMP: False
    CLAMP_MAX: 1.0
    CLAMP_MIN: -1.0
    G_ARCH: 
    LMDA: 0.3
    WARMUP: 0
  DOMAINMIX:
    ALPHA: 1.0
    BETA: 1.0
    TYPE: crossdomain
  ENTMIN:
    LMDA: 0.001
  FIXMATCH:
    CONF_THRE: 0.95
    STRONG_TRANSFORMS: ()
    WEIGHT_U: 1.0
  M3SDA:
    LMDA: 0.5
    N_STEP_F: 4
  MCD:
    N_STEP_F: 4
  MEANTEACHER:
    EMA_ALPHA: 0.999
    RAMPUP: 5
    WEIGHT_U: 1.0
  MIXMATCH:
    MIXUP_BETA: 0.75
    RAMPUP: 20000
    TEMP: 2.0
    WEIGHT_U: 100.0
  MME:
    LMDA: 0.1
  MVLPT:
    COOP:
      CLASS_TOKEN_POSITION: middle
      CSC: False
      CTX_INIT: 
      N_CTX: 16
    PREC: fp16
    PROJECT_DIM: 128
    PROJECT_METHOD: transformer
    VPT:
      CSC: False
      CTX_INIT: 
      DEEP: True
      DROPOUT: 0.0
      N_CTX: 0
      PROJECT: -1
  NAME: MVLPT
  SE:
    CONF_THRE: 0.95
    EMA_ALPHA: 0.999
    RAMPUP: 300
USE_CUDA: True
VERBOSE: True
VERSION: 1
Collecting env info ...
** System info **
PyTorch version: 1.10.2+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.17

Python version: 3.7.11 (default, Jul 27 2021, 14:32:16)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-81-generic-x86_64-with-debian-buster-sid
Is CUDA available: True
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: GeForce RTX 2080 Ti
Nvidia driver version: 460.32.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.2.2
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] pytorch-lightning==1.4.0
[pip3] torch==1.10.2
[pip3] torchaudio==0.10.2
[pip3] torchmetrics==0.6.0
[pip3] torchtext==0.5.0
[pip3] torchvision==0.11.3
[conda] numpy                     1.19.2                   pypi_0    pypi
[conda] pytorch-lightning         1.4.0                    pypi_0    pypi
[conda] torch                     1.10.2                   pypi_0    pypi
[conda] torchaudio                0.10.2                   pypi_0    pypi
[conda] torchmetrics              0.6.0                    pypi_0    pypi
[conda] torchtext                 0.5.0                    pypi_0    pypi
[conda] torchvision               0.11.3                   pypi_0    pypi
        Pillow (8.3.1)

Loading trainer: MVLPT
Loading CLIP (backbone: ViT-B/16)
Building custom CLIP
Initializing a generic context
COOP Initial context: "X X X X X X X X X X X X X X X X"
COOP Number of context words (tokens): 16
Current Context Length is:  22
Turning off gradients in both the image and the text encoder
prompt_learner.ctx torch.Size([16, 512])
Tunable Param: 0.008192M, Original CLIP 124.323841M
Loading evaluator: Classification
Loading weights to prompt_learner from "./output/ImageNet,Caltech101,Food101,StanfordCars,OxfordPets,OxfordFlowers,FGVCAircraft,SUN397,DescribableTextures,EuroSAT,UCF101/CoOp/vit_b16_1shots/nctx16_csc_ctp/prompt_learner/model-best.pth.tar" (epoch = 76)
No checkpoint found, train from scratch

I also tried CoOp, VPT, UPT, MVPT and MUPT. None of them match the numbers reported in Table 1 (they are ~5% lower). I suspect something in my experimental configuration is wrong. I would really appreciate a response.

Hi, while I am testing Table 1, could you check whether you can reproduce the performance in Table 2? The config looks right to me, so it will take me some time to test the code. Sorry for the late response!

Hi @bronyayang

Many thanks for your reply. I haven't tested Table 2 since I don't have prompt initialization on ELEVATER (I need to do pretraining). I'll try to reproduce it and will let you know the results. Thanks!

I did some quick experiments on CoOp and UPT on MNIST without prompt initialization (the second group in Table 2), since they don't need pretraining.

The commands I used are the following, with --model_dir removed so that no pretrained prompts are loaded:
    bash scripts/mvlpt/main_single_elevater_cut.sh CoOp vit_b16 16 20 1 mnist
    bash scripts/mvlpt/main_single_elevater_cut.sh CoOp vit_b16 16 20 2 mnist
    bash scripts/mvlpt/main_single_elevater_cut.sh CoOp vit_b16 16 20 3 mnist

I got 89.42 with CoOp and 89.37 with UPT. But in Table 2, they should be 91.44 and 89.11. So there is also something mismatched between your implementation and mine for Table 2.

Hi @bronyayang @sIncerass ,

I found one possible reason for the mismatched results: the keys of the pretrained state_dict differ from those of the current model. It seems the MVLPT trainer was previously named UPT. Here is a screenshot of your log (left) and mine (right) for:
    bash scripts/mvlpt/main_mt_coop_cut.sh UPT vit_b16 4 1

screenshot

Yes, this is definitely a problem. Thank you for sharing. I am fixing it now. 🥹

I tried to fix this issue by adding state_dict = {k.replace("upt_proj", "mvlpt_proj"):v for k, v in state_dict.items()} at https://github.com/sIncerass/MVLPT/blob/main/trainers/mvlpt.py#L1005
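The one-line key-renaming fix can be wrapped in a small helper that also counts how many keys were actually renamed, which makes it easy to tell whether the replacement touched anything for a given checkpoint. This is a hypothetical sketch, not code from the MVLPT repository; the usage comment assumes a Dassl-style checkpoint dict with a "state_dict" entry.

```python
from collections import OrderedDict


def rename_state_dict_keys(state_dict, old="upt_proj", new="mvlpt_proj"):
    """Copy `state_dict`, replacing `old` with `new` in every key.

    Returns the renamed dict and the number of keys that actually changed,
    so a replacement that matches nothing (count == 0) is easy to spot.
    Note: if both the old and the new name are present, the later entry wins.
    """
    renamed = OrderedDict()
    n_changed = 0
    for key, value in state_dict.items():
        new_key = key.replace(old, new)
        if new_key != key:
            n_changed += 1
        renamed[new_key] = value
    return renamed, n_changed


# Hypothetical usage, assuming a Dassl-style checkpoint:
# checkpoint = torch.load(path, map_location="cpu")
# state_dict, n = rename_state_dict_keys(checkpoint["state_dict"])
# print(f"renamed {n} keys")
```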

Unfortunately, I still cannot get the numbers reported in Table 1, and the performance on MNIST even drops a little compared with the version that has the bug... This is really weird. Could you please also share your log files for the prompt adaptation part?

I tried printing state_dict.items() and can only see ctx, token_prefix, and token_suffix as keys, so I guess the replace would have no effect? I need to double-check with the other author whether the checkpoint is correct.

I was testing the averaged checkpoint of 1-shot UPT trained on CoOp, and I can find the other keys, like:

screenshot

I see, so the mismatch must come from another bug. I was testing the CoOp checkpoints, which do not have these keys, so the initial mismatch when training 1-shot CoOp is not a loading problem...
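One way to rule loading problems in or out is to diff the checkpoint keys against the keys the model expects before loading. The helper below is a hypothetical pure-Python sketch; PyTorch's own module.load_state_dict(state_dict, strict=False) also returns the missing_keys and unexpected_keys. Skipping token_prefix and token_suffix mirrors CoOp's load_model, which deletes these class-name-dependent buffers before loading; whether MVLPT does exactly the same is an assumption.

```python
def diff_keys(checkpoint_keys, model_keys, ignore=("token_prefix", "token_suffix")):
    """Compare the keys of a checkpoint with the keys a model expects.

    Returns (missing, unexpected), both sorted:
      missing: keys the model expects but the checkpoint lacks
      unexpected: keys in the checkpoint that the model does not have
    Keys listed in `ignore` are skipped on both sides.
    """
    ckpt = {k for k in checkpoint_keys if k not in ignore}
    expected = {k for k in model_keys if k not in ignore}
    return sorted(expected - ckpt), sorted(ckpt - expected)


# Hypothetical usage with PyTorch objects:
# missing, unexpected = diff_keys(state_dict.keys(), model.prompt_learner.state_dict().keys())
# print("missing:", missing, "unexpected:", unexpected)
```

Two empty lists mean every learnable tensor in the checkpoint has a matching slot in the model, which points the search toward something other than loading.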

I tried these 3 commands and the average is 91.98. I think a fluctuation of about 1 point is expected for Table 2, and we should release standard deviations for the few-shot experiments soon.

I am running the experiments on a single TITAN RTX with CUDA 11.3.

I will rerun Table 1 next.
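Reporting the few-shot results as mean ± standard deviation over seeds could look like the sketch below. It uses the sample standard deviation (statistics.stdev, n - 1 in the denominator), a common choice for a handful of seeds; statistics.pstdev gives the population version instead. The function names are illustrative.

```python
from statistics import mean, stdev


def summarize(accuracies):
    """Return (mean, sample standard deviation) of per-seed accuracies."""
    if len(accuracies) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    return mean(accuracies), stdev(accuracies)


def format_result(accuracies, digits=2):
    """Format per-seed accuracies as 'mean ± std' for a results table."""
    m, s = summarize(accuracies)
    return f"{m:.{digits}f} ± {s:.{digits}f}"
```

With only three seeds the standard deviation is itself noisy, which is one reason a gap of about one point between two reproductions is hard to interpret without it.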

Hi, @bronyayang

I ran multi-task prompt initialization on the CoOp datasets from scratch (rather than using your provided pretrained weights) and used the result for single-task prompt adaptation on the ELEVATER datasets. Almost every number I got is 2-3% lower than what you reported.

However, I can confirm that the relative performance between methods matches (e.g., my MUPT and UPT show a similar performance gap to yours). So I believe the main conclusion of your paper is correct, and the only open question is the original configuration used for Table 1.

Hi @bronyayang @BrandonHanx ,

Could you share how to obtain and organize the target data (ELEVATER)? I followed the instructions at https://github.com/sIncerass/MVLPT, which say:

Note that the dataset for target ELEVATER benchmark will be downloaded automatically in MVLPT/trainers/vision_benchmark/.

There is no image data in that directory; it is unchanged from when it was downloaded, so I had to download the data myself using the code from the ELEVATER GitHub page.

Then I tried the command bash scripts/mvlpt/main_single_coopdata_cut.sh CoOp vit_b16 16 1 1 resisc45_clip and got the data error shown below:

bash scripts/mvlpt/main_single_coopdata_cut.sh CoOp vit_b16 16 1 1 resisc45_clip
[nltk_data] Downloading package punkt to /home/stly/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /home/stly/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
/home/stly/Data/HULA/projects/prompting/MVLPT/clip/clip.py:23: UserWarning: PyTorch version 1.7.1 or higher is recommended
  warnings.warn("PyTorch version 1.7.1 or higher is recommended")
Setting fixed seed: 1
***************
** Arguments **
***************
act_ckpt: 1
backbone: 
config_file: configs/trainers/MVLPT/vit_b16.yaml
cut_contextlen: False
dataset: resisc45_clip
dataset_config_file: 
dataset_coop: True
eval_only: True
head: 
load_epoch: 200
model_dir: /home/stly/Data/HULA/projects/prompting/logs/medmnist/ImageNet,Caltech101,Food101,StanfordCars,OxfordPets,OxfordFlowers,FGVCAircraft,SUN397,DescribableTextures,EuroSAT,UCF101/CoOp/vit_b16_1shots/nctx16_csc_ctp/
multi_task: False
multi_task_evalkey: average
multi_task_label_pertask: False
no_train: False
opts: ['TRAINER.MVLPT.VPT.N_CTX', '0', 'TRAINER.MVLPT.COOP.N_CTX', '16', 'TRAINER.MVLPT.COOP.CLASS_TOKEN_POSITION', 'middle', 'TRAINER.MVLPT.COOP.CSC', 'False', 'TEST.NO_TEST', 'False', 'TEST.FINAL_MODEL', 'best_val', 'TRAINER.CUT_CONTEXTLEN', 'True']
output_dir: /home/stly/Data/HULA/projects/prompting/logs/medmnist/resisc45_clip/CoOp/vit_b16_1shots/nctx16_csc_ctp/seed1
resume: 
root: /home/stly/Data/HULA/projects/dataset/DataDownload/classification/data/classification
seed: 1
shots: 1
source_domains: None
target_domains: None
trainer: MVLPT
transforms: None
************
** Config **
************
DATALOADER:
  K_TRANSFORMS: 1
  NUM_WORKERS: 8
  RETURN_IMG0: False
  TEST:
    BATCH_SIZE: 100
    SAMPLER: SequentialSampler
  TRAIN_U:
    BATCH_SIZE: 32
    N_DOMAIN: 0
    N_INS: 16
    SAME_AS_X: True
    SAMPLER: RandomSampler
  TRAIN_X:
    BATCH_SIZE: 32
    N_DOMAIN: 0
    N_INS: 16
    SAMPLER: RandomSampler
DATASET:
  ALL_AS_UNLABELED: False
  CENTER_CROP: False
  CIFAR_C_LEVEL: 1
  CIFAR_C_TYPE: 
  COOP: True
  DATASET: resisc45_clip
  MULTITASK: False
  MULTITASK_EVALKEY: average
  MULTITASK_LABEL_PERTASK: False
  NAME: 
  NUM_LABELED: -1
  NUM_SAMPLES_PER_CLASS: 1
  NUM_SHOTS: 1
  RANDOM_SEED_SAMPLING: 1
  ROOT: /home/stly/Data/HULA/projects/dataset/DataDownload/classification/data/classification
  SOURCE_DOMAINS: ()
  STL10_FOLD: -1
  SUBSAMPLE_CLASSES: all
  TARGET_DOMAINS: ()
  TEST_SET: val
  TRAIN_SET: train
  VAL_PERCENT: 0.1
  VAL_SET: 
INPUT:
  COLORJITTER_B: 0.4
  COLORJITTER_C: 0.4
  COLORJITTER_H: 0.1
  COLORJITTER_S: 0.4
  CROP_PADDING: 4
  CUTOUT_LEN: 16
  CUTOUT_N: 1
  GB_K: 21
  GB_P: 0.5
  GN_MEAN: 0.0
  GN_STD: 0.15
  INTERPOLATION: bicubic
  NO_TRANSFORM: False
  PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
  PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
  RANDAUGMENT_M: 10
  RANDAUGMENT_N: 2
  RGS_P: 0.2
  RRCROP_SCALE: (0.08, 1.0)
  SIZE: (224, 224)
  TRANSFORMS: ('random_resized_crop', 'random_flip', 'normalize')
MODEL:
  BACKBONE:
    NAME: ViT-B/16
    PRETRAINED: True
  HEAD:
    ACTIVATION: relu
    BN: True
    DROPOUT: 0.0
    HIDDEN_LAYERS: ()
    NAME: 
  INIT_WEIGHTS: 
OPTIM:
  ADAM_BETA1: 0.9
  ADAM_BETA2: 0.999
  BASE_LR_MULT: 0.1
  GAMMA: 0.1
  LR: 0.002
  LR_SCHEDULER: cosine
  MAX_EPOCH: 200
  MOMENTUM: 0.9
  NAME: sgd
  NEW_LAYERS: ()
  RMSPROP_ALPHA: 0.99
  SGD_DAMPNING: 0
  SGD_NESTEROV: False
  STAGED_LR: False
  STEPSIZE: (-1,)
  WARMUP_CONS_LR: 1e-05
  WARMUP_EPOCH: 1
  WARMUP_MIN_LR: 1e-05
  WARMUP_RECOUNT: True
  WARMUP_TYPE: constant
  WEIGHT_DECAY: 0.0005
OUTPUT_DIR: /home/stly/Data/HULA/projects/prompting/logs/medmnist/resisc45_clip/CoOp/vit_b16_1shots/nctx16_csc_ctp/seed1
RESUME: 
SEED: 1
TEST:
  COMPUTE_CMAT: False
  EVALUATOR: Classification
  FINAL_MODEL: best_val
  NO_TEST: False
  PER_CLASS_RESULT: False
  SPLIT: test
TRAIN:
  CHECKPOINT_FREQ: 0
  COUNT_ITER: train_x
  PRINT_FREQ: 5
TRAINER:
  ACT_CKPT: 1
  CDAC:
    CLASS_LR_MULTI: 10
    P_THRESH: 0.95
    RAMPUP_COEF: 30
    RAMPUP_ITRS: 1000
    STRONG_TRANSFORMS: ()
    TOPK_MATCH: 5
  COCOOP:
    CTX_INIT: 
    N_CTX: 16
    PREC: fp16
  COOP:
    CLASS_TOKEN_POSITION: end
    CSC: False
    CTX_INIT: 
    N_CTX: 16
    PREC: fp16
  CROSSGRAD:
    ALPHA_D: 0.5
    ALPHA_F: 0.5
    EPS_D: 1.0
    EPS_F: 1.0
  CUT_CONTEXTLEN: True
  DAEL:
    CONF_THRE: 0.95
    STRONG_TRANSFORMS: ()
    WEIGHT_U: 0.5
  DAELDG:
    CONF_THRE: 0.95
    STRONG_TRANSFORMS: ()
    WEIGHT_U: 0.5
  DDAIG:
    ALPHA: 0.5
    CLAMP: False
    CLAMP_MAX: 1.0
    CLAMP_MIN: -1.0
    G_ARCH: 
    LMDA: 0.3
    WARMUP: 0
  DOMAINMIX:
    ALPHA: 1.0
    BETA: 1.0
    TYPE: crossdomain
  ENTMIN:
    LMDA: 0.001
  FIXMATCH:
    CONF_THRE: 0.95
    STRONG_TRANSFORMS: ()
    WEIGHT_U: 1.0
  M3SDA:
    LMDA: 0.5
    N_STEP_F: 4
  MCD:
    N_STEP_F: 4
  MEANTEACHER:
    EMA_ALPHA: 0.999
    RAMPUP: 5
    WEIGHT_U: 1.0
  MIXMATCH:
    MIXUP_BETA: 0.75
    RAMPUP: 20000
    TEMP: 2.0
    WEIGHT_U: 100.0
  MME:
    LMDA: 0.1
  MVLPT:
    COCOOP:
      CTX_INIT: 
      N_CTX: 0
      PREC: fp16
    COOP:
      CLASS_TOKEN_POSITION: middle
      CSC: False
      CTX_INIT: 
      N_CTX: 16
    PREC: fp16
    PROJECT_DIM: 128
    PROJECT_METHOD: transformer
    VPT:
      CSC: False
      CTX_INIT: 
      DEEP: True
      DROPOUT: 0.0
      N_CTX: 0
      PROJECT: -1
  NAME: MVLPT
  SE:
    CONF_THRE: 0.95
    EMA_ALPHA: 0.999
    RAMPUP: 300
USE_CUDA: True
VERBOSE: True
VERSION: 1
Collecting env info ...
** System info **
PyTorch version: 1.10.2+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.8.13 (default, Oct 21 2022, 23:50:54)  [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-58-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 11.5.50
GPU models and configuration: 
GPU 0: NVIDIA GeForce GTX 1080
GPU 1: NVIDIA GeForce GTX 1080
GPU 2: NVIDIA GeForce GTX 1080
GPU 3: NVIDIA GeForce GTX 1080

Nvidia driver version: 470.161.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] pytorch-lightning==1.4.0
[pip3] torch==1.10.2
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.11.3
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               11.3.1               h2bc3f7f_2  
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.4.0           h06a4308_640  
[conda] mkl-service               2.4.0            py38h7f8727e_0  
[conda] mkl_fft                   1.3.1            py38hd3c417c_0  
[conda] mkl_random                1.2.2            py38h51133e4_0  
[conda] numpy                     1.19.2                   pypi_0    pypi
[conda] pytorch-lightning         1.4.0                    pypi_0    pypi
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torch                     1.10.2                   pypi_0    pypi
[conda] torchmetrics              0.6.0                    pypi_0    pypi
[conda] torchvision               0.11.3                   pypi_0    pypi
        Pillow (9.4.0)

Loading trainer: MVLPT
Traceback (most recent call last):
  File "train.py", line 309, in <module>
    main(args)
  File "train.py", line 222, in main
    trainer = build_trainer(cfg)
  File "/home/stly/Data/Soft/Anaconda3/envs/py38/lib/python3.8/site-packages/dassl/engine/build.py", line 11, in build_trainer
    return TRAINER_REGISTRY.get(cfg.TRAINER.NAME)(cfg)
  File "/home/stly/Data/Soft/Anaconda3/envs/py38/lib/python3.8/site-packages/dassl/engine/trainer.py", line 324, in __init__
    self.build_data_loader()
  File "/home/stly/Data/HULA/projects/prompting/MVLPT/trainers/mvlpt.py", line 893, in build_data_loader
    dm = MVLPTCOOPDataManager(self.cfg)
  File "/home/stly/Data/HULA/projects/prompting/MVLPT/trainers/mvlpt.py", line 603, in __init__
    dataset = build_dataset(cfg)
  File "/home/stly/Data/Soft/Anaconda3/envs/py38/lib/python3.8/site-packages/dassl/data/datasets/build.py", line 8, in build_dataset
    check_availability(cfg.DATASET.NAME, avai_datasets)
  File "/home/stly/Data/Soft/Anaconda3/envs/py38/lib/python3.8/site-packages/dassl/utils/tools.py", line 174, in check_availability
    raise ValueError(
ValueError: The requested one is expected to belong to ['Digit5', 'VisDA17', 'CIFARSTL', 'Office31', 'DomainNet', 'OfficeHome', 'miniDomainNet', 'PACS', 'VLCS', 'FMoW', 'IWildCam', 'Camelyon17', 'CIFAR10C', 'CIFAR100C', 'DigitsDG', 'DigitSingle', 'OfficeHomeDG', 'CIFAR10', 'CIFAR100', 'SVHN', 'STL10', 'OxfordPets', 'OxfordFlowers', 'FGVCAircraft', 'DescribableTextures', 'EuroSAT', 'StanfordCars', 'Food101', 'SUN397', 'Caltech101', 'UCF101', 'ImageNet', 'ImageNetSketch', 'ImageNetV2', 'ImageNetA', 'ImageNetR', 'ImageNet21k', 'Bamboo', 'Bloodmnist', 'Breastmnist', 'Chestmnist', 'Dermamnist', 'Octmnist', 'Organamnist', 'Organcmnist', 'Organsmnist', 'Pathmnist', 'Pneumoniamnist', 'Retinamnist', 'Tissuemnist'], but got [resisc45_clip] (do you mean [Breastmnist]?)

I added the MedMNIST dataset to MVLPT/datasets, and it is registered correctly, as the list in the traceback shows. So I guess the error comes from the missing [*dataset].py files in MVLPT/datasets. Could you share how to get them too?

Thank you
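The ValueError above comes from a name lookup: according to the traceback, build_dataset calls check_availability(cfg.DATASET.NAME, avai_datasets), so only dataset classes registered with Dassl's dataset registry (for example via @DATASET_REGISTRY.register() on a class in MVLPT/datasets) can be found. The toy registry below is a simplified illustration of that mechanism, not Dassl's actual implementation; the difflib-based "do you mean" hint only imitates the message format.

```python
import difflib


class Registry:
    """Toy name -> class registry (a simplified stand-in, not Dassl's code)."""

    def __init__(self, name):
        self._name = name
        self._items = {}

    def register(self):
        """Decorator that registers a class under its own class name."""
        def decorator(cls):
            self._items[cls.__name__] = cls
            return cls
        return decorator

    def get(self, key):
        """Look up a registered class; fail with a 'do you mean' hint otherwise."""
        if key not in self._items:
            names = sorted(self._items)
            guess = difflib.get_close_matches(key, names, n=1, cutoff=0.0)
            hint = f" (do you mean [{guess[0]}]?)" if guess else ""
            raise ValueError(f"{self._name}: expected one of {names}, but got [{key}]{hint}")
        return self._items[key]


DATASETS = Registry("DATASET")


@DATASETS.register()
class Breastmnist:
    """Placeholder for a dataset class living in MVLPT/datasets."""
```

An unregistered name such as resisc45_clip fails the lookup no matter what data is on disk, which is why the error appears before any files are read.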

@tsly123 To run on the ELEVATER datasets, you should use bash scripts/mvlpt/main_single_elevater_cut.sh CoOp vit_b16 16 1 1 resisc45_clip. Note that you are currently running main_single_coopdata_cut.sh.

Hi @bronyayang ,

I ran bash scripts/mvlpt/main_single_elevater_cut.sh CoOp vit_b16 16 1 1 resisc45_clip and it gave me the error below.

It looks like the error is caused by the dataset directory path. When I first ran bash scripts/mvlpt/main_single_elevater_cut.sh, it downloaded the dataset.

Do I need to do something else? How should I organize the elevater dataset?

Thank you.

[nltk_data] Downloading package punkt to /home/stly/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /home/stly/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
/project/hnguyen2/stly/code/prompting/MVLPT/clip/clip.py:23: UserWarning: PyTorch version 1.7.1 or higher is recommended
  warnings.warn("PyTorch version 1.7.1 or higher is recommended")
Setting fixed seed: 1
***************
** Arguments **
***************
act_ckpt: 1
backbone: 
config_file: configs/trainers/MVLPT/vit_b16.yaml
cut_contextlen: False
dataset: 1
dataset_config_file: 
dataset_coop: False
eval_only: False
head: 
load_epoch: None
model_dir: 
multi_task: False
multi_task_evalkey: average
multi_task_label_pertask: False
no_train: False
opts: ['TRAINER.MVLPT.VPT.N_CTX', '0', 'TRAINER.MVLPT.COOP.N_CTX', '16', 'TRAINER.MVLPT.COOP.CLASS_TOKEN_POSITION', 'middle', 'TRAINER.MVLPT.COOP.CSC', 'False', 'TEST.NO_TEST', 'False', 'TEST.FINAL_MODEL', 'best_val', 'TRAINER.CUT_CONTEXTLEN', 'True']
output_dir: resisc45_cli/1/CoOp/vit_b16_1shots/nctx16_csc_ctp/seed1
resume: 
root: /project/hnguyen2/stly/code/datasets/prompting/data
seed: 1
shots: 1
source_domains: None
target_domains: None
trainer: MVLPT
transforms: None
************
** Config **
************
DATALOADER:
  K_TRANSFORMS: 1
  NUM_WORKERS: 8
  RETURN_IMG0: False
  TEST:
    BATCH_SIZE: 1024
    SAMPLER: SequentialSampler
  TRAIN_U:
    BATCH_SIZE: 32
    N_DOMAIN: 0
    N_INS: 16
    SAME_AS_X: True
    SAMPLER: RandomSampler
  TRAIN_X:
    BATCH_SIZE: 32
    N_DOMAIN: 0
    N_INS: 16
    SAMPLER: RandomSampler
DATASET:
  ALL_AS_UNLABELED: False
  CENTER_CROP: False
  CIFAR_C_LEVEL: 1
  CIFAR_C_TYPE: 
  COOP: False
  DATASET: 1
  MULTITASK: False
  MULTITASK_EVALKEY: average
  MULTITASK_LABEL_PERTASK: False
  NAME: 
  NUM_LABELED: -1
  NUM_SAMPLES_PER_CLASS: 1
  NUM_SHOTS: 1
  RANDOM_SEED_SAMPLING: 1
  ROOT: /project/hnguyen2/stly/code/datasets/prompting/data
  SOURCE_DOMAINS: ()
  STL10_FOLD: -1
  SUBSAMPLE_CLASSES: all
  TARGET_DOMAINS: ()
  TEST_SET: val
  TRAIN_SET: train
  VAL_PERCENT: 0.1
  VAL_SET: 
INPUT:
  COLORJITTER_B: 0.4
  COLORJITTER_C: 0.4
  COLORJITTER_H: 0.1
  COLORJITTER_S: 0.4
  CROP_PADDING: 4
  CUTOUT_LEN: 16
  CUTOUT_N: 1
  GB_K: 21
  GB_P: 0.5
  GN_MEAN: 0.0
  GN_STD: 0.15
  INTERPOLATION: bicubic
  NO_TRANSFORM: False
  PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
  PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
  RANDAUGMENT_M: 10
  RANDAUGMENT_N: 2
  RGS_P: 0.2
  RRCROP_SCALE: (0.08, 1.0)
  SIZE: (224, 224)
  TRANSFORMS: ('random_resized_crop', 'random_flip', 'normalize')
MODEL:
  BACKBONE:
    NAME: ViT-B/16
    PRETRAINED: True
  HEAD:
    ACTIVATION: relu
    BN: True
    DROPOUT: 0.0
    HIDDEN_LAYERS: ()
    NAME: 
  INIT_WEIGHTS: 
OPTIM:
  ADAM_BETA1: 0.9
  ADAM_BETA2: 0.999
  BASE_LR_MULT: 0.1
  GAMMA: 0.1
  LR: 0.002
  LR_SCHEDULER: cosine
  MAX_EPOCH: 200
  MOMENTUM: 0.9
  NAME: sgd
  NEW_LAYERS: ()
  RMSPROP_ALPHA: 0.99
  SGD_DAMPNING: 0
  SGD_NESTEROV: False
  STAGED_LR: False
  STEPSIZE: (-1,)
  WARMUP_CONS_LR: 1e-05
  WARMUP_EPOCH: 1
  WARMUP_MIN_LR: 1e-05
  WARMUP_RECOUNT: True
  WARMUP_TYPE: constant
  WEIGHT_DECAY: 0.0005
OUTPUT_DIR: resisc45_cli/1/CoOp/vit_b16_1shots/nctx16_csc_ctp/seed1
RESUME: 
SEED: 1
TEST:
  COMPUTE_CMAT: False
  EVALUATOR: Classification
  FINAL_MODEL: best_val
  NO_TEST: False
  PER_CLASS_RESULT: False
  SPLIT: test
TRAIN:
  CHECKPOINT_FREQ: 0
  COUNT_ITER: train_x
  PRINT_FREQ: 5
TRAINER:
  ACT_CKPT: 1
  CDAC:
    CLASS_LR_MULTI: 10
    P_THRESH: 0.95
    RAMPUP_COEF: 30
    RAMPUP_ITRS: 1000
    STRONG_TRANSFORMS: ()
    TOPK_MATCH: 5
  COCOOP:
    CTX_INIT: 
    N_CTX: 16
    PREC: fp16
  COOP:
    CLASS_TOKEN_POSITION: end
    CSC: False
    CTX_INIT: 
    N_CTX: 16
    PREC: fp16
  CROSSGRAD:
    ALPHA_D: 0.5
    ALPHA_F: 0.5
    EPS_D: 1.0
    EPS_F: 1.0
  CUT_CONTEXTLEN: True
  DAEL:
    CONF_THRE: 0.95
    STRONG_TRANSFORMS: ()
    WEIGHT_U: 0.5
  DAELDG:
    CONF_THRE: 0.95
    STRONG_TRANSFORMS: ()
    WEIGHT_U: 0.5
  DDAIG:
    ALPHA: 0.5
    CLAMP: False
    CLAMP_MAX: 1.0
    CLAMP_MIN: -1.0
    G_ARCH: 
    LMDA: 0.3
    WARMUP: 0
  DOMAINMIX:
    ALPHA: 1.0
    BETA: 1.0
    TYPE: crossdomain
  ENTMIN:
    LMDA: 0.001
  FIXMATCH:
    CONF_THRE: 0.95
    STRONG_TRANSFORMS: ()
    WEIGHT_U: 1.0
  M3SDA:
    LMDA: 0.5
    N_STEP_F: 4
  MCD:
    N_STEP_F: 4
  MEANTEACHER:
    EMA_ALPHA: 0.999
    RAMPUP: 5
    WEIGHT_U: 1.0
  MIXMATCH:
    MIXUP_BETA: 0.75
    RAMPUP: 20000
    TEMP: 2.0
    WEIGHT_U: 100.0
  MME:
    LMDA: 0.1
  MVLPT:
    COCOOP:
      CTX_INIT: 
      N_CTX: 0
      PREC: fp16
    COOP:
      CLASS_TOKEN_POSITION: middle
      CSC: False
      CTX_INIT: 
      N_CTX: 16
    PREC: fp16
    PROJECT_DIM: 128
    PROJECT_METHOD: transformer
    VPT:
      CSC: False
      CTX_INIT: 
      DEEP: True
      DROPOUT: 0.0
      N_CTX: 0
      PROJECT: -1
  NAME: MVLPT
  SE:
    CONF_THRE: 0.95
    EMA_ALPHA: 0.999
    RAMPUP: 300
USE_CUDA: True
VERBOSE: True
VERSION: 1
Collecting env info ...
** System info **
PyTorch version: 1.10.2+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Red Hat Enterprise Linux 8.7 (Ootpa) (x86_64)
GCC version: (GCC) 8.5.0 20210514 (Red Hat 8.5.0-15)
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.28

Python version: 3.8.16 (default, Jan 17 2023, 23:13:24)  [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-4.18.0-425.3.1.el8.x86_64-x86_64-with-glibc2.17
Is CUDA available: False
CUDA runtime version: 11.6.124
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] pytorch-lightning==1.4.0
[pip3] torch==1.10.2
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.11.3
[conda] blas                      1.0                         mkl  
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.4.0           h06a4308_640  
[conda] mkl-service               2.4.0            py38h7f8727e_0  
[conda] mkl_fft                   1.3.1            py38hd3c417c_0  
[conda] mkl_random                1.2.2            py38h51133e4_0  
[conda] numpy                     1.19.2                   pypi_0    pypi
[conda] pytorch-cuda              11.6                 h867d48c_1    pytorch
[conda] pytorch-lightning         1.4.0                    pypi_0    pypi
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torch                     1.10.2                   pypi_0    pypi
[conda] torchmetrics              0.6.0                    pypi_0    pypi
[conda] torchvision               0.11.3                   pypi_0    pypi
        Pillow (8.3.1)

Loading trainer: MVLPT
/home/stly/anaconda3/envs/prompt/lib/python3.8/site-packages/torchvision/transforms/transforms.py:287: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
  warnings.warn(
Traceback (most recent call last):
  File "train.py", line 310, in <module>
    main(args)
  File "train.py", line 223, in main
    trainer = build_trainer(cfg)
  File "/home/stly/anaconda3/envs/prompt/lib/python3.8/site-packages/dassl/engine/build.py", line 11, in build_trainer
    return TRAINER_REGISTRY.get(cfg.TRAINER.NAME)(cfg)
  File "/home/stly/anaconda3/envs/prompt/lib/python3.8/site-packages/dassl/engine/trainer.py", line 324, in __init__
    self.build_data_loader()
  File "/project/hnguyen2/stly/code/prompting/MVLPT/trainers/mvlpt.py", line 897, in build_data_loader
    dm = MVLPTDataManager(self.cfg)
  File "/project/hnguyen2/stly/code/prompting/MVLPT/trainers/mvlpt.py", line 744, in __init__
    train_loader_x, val_loader, test_loader, class_map, train_dataset = construct_dataloader(cfg)
  File "/project/hnguyen2/stly/code/prompting/MVLPT/trainers/vision_benchmark/evaluation/feature.py", line 615, in construct_dataloader
    train_dataloader, val_dataloader = get_dataloader(torchvision.datasets.ImageFolder(os.path.join(config.DATASET.ROOT, config.DATASET.TRAIN_SET), transform=transform_clip),
  File "/home/stly/anaconda3/envs/prompt/lib/python3.8/site-packages/torchvision/datasets/folder.py", line 310, in __init__
    super(ImageFolder, self).__init__(root, loader, IMG_EXTENSIONS if is_valid_file is None else None,
  File "/home/stly/anaconda3/envs/prompt/lib/python3.8/site-packages/torchvision/datasets/folder.py", line 145, in __init__
    classes, class_to_idx = self.find_classes(self.root)
  File "/home/stly/anaconda3/envs/prompt/lib/python3.8/site-packages/torchvision/datasets/folder.py", line 221, in find_classes
    return find_classes(directory)
  File "/home/stly/anaconda3/envs/prompt/lib/python3.8/site-packages/torchvision/datasets/folder.py", line 40, in find_classes
    classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
FileNotFoundError: [Errno 2] No such file or directory: '/project/hnguyen2/stly/code/datasets/prompting/data/train'

@tsly123 Your current error is caused by changes I made to the repo during the rebuttal. It is fixed now. Could you pull the latest version and try again?

No need to worry about the dataset; it should be downloaded automatically.

Hi @bronyayang,

Thanks for your reply. It works with the ELEVATER datasets. However, when I run it with a custom dataset (medmnist), I get the error below. Note that fine-tuning with main_single_coopdata_cut.sh UPT vit_b16 4 5 1 medmnist works. I prepared the medmnist dataset in the same way as https://github.com/sIncerass/MVLPT/blob/main/datasets/imagenet.py.

Traceback (most recent call last):
  File "train.py", line 310, in <module>
    main(args)
  File "train.py", line 223, in main
    trainer = build_trainer(cfg)
  File "/home/stly/anaconda3/envs/prompt/lib/python3.8/site-packages/dassl/engine/build.py", line 11, in build_trainer
    return TRAINER_REGISTRY.get(cfg.TRAINER.NAME)(cfg)
  File "/home/stly/anaconda3/envs/prompt/lib/python3.8/site-packages/dassl/engine/trainer.py", line 324, in __init__
    self.build_data_loader()
  File "/project/hnguyen2/stly/code/prompting/MVLPT/trainers/mvlpt.py", line 897, in build_data_loader
    dm = MVLPTDataManager(self.cfg)
  File "/project/hnguyen2/stly/code/prompting/MVLPT/trainers/mvlpt.py", line 744, in __init__
    train_loader_x, val_loader, test_loader, class_map, train_dataset = construct_dataloader(cfg)
  File "/project/hnguyen2/stly/code/prompting/MVLPT/trainers/vision_benchmark/evaluation/feature.py", line 615, in construct_dataloader
    train_dataloader, val_dataloader = get_dataloader(torchvision.datasets.ImageFolder(os.path.join(config.DATASET.ROOT, config.DATASET.TRAIN_SET), transform=transform_clip),
  File "/home/stly/anaconda3/envs/prompt/lib/python3.8/site-packages/torchvision/datasets/folder.py", line 310, in __init__
    super(ImageFolder, self).__init__(root, loader, IMG_EXTENSIONS if is_valid_file is None else None,
  File "/home/stly/anaconda3/envs/prompt/lib/python3.8/site-packages/torchvision/datasets/folder.py", line 145, in __init__
    classes, class_to_idx = self.find_classes(self.root)
  File "/home/stly/anaconda3/envs/prompt/lib/python3.8/site-packages/torchvision/datasets/folder.py", line 221, in find_classes
    return find_classes(directory)
  File "/home/stly/anaconda3/envs/prompt/lib/python3.8/site-packages/torchvision/datasets/folder.py", line 40, in find_classes
    classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
FileNotFoundError: [Errno 2] No such file or directory: '/project/hnguyen2/stly/code/datasets/prompting/data/train'
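Both tracebacks fail in the same call: `torchvision.datasets.ImageFolder(os.path.join(config.DATASET.ROOT, config.DATASET.TRAIN_SET), ...)`. That constructor expects the directory to exist and to contain one subdirectory per class. A minimal pre-flight check could catch a misconfigured path before the trainer is built. The sketch below mirrors how ImageFolder discovers classes (directories only, sorted by name, as in the `find_classes` frame above). `check_imagefolder_root` is a hypothetical helper and is not part of MVLPT or torchvision.

```python
import os
from typing import List


def check_imagefolder_root(root: str) -> List[str]:
    """Return the sorted class-folder names under `root`.

    Mirrors torchvision's ImageFolder class discovery (one subdirectory per
    class, sorted by name) but fails with a more actionable message than the
    bare os.scandir error in the traceback. Hypothetical helper.
    """
    if not os.path.isdir(root):
        raise FileNotFoundError(
            f"Training root {root!r} does not exist; check that "
            "DATASET.ROOT and DATASET.TRAIN_SET point at the prepared data."
        )
    # Only directories count as classes; plain files are ignored.
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    if not classes:
        raise FileNotFoundError(f"No class subdirectories found in {root!r}.")
    return classes
```

For example, one could call it on the same path that `construct_dataloader` joins from `DATASET.ROOT` and `DATASET.TRAIN_SET`. If it raises, the problem is the data location, not the model or the pretrained prompts.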

Hi @bronyayang,

I managed to run it with the custom dataset. For future reference, here are the steps. First, prepare your data with https://github.com/microsoft/vision-datasets (in the same format as the ELEVATER datasets), then modify the following files accordingly:

Add your dataset info: https://github.com/sIncerass/MVLPT/blob/main/trainers/vision_benchmark/resources/datasets/vision_datasets.json
Add the metrics, classes, and templates: https://github.com/sIncerass/MVLPT/blob/main/trainers/vision_benchmark/datasets/prompts.py
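When you add the classes and templates for a custom dataset, the class list presumably has to follow the label-index order that the dataset defines. The reason is that the i-th text prompt embedding is scored against label i. The sketch below assumes you have a mapping from label index to class name (for example, from the dataset's label map). It then builds the ordered class list and expands a set of CLIP-style templates. `TEMPLATES`, `ordered_class_names`, and `build_class_prompts` are hypothetical names. They do not reflect the actual contents of prompts.py.

```python
from typing import Dict, List

# Hypothetical CLIP-style templates; the real per-dataset templates live in
# prompts.py and may differ.
TEMPLATES: List[str] = ["a photo of a {}.", "a close-up photo of a {}."]


def ordered_class_names(label_map: Dict[int, str]) -> List[str]:
    """Return class names ordered by label index, requiring indices 0..N-1."""
    indices = sorted(label_map)
    if indices != list(range(len(indices))):
        raise ValueError(f"label indices must be contiguous from 0, got {indices}")
    return [label_map[i] for i in indices]


def build_class_prompts(class_names: List[str],
                        templates: List[str] = TEMPLATES) -> List[List[str]]:
    """Expand every template for every class, preserving class order."""
    return [[t.format(name) for t in templates] for name in class_names]
```

Checking index contiguity up front is a cheap way to catch a label map that would silently shift every class by one.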

However, I get different results when fine-tuning with scripts/mvlpt/main_mt_coopdata_cut.sh and with scripts/mvlpt/main_single_elevater_cut.sh, using the same pre-trained prompts (ImageNet,...), trainer, shots, and seed. I thought the two scripts were exactly the same except that they use different data pipelines (the CoOp and ELEVATER data structures, respectively): https://github.com/sIncerass/MVLPT/blob/main/trainers/mvlpt.py#L892

Do you know of any reason for this difference, or do you have any suggestions?
Can I use main_mt_coopdata_cut.sh instead of main_single_elevater_cut.sh for datasets that are in neither CoOp nor ELEVATER, i.e., custom datasets?

Thank you.
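One possible explanation is that the CoOp-style and ELEVATER-style pipelines list or sample the few-shot training images differently, so the same seed need not select the same images. This is an assumption; the thread does not confirm it, and other pipeline differences (such as preprocessing) could also contribute. The sketch below shows a seeded k-shot sampler that sorts its input first. Because of the sort, the selection depends only on the set of images and the seed, not on the order in which a pipeline enumerates them. `sample_few_shot` is a hypothetical helper and is not MVLPT's or Dassl's sampler.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Item = Tuple[str, int]  # (image path, label)


def sample_few_shot(items: Sequence[Item], shots: int, seed: int,
                    canonical_order: bool = True) -> List[Item]:
    """Pick up to `shots` images per class with a seeded, local RNG.

    With canonical_order=True the input is sorted first, so two pipelines
    that list the same images in different orders select the same subset.
    """
    if canonical_order:
        items = sorted(items)
    by_label: Dict[int, List[Item]] = defaultdict(list)
    for item in items:
        by_label[item[1]].append(item)
    rng = random.Random(seed)  # independent of the global random state
    picked: List[Item] = []
    for label in sorted(by_label):
        pool = by_label[label]
        picked.extend(rng.sample(pool, min(shots, len(pool))))
    return picked
```

Comparing the image lists that each script actually trains on, for a fixed seed, would be a direct way to confirm or rule out this cause.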
