Many thanks to the paper's authors for their great contribution, and to Dassl and CoOp, on which this repo is built. You will need to install Dassl before going further.
In this scenario, we need to use CALIP_FS, which means the model has to be fine-tuned because of the linear layers introduced in the attention.
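As a rough illustration of why CALIP_FS needs fine-tuning while CALIP_PF does not, here is a minimal sketch of a cross-modal attention block in which learnable linear layers are optional. This is not the repo's implementation; the feature shapes, projection layout, and names below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAttention(nn.Module):
    """Sketch only: parameter-free attention vs. attention with learnable projections."""

    def __init__(self, dim: int, parameter_free: bool = True):
        super().__init__()
        self.parameter_free = parameter_free
        if not parameter_free:
            # Learnable projections exist only in the fine-tuned (CALIP_FS-style) variant.
            self.proj_v = nn.Linear(dim, dim)
            self.proj_t = nn.Linear(dim, dim)

    def forward(self, f_v: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
        # f_v: (batch, num_patches, dim) spatial visual features
        # f_t: (num_classes, dim) textual features
        if not self.parameter_free:
            f_v = self.proj_v(f_v)
            f_t = self.proj_t(f_t)
        attn = F.softmax(f_v @ f_t.t(), dim=-1)  # (batch, num_patches, num_classes)
        return attn @ f_t                        # text-guided visual features
```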
All you need is `CALIP/scripts/fewshots.sh`, which takes two input arguments. `DATASET` takes a dataset name, such as `imagenet` or `caltech101`; the valid names are the file names in `CALIP/configs/datasets/`. The second argument sets the number of shots, as in the examples below.
We fix `CFG` to `rn50_ep200_ctxv1.yaml` to follow the original setup in the paper for training CALIP_FS.
Below we provide examples of how to run CALIP_FS on OxfordPets.
```bash
bash scripts/fewshots.sh oxford_pets 1
bash scripts/fewshots.sh oxford_pets 2
bash scripts/fewshots.sh oxford_pets 4
bash scripts/fewshots.sh oxford_pets 8
bash scripts/fewshots.sh oxford_pets 16
```
See `CALIP/scripts/zeroshot.sh` for using CALIP_PF (CALIP with parameter-free attention).

```bash
bash scripts/zeroshot.sh oxford_pets
```
One can also modify settings in the `extend_cfg` function in `main.py`, where you can specify the optimal hyper-parameters reported in the paper. According to the original paper, both max and average pooling are used to obtain $F_v^{a}$, but it is ambiguous how the two should be combined; in this implementation we simply add them together, which may influence the results.
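For reference, `extend_cfg` in Dassl-based repos typically registers new config keys on a yacs `CfgNode` before the config is frozen. Below is a minimal sketch of that pattern, not the repo's actual function; the node and key names (`TRAINER.CALIP`, `BETA`) are hypothetical placeholders for the paper's hyper-parameters.

```python
from yacs.config import CfgNode as CN

def extend_cfg(cfg):
    # Hypothetical placeholders -- the real key names live in main.py.
    cfg.TRAINER.CALIP = CN()
    cfg.TRAINER.CALIP.BETA = 1.0  # example hyper-parameter slot from the paper
```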
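For the pooling ambiguity mentioned above, here is a minimal sketch of the combination used here, assuming the attended spatial features have shape (batch, num_patches, channels); the function name and shapes are assumptions, not the repo's exact code.

```python
import torch

def pool_attended_features(f_s_a: torch.Tensor) -> torch.Tensor:
    # f_s_a: attended spatial visual features, shape (batch, num_patches, channels)
    avg_pooled = f_s_a.mean(dim=1)        # average pooling over spatial positions
    max_pooled = f_s_a.max(dim=1).values  # max pooling over spatial positions
    # The paper leaves the combination unspecified; this repo simply sums them
    # to obtain the global visual feature F_v^a.
    return avg_pooled + max_pooled
```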