Closed Fusionplay closed 1 year ago
By setting fc_lr_mul=0, I attained results very close to those listed in your paper. However, I still wonder whether there are other unmentioned hyperparameter settings that contribute to your results. It would be highly appreciated if you could make them available.
In all our experiments, the embedding layer has the same learning rate as the rest of the model, which is why this hyperparameter is not mentioned in the paper. So running with fc_lr_mul=0 is correct. I have updated its default value in the code so that this issue doesn't happen again.
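For anyone landing on this issue: the behavior described above is consistent with a common PyTorch pattern where a multiplier of 0 disables the separate parameter group for the embedding head, so the head falls back to the base learning rate. This is a hypothetical sketch (the function name `build_optimizer` and the two-module split are my own, not from this repo), just to illustrate the semantics:

```python
import torch
import torch.nn as nn

def build_optimizer(backbone, embed_head, lr, fc_lr_mul, decay):
    """Hypothetical illustration of fc_lr_mul semantics:
    fc_lr_mul > 0  -> the embedding head gets its own lr of lr * fc_lr_mul;
    fc_lr_mul == 0 -> a single parameter group, so the head trains at the
    same base lr as the rest of the model (the setting used in the paper)."""
    if fc_lr_mul == 0:
        params = list(backbone.parameters()) + list(embed_head.parameters())
        return torch.optim.Adam(params, lr=lr, weight_decay=decay)
    return torch.optim.Adam(
        [{"params": backbone.parameters(), "lr": lr},
         {"params": embed_head.parameters(), "lr": lr * fc_lr_mul}],
        lr=lr, weight_decay=decay,
    )

# With fc_lr_mul=0 there is one group; with fc_lr_mul=5.0 there are two.
opt_paper = build_optimizer(nn.Linear(8, 8), nn.Linear(8, 4), 1e-4, 0.0, 4e-4)
opt_split = build_optimizer(nn.Linear(8, 8), nn.Linear(8, 4), 1e-4, 5.0, 4e-4)
```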
Btw, the training commands in the readme file pass its value as 0.
Hi, in your paper, on the Cars196 dataset with ResNet-50, dim=512, and no mixup, the results are 80.7, 88.3, 92.8, and 95.7 for k=1, 2, 4, 8 respectively. However, I can't reproduce these results with the optimal hyperparameter setup stated in your paper, nor with the default hard-coded settings in your released code. I only attained approximately Recall [@1: 0.7166, @2: 0.8061, @4: 0.8707, @8: 0.9201, @16: 0.9546], which is much lower than your stated SOTA result.
Could you please release the hyperparameter setup (or training procedure, or anything else that affects the final results) for each dataset that leads to the results in your paper? It would be highly appreciated.
My hyperparameter settings on Cars196 that attained the aforementioned result:
dataset: cars196
lr: 0.0001
fc_lr_mul: 5.0
n_epochs: 170
kernels: 16
bs: 392
bs_base: 196
samples_per_class: 4
seed: 1
scheduler: step
gamma: 0.3
decay: 0.0004
tau: [80, 140]
infrequent_eval: 0
opt: adam
loss: recallatk
mixup: 0
sigmoid_temperature: 0.01
k_vals: [1, 2, 4, 8, 16]
k_vals_train: [1, 2, 4, 8, 16]
k_temperatures: [1.0, 1.0, 1.0, 1.0, 1.0]
resume
embed_dim: 512
arch: resnet50
grad_measure: False
dist_measure: False
not_pretrained: False
device: cuda
num_classes: 98
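For context on the numbers being compared, the Recall@k values reported above follow the standard retrieval metric for metric learning: a query counts as a hit at k if any of its k nearest neighbors (excluding itself) shares its label. A minimal sketch (function name `recall_at_k` is mine, not from this repo):

```python
import torch
import torch.nn.functional as F

def recall_at_k(embeddings, labels, k_vals=(1, 2, 4, 8, 16)):
    """Standard Recall@k: fraction of queries whose k nearest neighbors
    (by cosine similarity, self excluded) contain at least one sample
    with the same label."""
    x = F.normalize(embeddings, dim=1)
    sim = x @ x.t()
    sim.fill_diagonal_(-2.0)  # cosine sim is in [-1, 1]; -2 excludes self
    max_k = max(k_vals)
    nn_labels = labels[sim.topk(max_k, dim=1).indices]  # (N, max_k)
    hits = nn_labels == labels.unsqueeze(1)
    return {k: hits[:, :k].any(dim=1).float().mean().item() for k in k_vals}

# Toy example: two well-separated classes, so Recall@1 should be perfect.
emb = torch.tensor([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
lbl = torch.tensor([0, 0, 1, 1])
recalls = recall_at_k(emb, lbl, k_vals=(1, 2))
```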