There seem to be at least two mismatches between the hyperparameters in the paper and those in the repo.
1) SPR_weight
The paper states: "We set λ SPR = 2 and λ IM = 1 during pre-training. Unless otherwise noted, all settings match SPR during fine-tuning, including batch size, replay ratio, target network update period, and λ SPR" (the SPR paper also uses λ SPR = 2).
However, in /sgim_pretrain.sh the SPR weight is set to 1.
In addition, the SPR weight is not set in /sgim_finetune.sh, so it falls back to the config default, which is 5.
2) momentum_Tau during finetuning
The paper says the finetuning hyperparameters are the same as the ones used in SPR.
In SPR, no EMA is used (tau = 0) when augmentation is used.
However, in the code tau is not set in /sgim_finetune.sh, so it defaults to 0.01 from config.yaml.
Am I correct to assume that the repo versions are incorrect?
These two are the only ones I could find, but I worry that I may have missed some. I would really appreciate it if one of you could take a second look, so that others like me can reliably reproduce your results!
Another difference is that clip_grad_norm and clip_model_grad_norm are set to -1 in sgim_finetune.sh, whereas in the SPR paper you clipped the gradients to 10.
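For reference, the discrepancies listed in this thread can be tabulated with a small script. This is only a sketch: the values are hand-copied from the points above rather than parsed from the repo, the key names and the `find_mismatches` helper are made up for illustration, and applying the paper's clip value of 10 to both clipping options is my assumption.

```python
# Values copied by hand from this thread; nothing here reads the actual repo.
# Keys are (stage, hyperparameter); the names are illustrative only.
PAPER = {
    ("pretrain", "spr_weight"): 2,
    ("finetune", "spr_weight"): 2,
    ("finetune", "momentum_tau"): 0,
    # Assumption: the SPR paper's clip value of 10 applies to both options.
    ("finetune", "clip_grad_norm"): 10,
    ("finetune", "clip_model_grad_norm"): 10,
}

REPO = {
    ("pretrain", "spr_weight"): 1,        # set explicitly in the pretrain script
    ("finetune", "spr_weight"): 5,        # not set in the script, config default
    ("finetune", "momentum_tau"): 0.01,   # not set in the script, config default
    ("finetune", "clip_grad_norm"): -1,   # set in the finetune script
    ("finetune", "clip_model_grad_norm"): -1,
}


def find_mismatches(paper, repo):
    """Return (stage, name, paper_value, repo_value) for every key that differs."""
    mismatches = []
    for key in sorted(paper):
        if key in repo and paper[key] != repo[key]:
            stage, name = key
            mismatches.append((stage, name, paper[key], repo[key]))
    return mismatches


if __name__ == "__main__":
    for stage, name, paper_value, repo_value in find_mismatches(PAPER, REPO):
        print(f"{stage:9s} {name:22s} paper={paper_value} repo={repo_value}")
```

If the repo values are the intended ones, the same table would show which statements in the paper need a correction instead.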
Thanks, Kevin