megvii-research / SSQL-ECCV2022

PyTorch implementation of SSQL (Accepted to ECCV2022 oral presentation)
Apache License 2.0

full precision linear evaluation config #3

Closed Huiimin5 closed 1 year ago

Huiimin5 commented 1 year ago

Hi,

Thank you so much for your contribution. Could you please also release full precision linear evaluation config?

Best

CupidJay commented 1 year ago

Thanks for your interest. You can set QUANT.W.BIT and QUANT.A.BIT to 0 for full precision linear evaluation.

QUANT:
    TYPE: ptq
    W:
        BIT: 0
        SYMMETRY: True
        QUANTIZER: uniform
        GRANULARITY : channelwise
        OBSERVER_METHOD:
            NAME: MINMAX
    A:
        BIT: 0
        SYMMETRY: False
        QUANTIZER: uniform
        GRANULARITY : layerwise
        OBSERVER_METHOD:
            NAME: MINMAX
Huiimin5 commented 1 year ago

Thank you so much for your quick response. I cannot reproduce simsiam results when pretraining using your config resnet50_simsiam_imagenet.yaml, and linear probing using main_lincls.py from https://github.com/facebookresearch/simsiam/blob/main/main_lincls.py. Is that expected?

CupidJay commented 1 year ago

I think the reason may be that the model's weights are not loaded correctly. In our code, the prefix is 'module.encoder_q.' while in the original repository the prefix is 'module.encoder.'. Please check it.
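The mismatch can be checked with a small helper before handing the checkpoint to main_lincls.py. This is an illustrative sketch, not the repo's code; `strip_prefix` is a hypothetical helper, and the key names mirror the prefixes mentioned above:

```python
# Keep only backbone weights and drop the 'module.encoder_q.' prefix,
# discarding the projection head (fc) and predictor entries.
def strip_prefix(state_dict, prefix="module.encoder_q."):
    return {
        k[len(prefix):]: v
        for k, v in state_dict.items()
        if k.startswith(prefix) and not k.startswith(prefix + "fc")
    }

ckpt = {
    "module.encoder_q.conv1.weight": "w0",
    "module.encoder_q.fc.0.weight": "w1",   # projection head, dropped
    "module.predictor.0.weight": "w2",      # predictor, dropped
}
backbone = strip_prefix(ckpt)
# backbone == {"conv1.weight": "w0"}
```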

Huiimin5 commented 1 year ago

Yes, I had already made sure that the removed prefix is "module.encoder_q.". The accuracy is very low: Acc@1 2.454, Acc@5 7.596.
It seems the problem is with the pretraining.

CupidJay commented 1 year ago

This is indeed strange. Can you provide the training log?

Huiimin5 commented 1 year ago

Please check this: Training_log. Thank you so much for your help. The config file I use is https://github.com/megvii-research/SSQL-ECCV2022/blob/main/configs/imagenet/resnet50_simsiam_imagenet.yaml. I only changed the number of epochs from 200 to 100.

CupidJay commented 1 year ago

I'm re-running the experiments and will let you know if there is an update.

Huiimin5 commented 1 year ago

Thank you so much for your help.

May I have the training log of SSQL+SimSiam with ResNet-50 on ImageNet, to make sure my training of your model is correct? The training is a little slow, so I tested the 50-epoch checkpoint; the linear evaluation result after the 1st epoch is Acc@1 5.47 (3.22), Acc@5 15.23 (11.52). This is much lower than SimSiam (Acc@1 29.30 (19.37), Acc@5 57.62 (39.33)). I realize it is not fair to compare 50 epochs with 100, but the difference is so large that I am worried I cannot correctly reproduce your results.

CupidJay commented 1 year ago

Since I am no longer an intern at Megvii, I don't have a training log at hand. However, this number does not look right. After I finish running the SimSiam baseline, I will check whether there is a problem with the uploaded code.

CupidJay commented 1 year ago

Hi, I found a "bug" in my code.

For convenience, I distinguish the SSQL model from other models by simply checking whether the letter 'Q' appears in the path, as shown here.

But when I uploaded this version of the code, I accidentally added the letter 'Q' to the root path (train_log_SSQL_release/xxx), which caused loading problems during linear evaluation. I am very sorry for the trouble caused by my sloppy code. For convenience I am re-running the experiments on CIFAR (which look correct so far), and I will update the CIFAR results (including training logs) as soon as they are done. You can first reproduce the results on CIFAR to confirm correctness, and I will update the ImageNet results later.

In short, when saving SimSiam models, specify a path such as train_log/r50/simsiam (make sure it does not contain the letter 'Q'), and when saving SSQL models, specify a path such as train_log/r50/SSQL.

I've now modified the scripts so that the paths no longer cause such issues.
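The heuristic and its failure mode can be illustrated with a tiny sketch (the function name is hypothetical; the repo's actual check may differ in detail):

```python
# Hypothetical sketch of the path heuristic described above: if the
# checkpoint path contains the letter 'Q', the model is treated as SSQL.
def is_ssql_checkpoint(path: str) -> bool:
    return "Q" in path

assert is_ssql_checkpoint("train_log/r50/SSQL/ckpt.pth")
assert not is_ssql_checkpoint("train_log/r50/simsiam/ckpt.pth")

# The reported bug: a root directory like 'train_log_SSQL_release/...'
# makes every checkpoint under it look like an SSQL model.
assert is_ssql_checkpoint("train_log_SSQL_release/r50/simsiam/ckpt.pth")
```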

CupidJay commented 1 year ago

CIFAR-10 Exp

I re-ran the experiments on CIFAR, and the results look fine. (I can upload the checkpoints if needed.)

| Method | FP | 4w4a | 2w4a |
| --- | --- | --- | --- |
| SimSiam (train_log) | 91.1 | 89.2 | 63.7 |
| SSQL (train_log) | 91.0 | 90.4 | 87.7 |

ImageNet Exp

To be updated. There is an ICCV deadline coming up, so I will update this as soon as I am free.

| Method | FP |
| --- | --- |
| SimSiam (train_log) | 67.9 |
Huiimin5 commented 1 year ago

Thank you for your update. I noticed this issue as well, and after setting the path properly I can get reasonable results after the full linear evaluation.

Huiimin5 commented 1 year ago

Could you please specify why a calibration step is necessary in linear evaluation? Thank you in advance.

CupidJay commented 1 year ago

> Could you please specify why a calibration step is necessary in linear evaluation? Thank you in advance.

We do not need a calibration step for full precision linear evaluation. But if we conduct quantized linear evaluation, we need to calibrate the scale and zero point under a specific bit configuration.
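As a rough sketch of what the calibration step computes, here is an asymmetric uniform min-max calibration in NumPy (the repo's MINMAX observer may differ in detail; the function names are illustrative): the calibration pass records the min/max of the tensor, from which the scale and zero point are derived for a given bit width.

```python
import numpy as np

def minmax_qparams(x, n_bits=4):
    """Derive (scale, zero_point) from the observed min/max of x."""
    qmin, qmax = 0, 2 ** n_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = round(qmin - x_min / scale)
    return scale, zero_point

def fake_quant(x, scale, zero_point, n_bits=4):
    """Quantize-dequantize x with the calibrated parameters."""
    q = np.clip(np.round(x / scale) + zero_point, 0, 2 ** n_bits - 1)
    return (q - zero_point) * scale

x = np.linspace(-1.0, 2.0, 7)
scale, zp = minmax_qparams(x, n_bits=4)   # scale = 0.2, zero_point = 5
xq = fake_quant(x, scale, zp, n_bits=4)   # error is at most scale / 2
```

Without a calibration pass over some data, scale and zero_point are undefined, which is why quantized linear evaluation needs the extra step while full precision does not.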

Huiimin5 commented 1 year ago

Thank you a lot for your clarification!

I notice there is a difference in the AUG.TRAIN.NORMALIZATION setting between the SSQL model and non-SSQL models, for example between simsiam and ssql_simsiam. Could you please specify how they should be set? In your experience, are pre-training results sensitive to these settings? Many thanks!

CupidJay commented 1 year ago

> Thank you a lot for your clarification!
>
> I notice there is a difference in the AUG.TRAIN.NORMALIZATION setting between the SSQL model and non-SSQL models, for example between simsiam and ssql_simsiam. Could you please specify how they should be set? In your experience, are pre-training results sensitive to these settings? Many thanks!

Actually, there is no difference in the augmentations. The augmentation settings in the config have no effect; I set the transformations here, and you can see that all the SSL methods use the same augmentations. I really should remove these unused options from the configs.

CupidJay commented 1 year ago

> Thank you a lot for your clarification! I notice there is a difference in the AUG.TRAIN.NORMALIZATION setting between the SSQL model and non-SSQL models, for example between simsiam and ssql_simsiam. Could you please specify how they should be set? In your experience, are pre-training results sensitive to these settings? Many thanks!
>
> Actually, there is no difference in the augmentations. The augmentation settings in the config have no effect; I set the transformations here, and you can see that all the SSL methods use the same augmentations. I really should remove these unused options from the configs.

I see now that there was an error in the normalization values in the config file. Thanks for the reminder; I have updated the config file.

Huiimin5 commented 1 year ago

OK. Thank you so much for your clarification.

Huiimin5 commented 1 year ago

In linear.py, all pre-trained models are automatically converted to QuantModel objects and passed through a series of quantization-related operations. How can I check that these operations actually do nothing to full-precision models? Thank you so much!

CupidJay commented 1 year ago

> In linear.py, all pre-trained models are automatically converted to QuantModel objects and passed through a series of quantization-related operations. How can I check that these operations actually do nothing to full-precision models? Thank you so much!

You can check it here. For full precision, we set all bits to 0, so the quantization step in the quantizer is skipped and the original input x is returned directly. If you want to double-check, you can add some print statements here. If you are still unsure, you can simply remove these quant tools when doing full precision linear evaluation.
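A toy version of this bypass (not the repo's exact code; the function name is illustrative) shows why bit = 0 makes the quantizer a no-op, so a QuantModel with all bits set to 0 behaves exactly like the full-precision model:

```python
import numpy as np

def uniform_quantize(x, bit=0):
    # bit == 0 means full precision: return the input unchanged
    if bit == 0:
        return x
    # otherwise, quantize-dequantize to 2**bit uniform levels
    qmax = 2 ** bit - 1
    scale = (x.max() - x.min()) / qmax
    return np.round((x - x.min()) / scale) * scale + x.min()

x = np.array([0.0, 0.3, 0.7, 1.0])
assert uniform_quantize(x, bit=0) is x                  # same object, untouched
assert not np.allclose(uniform_quantize(x, bit=2), x)   # 2-bit path is lossy
```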

CupidJay commented 1 year ago

> Thank you so much for your help.
>
> May I have the training log of SSQL+SimSiam with ResNet-50 on ImageNet, to make sure my training of your model is correct? The training is a little slow, so I tested the 50-epoch checkpoint; the linear evaluation result after the 1st epoch is Acc@1 5.47 (3.22), Acc@5 15.23 (11.52). This is much lower than SimSiam (Acc@1 29.30 (19.37), Acc@5 57.62 (39.33)). I realize it is not fair to compare 50 epochs with 100, but the difference is so large that I am worried I cannot correctly reproduce your results.

Hi, sorry for the late reply. I have found the problem and fixed the bug. In the previous code, the hidden dimension of the fc in encoder_q was accidentally set to 512, while the original implementation uses 2048. Collapse occurs when the dimension is 512, and training proceeds normally when it is 2048. This is a strange phenomenon in SimSiam, and setting all dimensions to 2048 in the config file seems to be the safe choice. The latest code has been updated, and my local training now produces the correct result. Please refer to these two lines of changes in SimSiam and QSimSiam.
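For reference, the projection MLP attached to the encoder in the original SimSiam looks roughly like the sketch below (the exact layer layout here is illustrative, not the repo's code); the point from the fix above is that the hidden dimension should be 2048, since 512 caused collapse:

```python
import torch
import torch.nn as nn

def projection_mlp(in_dim=2048, hidden_dim=2048, out_dim=2048):
    """SimSiam-style projection head; hidden_dim must stay at 2048."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim, bias=False),
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_dim, out_dim, bias=False),
        nn.BatchNorm1d(out_dim, affine=False),
    )

mlp = projection_mlp()
out = mlp(torch.randn(4, 2048))   # shape (4, 2048)
```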