iamkanghyunchoi / ait

It's All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher [CVPR 2022 Oral]
GNU General Public License v3.0

Model training #4

Closed DCNSW closed 2 years ago

DCNSW commented 2 years ago

Thanks for your release of the code.

When I quantize ResNet-18 to 4 bits with Qimera and AIT and then evaluate the model on ImageNet, I get an abnormal intermediate result:

```
2022-09-13 15:14:24,225 INFO: #==>Best Result of ep 60 is: Top1 Accuracy: 0.100000, Top5 Accuracy: 0.500000 at ep 60
2022-09-13 15:20:09,837 INFO: #==>[Epoch 61/400] [acc: 0.093750] [train loss: nan]
2022-09-13 15:20:09,838 INFO: #==>Best Result of ep 61 is: Top1 Accuracy: 0.100000, Top5 Accuracy: 0.500000 at ep 61
2022-09-13 15:25:58,151 INFO: #==>[Epoch 62/400] [acc: 0.125000] [train loss: nan]
2022-09-13 15:25:58,153 INFO: #==>Best Result of ep 62 is: Top1 Accuracy: 0.100000, Top5 Accuracy: 0.500000 at ep 62
2022-09-13 15:31:30,271 INFO: #==>[Epoch 63/400] [acc: 0.062500] [train loss: nan]
2022-09-13 15:31:30,273 INFO: #==>Best Result of ep 63 is: Top1 Accuracy: 0.100000, Top5 Accuracy: 0.500000 at ep 63
2022-09-13 15:37:26,569 INFO: #==>[Epoch 64/400] [acc: 0.031250] [train loss: nan]
```

It seems that the model training does not converge: the training loss is NaN and the Top-1 accuracy stays at random-guess level.
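When the loss turns NaN this early, it usually helps to abort the run instead of letting it continue for hundreds of epochs. A minimal, hypothetical guard for a generic training loop (not part of this repository's code) could look like:

```python
import math

def check_loss(loss_value: float, epoch: int) -> float:
    """Abort a diverged run: raise if the training loss is NaN or infinite.

    `loss_value` is assumed to be a plain Python float, e.g. the result
    of `loss.item()` in PyTorch.
    """
    if math.isnan(loss_value) or math.isinf(loss_value):
        raise RuntimeError(f"Loss diverged at epoch {epoch}: {loss_value}")
    return loss_value
```

Calling this once per epoch on the logged loss would have stopped the run above at epoch 61 instead of epoch 400.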

iamkanghyunchoi commented 2 years ago

Hi, first of all, thank you for your interest in our paper. Could you share the experiment settings you used for training?

DCNSW commented 2 years ago

Thanks for your reply.

I just used the default settings of imagenet_resnet18.hocon and only modified dataPath; everything else was left unchanged.

By the way, Qimera works well.

DCNSW commented 2 years ago

It looks like warmup_epochs = 0 in imagenet_resnet18.hocon is a typo. Setting warmup_epochs = 50 solves this issue.
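One common way a warmup_epochs of 0 can blow up a run is a division by the warmup length in a linear-warmup learning-rate schedule, which yields a divide-by-zero or an unintended learning rate at the start of training. A hypothetical sketch of such a schedule with the zero case guarded (this is an illustration, not the repository's actual scheduler):

```python
def warmup_lr(base_lr: float, epoch: int, warmup_epochs: int) -> float:
    """Linearly ramp the learning rate from ~0 up to `base_lr`
    over the first `warmup_epochs` epochs.

    A naive `epoch / warmup_epochs` ramp divides by zero when
    warmup_epochs == 0, so that case falls back to the full LR.
    """
    if warmup_epochs <= 0 or epoch >= warmup_epochs:
        return base_lr
    return base_lr * (epoch + 1) / warmup_epochs
```

With warmup_epochs = 50 the LR climbs gradually, which is often what keeps low-bit quantized training from diverging in the first epochs.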

iamkanghyunchoi commented 2 years ago

Yes, that was a typo, and I have just fixed it. I appreciate your effort.