Dear authors:
I am trying to train a VQ-KD tokenizer with a codebook of 8192 codes × 1024 dimensions. I am using the imagenet1k_train0 split, about 250k images. The loss does decrease, but the unused-code count is always 7342 (i.e., only 850 of the 8192 codes are ever used).
The following is my log. Please help me check it. Thank you so much!
log:
```json
{"train_lr": 1.3788770659541397e-05, "train_min_lr": 1.3788770659541397e-05, "train_loss": 0.19607612831518054, "train_quant_loss": 0.0007862878009182168, "train_rec_loss": 0.19528984052687884, "train_total_loss": 0.19607612831518054, "train_weight_decay": 0.00010000000000000078, "train_grad_norm": 0.0510355946496129, "train_Unused_code": 7342, "test_loss": 0.19637961502215176, "test_quant_loss": 0.0007839596094554784, "test_rec_loss": 0.19559565511052357, "test_total_loss": 0.19637961502215176, "test_unused_code": 7342, "epoch": 55, "n_parameters": 110353920}
{"train_lr": 1.230445940166662e-05, "train_min_lr": 1.230445940166662e-05, "train_loss": 0.19597856950387357, "train_quant_loss": 0.0007866458412463544, "train_rec_loss": 0.19519192356616258, "train_total_loss": 0.19597856950387357, "train_weight_decay": 0.00010000000000000078, "train_grad_norm": Infinity, "train_Unused_code": 7342, "test_loss": 0.19625377976758915, "test_quant_loss": 0.0007833929184099278, "test_rec_loss": 0.19547038700318697, "test_total_loss": 0.19625377976758915, "test_unused_code": 7342, "epoch": 56, "n_parameters": 110353920}
{"train_lr": 1.1185975093969197e-05, "train_min_lr": 1.1185975093969197e-05, "train_loss": 0.19587323313951494, "train_quant_loss": 0.0007866945194255095, "train_rec_loss": 0.19508653854951263, "train_total_loss": 0.19587323313951494, "train_weight_decay": 0.00010000000000000078, "train_grad_norm": 0.05127894716709852, "train_Unused_code": 7342, "test_loss": 0.1961117209302205, "test_quant_loss": 0.0007821548750860419, "test_rec_loss": 0.19532956575241053, "test_total_loss": 0.1961117209302205, "test_unused_code": 7342, "epoch": 57, "n_parameters": 110353920}
{"train_lr": 1.0437731883024781e-05, "train_min_lr": 1.0437731883024781e-05, "train_loss": 0.19577259112522005, "train_quant_loss": 0.000786652019203757, "train_rec_loss": 0.1949859390705824, "train_total_loss": 0.19577259112522005, "train_weight_decay": 0.00010000000000000078, "train_grad_norm": 0.05146480340510607, "train_Unused_code": 7342, "test_loss": 0.19604056173314652, "test_quant_loss": 0.0007831061271700133, "test_rec_loss": 0.19525745556210028, "test_total_loss": 0.19604056173314652, "test_unused_code": 7342, "epoch": 58, "n_parameters": 110353920}
{"train_lr": 1.0062682742947154e-05, "train_min_lr": 1.0062682742947154e-05, "train_loss": 0.19554498087614774, "train_quant_loss": 0.0007869089091254864, "train_rec_loss": 0.1947580719962716, "train_total_loss": 0.19554498087614774, "train_weight_decay": 0.00010000000000000078, "train_grad_norm": 0.051605010330677034, "train_Unused_code": 7342, "test_loss": 0.1959357580847361, "test_quant_loss": 0.0007835275031208188, "test_rec_loss": 0.1951522303350044, "test_total_loss": 0.1959357580847361, "test_unused_code": 7342, "epoch": 59, "n_parameters": 110353920}
```
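For context on what I am measuring: the 7342 value is the number of codebook entries that are never selected during the epoch, which is consistent with only 850 of the 8192 codes being used. A minimal sketch of how such a count can be computed (the function name and tensor shapes here are hypothetical, not taken from the VQ-KD code):

```python
import torch

def count_unused_codes(indices: torch.Tensor, codebook_size: int = 8192) -> int:
    """Count codebook entries never selected in a batch of token indices."""
    used = torch.zeros(codebook_size, dtype=torch.bool)
    used[indices.reshape(-1)] = True  # mark every code id that appears
    return int((~used).sum())

# If exactly codes 0..849 appear, 8192 - 850 = 7342 codes are unused,
# matching the train_Unused_code value in the log above.
idx = torch.arange(850)
print(count_unused_codes(idx))  # → 7342
```

Since this count is identical every epoch, the same small subset of codes seems to be reused throughout, which looks like codebook collapse rather than a logging issue.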