xiaoyao3302 / PoinTramba

PoinTramba: A Hybrid Transformer-Mamba Framework for Point Cloud Analysis
https://arxiv.org/abs/2405.15463
Apache License 2.0
46 stars 3 forks

Lower Reproduction Result for ScanObjectNN PB-T50-RS Compared to the Paper #1

Closed · LiJin73 closed this issue 4 months ago

LiJin73 commented 4 months ago

Thank you for your great work! I followed all the instructions in the README to reproduce the results, but achieved only 84.2% accuracy on ScanObjectNN PB-T50-RS (the hardest variant):

    98%|████████████████████████████████████▏| 294/301 [22:59:31<33:44, 289.15s/it]
    2024-06-15 23:12:37,635 - finetune_scan_hardest_lr3_group_size16 - INFO - [Training] EPOCH: 294 EpochTime = 262.990 (s) Losses = ['0.7716', '99.9737'] lr = 0.000010
    2024-06-15 23:13:02,389 - finetune_scan_hardest_lr3_group_size16 - INFO - [Validation] EPOCH: 294 acc = 84.2123
    2024-06-15 23:13:03,371 - finetune_scan_hardest_lr3_group_size16 - INFO - Save checkpoint at ./experiments/finetune_scan_hardest_lr3_group_size16/cfgs/new_run_Attention_simple_score_predictor_no_cls_token_4layer_drop_neg_both_sort_w_1_1_1_1006/ckpt-best.pth
    2024-06-15 23:13:03,371 - finetune_scan_hardest_lr3_group_size16 - INFO - --------------------------------------------------------------------------------------------
    2024-06-15 23:13:04,158 - finetune_scan_hardest_lr3_group_size16 - INFO - Save checkpoint at ./experiments/finetune_scan_hardest_lr3_group_size16/cfgs/new_run_Attention_simple_score_predictor_no_cls_token_4layer_drop_neg_both_sort_w_1_1_1_1006/ckpt-last.pth

This result is much lower than the 88.9% reported in the paper "PoinTramba: A Hybrid Transformer-Mamba Framework for Point Cloud Analysis". Could you please help me understand why there is such a discrepancy?
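As an aside, the per-epoch validation accuracy can be pulled out of such a log with a short script. This is a minimal sketch, assuming the "[Validation] EPOCH: N acc = X" line format shown above; the log path is a placeholder.

    import re

    # Minimal sketch: extract (epoch, accuracy) pairs from a PoinTramba-style
    # training log, assuming lines of the form
    #   "... [Validation] EPOCH: 294 acc = 84.2123"
    PATTERN = re.compile(r"\[Validation\] EPOCH: (\d+) acc = ([\d.]+)")

    def best_validation_acc(log_path: str):
        results = []
        with open(log_path, "r", encoding="utf-8") as f:
            for line in f:
                match = PATTERN.search(line)
                if match:
                    results.append((int(match.group(1)), float(match.group(2))))
        if not results:
            return None
        return max(results, key=lambda pair: pair[1])  # (epoch, best acc)

    if __name__ == "__main__":
        # placeholder path; point this at the actual training log
        print(best_validation_acc("path/to/train.log"))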

xiaoyao3302 commented 4 months ago

Hi,

Can you provide me with the full training log and the config files? That might help to figure out the problem.

Best wishes, Zicheng



LiJin73 commented 4 months ago

Thanks for your reply! I used the exact same config files provided in the repository. I have attached them, along with the saved config.yaml and the most recent training log, for your reference (the .yaml files are renamed to .txt for uploading):
Attachments: 20240615_000840.log, config.txt, finetune_scan_hardest_lr3_group_size16.txt, ScanObjectNN_hardest.txt

Besides, the command I executed is the one from the provided run.sh:

    CUDA_VISIBLE_DEVICES=0 python main.py --scratch_model --attention_depth 4 --w_CE 1.0 \
        --use_simple_score_predictor --mode_group 'Attention' --type_pooling 'important' \
        --type_weighting 'drop_neg' --mode_sort 'both' --seed 1006 \
        --config cfgs/finetune_scan_hardest_lr3_group_size16.yaml \
        --exp_name new_run_Attention_simple_score_predictor_no_cls_token_4layer_drop_neg_both_sort_w_1_1_1_1006
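To rule out config drift on my side, the config.yaml saved under the experiment folder can be diffed against the repo's YAML. This is a minimal sketch, assuming both files load as plain nested dicts with PyYAML; the saved-config path is taken from my experiment folder above and the filename config.yaml is assumed from the log output.

    import yaml  # pip install pyyaml

    # Minimal sketch: recursively diff the config saved by the run against the
    # config shipped in the repo, to rule out accidental overrides.
    def diff(repo_cfg, saved_cfg, prefix=""):
        for key in sorted(set(repo_cfg) | set(saved_cfg)):
            path = f"{prefix}{key}"
            if key not in repo_cfg:
                print(f"only in saved config: {path} = {saved_cfg[key]}")
            elif key not in saved_cfg:
                print(f"only in repo config:  {path} = {repo_cfg[key]}")
            elif isinstance(repo_cfg[key], dict) and isinstance(saved_cfg[key], dict):
                diff(repo_cfg[key], saved_cfg[key], path + ".")
            elif repo_cfg[key] != saved_cfg[key]:
                print(f"differs: {path}: repo={repo_cfg[key]} saved={saved_cfg[key]}")

    with open("cfgs/finetune_scan_hardest_lr3_group_size16.yaml") as f:
        repo = yaml.safe_load(f)
    with open("./experiments/finetune_scan_hardest_lr3_group_size16/cfgs/"
              "new_run_Attention_simple_score_predictor_no_cls_token_4layer_drop_neg_both_sort_w_1_1_1_1006/"
              "config.yaml") as f:
        saved = yaml.safe_load(f)

    diff(repo, saved)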

xiaoyao3302 commented 4 months ago

Thanks! I'm checking the files and rerunning the code now. If I notice anything wrong, I will give you feedback ASAP.

xiaoyao3302 commented 4 months ago

Hi~ Sorry, this was my mistake. I'm rerunning the code now, and I found that I had commented out a couple of lines in runner_finetune.py (lines 212 to 215) for the ablation studies and forgot to uncomment them. After fixing this, the current rerun has reached 50 epochs and the accuracy is around 85. In case you are worried about the modification, I have just uploaded the corrected runner_finetune.py; please refer to the code for details. I will keep the run going to the end, and if I find any other mistakes I will fix the repo ASAP.

Also note that the result for ScanObjectNN hardest without data augmentation should be 84.5 (the result with data augmentation, i.e., 89.1, is correct). The paper was finished in a hurry. I'm terribly sorry for the misleading numbers and the mistakes; I will update the arXiv paper ASAP and inform the reviewers of the mistakes during the rebuttal period. Sorry again for wasting your time, and thanks for your understanding.
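If the commented lines are indeed the train-time augmentation (which the with- vs. without-augmentation numbers above suggest), the snippet below illustrates the kind of scale-and-translate transform involved. It is only an illustrative sketch of a common Point-MAE-style augmentation, not the literal contents of lines 212 to 215 in runner_finetune.py; the function name and parameter ranges here are assumptions.

    import torch

    # Illustrative sketch only (NOT the repo's actual lines 212-215): random
    # anisotropic scaling plus translation applied per sample at train time.
    def scale_and_translate(points: torch.Tensor,
                            scale_low: float = 2.0 / 3.0,
                            scale_high: float = 3.0 / 2.0,
                            translate_range: float = 0.2) -> torch.Tensor:
        # points: (B, N, 3) batch of point clouds; modified in place and returned.
        for i in range(points.size(0)):
            scale = torch.empty(3, device=points.device).uniform_(scale_low, scale_high)
            shift = torch.empty(3, device=points.device).uniform_(-translate_range, translate_range)
            points[i, :, :3] = points[i, :, :3] * scale + shift
        return points

Dropping this kind of augmentation during fine-tuning could plausibly account for a gap of a few points on ScanObjectNN hardest.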



xiaoyao3302 commented 4 months ago

Hi, after the full 300-epoch training run, the accuracy on ScanObjectNN hardest is 88.5%, slightly lower than the result we reported but within a reasonable range. I think there is no longer any obvious problem with the code.



Training log of the rerun (2024-06-16 19:38; timestamps and the repeated logger prefix are stripped for readability):

    Copy the Config file from cfgs/finetune_scan_hardest_lr3_group_size16.yaml to ./experiments/finetune_scan_hardest_lr3_group_size16/cfgs/new_run_Attention_simple_score_predictor_no_cls_token_4layer_drop_neg_both_sort_128/config.yaml

    args.w_CE : 1.0 | args.w_GLR : 1.0 | args.w_importance : 1.0
    args.type_pooling : important | args.type_weighting : drop_neg
    args.detach_mapping : False | args.detach_score_prediction : False
    args.mode_sort : both | args.mode_group : Attention | args.mode_patch_feature : cat_sort | args.mode_encoder : Mamba
    args.use_importance_order : False | args.use_xyz_order : False | args.use_map_order : False
    args.use_simple_score_predictor : True
    args.attention_use_cls_token : False | args.attention_depth : 4 | args.attention_drop_path_rate : 0.1 | args.attention_num_heads : 6
    args.Transformer_encoder_num_heads : 6 | args.SA_attention_use_cls_token : False
    args.use_logits_sfm : False | args.use_vote : False
    args.config : cfgs/finetune_scan_hardest_lr3_group_size16.yaml
    args.launcher : none | args.local_rank : 0 | args.num_workers : 8
    args.seed : 128 | args.deterministic : False | args.sync_bn : False
    args.exp_name : new_run_Attention_simple_score_predictor_no_cls_token_4layer_drop_neg_both_sort_128
    args.loss : cd1 | args.start_ckpts : None | args.ckpts : None
    args.val_freq : 1 | args.vote : False | args.vis : False
    args.resume : False | args.test : False | args.finetune_model : False | args.scratch_model : True
    args.mode : None | args.way : -1 | args.shot : -1 | args.fold : -1
    args.experiment_path : ./experiments/finetune_scan_hardest_lr3_group_size16/cfgs/new_run_Attention_simple_score_predictor_no_cls_token_4layer_drop_neg_both_sort_128
    args.tfboard_path : ./experiments/finetune_scan_hardest_lr3_group_size16/cfgs/TFBoard/new_run_Attention_simple_score_predictor_no_cls_token_4layer_drop_neg_both_sort_128
    args.log_name : finetune_scan_hardest_lr3_group_size16
    args.use_gpu : True | args.distributed : False

    config.optimizer.type : AdamW | config.optimizer.kwargs.lr : 0.0003 | config.optimizer.kwargs.weight_decay : 0.05
    config.scheduler.type : CosLR | config.scheduler.kwargs.epochs : 300 | config.scheduler.kwargs.initial_epochs : 10
    config.dataset.train : ScanObjectNN_hardest, subset train, bs 32
    config.dataset.val   : ScanObjectNN_hardest, subset test, bs 64
    config.dataset.test  : ScanObjectNN_hardest, subset test, bs 32
    dataset ROOT : /mnt/dongxu-fs1/data-ssd/wangzicheng/data1/ScanObjNN/h5_files/main_split
    config.model.NAME : PointMambaFormer
    config.model.trans_dim : 384 | config.model.depth : 12 | config.model.cls_dim : 15 | config.model.num_heads : 6
    config.model.group_size : 16 | config.model.num_group : 256 | config.model.encoder_dims : 384
    config.model.rms_norm : False | config.model.drop_path : 0.1 | config.model.drop_out : 0.0
    config.model.type_pooling : important | config.model.type_weighting : drop_neg
    config.model.detach_mapping : False | config.model.detach_score_prediction : False
    config.model.mode_sort : both | config.model.mode_group : Attention | config.model.mode_encoder : Mamba
    config.model.Transformer_encoder_num_heads : 6
    config.model.use_importance_order : False | config.model.use_xyz_order : False | config.model.use_map_order : False
    config.model.use_simple_score_predictor : True
    config.model.attention_use_cls_token : False | config.model.attention_depth : 4 | config.model.attention_drop_path_rate : 0.1 | config.model.attention_num_heads : 6
    config.model.mode_patch_feature : cat_sort | config.model.use_logits_sfm : False
    config.npoints : 2048 | config.total_bs : 32 | config.step_per_update : 1 | config.max_epoch : 300 | config.grad_norm_clip : 10

    Distributed training: False
    Set random seed to 128, deterministic: False
    Training from scratch
    Using Data parallel ...
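For reference, with this optimizer/scheduler setting the learning rate should follow a warmup-then-cosine curve. This is a minimal sketch, assuming a linear warmup over the 10 initial_epochs followed by cosine decay; the 1e-5 floor is inferred from the "lr = 0.000010" printed at epoch 294 in the log earlier in this thread and is an assumption, not the repo's exact CosLR implementation.

    import math

    # Sketch of a warmup + cosine schedule consistent with the config above
    # (base lr 3e-4, 300 epochs, 10 warmup epochs, assumed 1e-5 floor).
    def lr_at_epoch(epoch: int,
                    base_lr: float = 3e-4,
                    min_lr: float = 1e-5,
                    warmup_epochs: int = 10,
                    total_epochs: int = 300) -> float:
        if epoch < warmup_epochs:
            return base_lr * (epoch + 1) / warmup_epochs        # linear warmup
        progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))     # cosine decay
        return min_lr + (base_lr - min_lr) * cosine

    if __name__ == "__main__":
        for e in (0, 9, 10, 150, 294, 299):
            print(e, f"{lr_at_epoch(e):.6f}")   # epoch 294 comes out near 0.000010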
The log then lists every trainable parameter of the model (|Name |Dtype |Shape |#Params |), covering the group_divider embedding and attention blocks (blocks 0-3), the importance_cal_block, the positional embeddings, and the Mamba encoder layers (blocks.layers.*.mixer). A few representative rows:

    |module.group_divider.cls_token                        |torch.float32 |(1, 1, 384)  |384    |
    |module.group_divider.blocks.blocks.0.attn.qkv.weight  |torch.float32 |(1152, 384)  |442368 |
    |module.group_divider.blocks.blocks.0.mlp.fc1.weight   |torch.float32 |(1536, 384)  |589824 |
    |module.group_divider.reduce_dim.weight                |torch.float32 |(384, 768)   |294912 |
    |module.importance_cal_block.cal_imp_score.3.weight    |torch.float32 |(1, 128, 1)  |128    |
    |module.blocks.layers.0.mixer.A_log                    |torch.float32 |(768, 16)    |12288  |
    |module.blocks.layers.0.mixer.in_proj.weight           |torch.float32 |(1536, 384)  |589824 |
    |module.blocks.layers.0.mixer.out_proj.weight          |torch.float32 |(384, 768)   |294912 |

The pasted excerpt is truncated partway through the table (at module.blocks.layers.4).
19:38:07,762 - finetune_scan_hardest_lr3_group_size16 - INFO - ---------------------------------------------------------------------------------------------------------- 2024-06-16 19:38:07,762 - finetune_scan_hardest_lr3_group_size16 - INFO - |module.blocks.layers.4.mixer.out_proj.weight |torch.float32 |(384, 768) |294912 | 2024-06-16 19:38:07,762 - finetune_scan_hardest_lr3_group_size16 - INFO - ---------------------------------------------------------------------------------------------------------- 2024-06-16 19:38:07,762 - finetune_scan_hardest_lr3_group_size16 - INFO - |module.blocks.layers.4.norm.weight |torch.float32 |(384,) |384 | 2024-06-16 19:38:07,762 - finetune_scan_hardest_lr3_group_size16 - INFO - ---------------------------------------------------------------------------------------------------------- 2024-06-16 19:38:07,762 - finetune_scan_hardest_lr3_group_size16 - INFO - |module.blocks.layers.4.norm.bias |torch.float32 |(384,) |384 | 2024-06-16 19:38:07,762 - finetune_scan_hardest_lr3_group_size16 - INFO - ---------------------------------------------------------------------------------------------------------- 2024-06-16 19:38:07,762 - finetune_scan_hardest_lr3_group_size16 - INFO - |module.blocks.layers.5.mixer.A_log |torch.float32 |(768, 16) |12288 | 2024-06-16 19:38:07,762 - finetune_scan_hardest_lr3_group_size16 - INFO - ---------------------------------------------------------------------------------------------------------- 2024-06-16 19:38:07,763 - finetune_scan_hardest_lr3_group_size16 - INFO - |module.blocks.layers.5.mixer.D |torch.float32 |(768,) |768 | 2024-06-16 19:38:07,763 - finetune_scan_hardest_lr3_group_size16 - INFO - ---------------------------------------------------------------------------------------------------------- 2024-06-16 19:38:07,763 - finetune_scan_hardest_lr3_group_size16 - INFO - |module.blocks.layers.5.mixer.in_proj.weight |torch.float32 |(1536, 384) |589824 | 2024-06-16 19:38:07,763 - finetune_scan_hardest_lr3_group_size16 - INFO - ---------------------------------------------------------------------------------------------------------- 2024-06-16 19:38:07,763 - finetune_scan_hardest_lr3_group_size16 - INFO - |module.blocks.layers.5.mixer.conv1d.weight |torch.float32 |(768, 1, 4) |3072 | 2024-06-16 19:38:07,763 - finetune_scan_hardest_lr3_group_size16 - INFO - ---------------------------------------------------------------------------------------------------------- 2024-06-16 19:38:07,763 - finetune_scan_hardest_lr3_group_size16 - INFO - |module.blocks.layers.5.mixer.conv1d.bias |torch.float32 |(768,) |768 | 2024-06-16 19:38:07,763 - finetune_scan_hardest_lr3_group_size16 - INFO - ---------------------------------------------------------------------------------------------------------- 2024-06-16 19:38:07,763 - finetune_scan_hardest_lr3_group_size16 - INFO - |module.blocks.layers.5.mixer.x_proj.weight |torch.float32 |(56, 768) |43008 | 2024-06-16 19:38:07,763 - finetune_scan_hardest_lr3_group_size16 - INFO - ---------------------------------------------------------------------------------------------------------- 2024-06-16 19:38:07,763 - finetune_scan_hardest_lr3_group_size16 - INFO - |module.blocks.layers.5.mixer.dt_proj.weight |torch.float32 |(768, 24) |18432 | 2024-06-16 19:38:07,763 - finetune_scan_hardest_lr3_group_size16 - INFO - ---------------------------------------------------------------------------------------------------------- 2024-06-16 19:38
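For what it's worth, the shapes in this listing are consistent with a standard Mamba mixer at `d_model = 384` (`d_inner = 768`, `d_state = 16`, `d_conv = 4`, `dt_rank = 24`). Below is a minimal sketch, assuming the stock `mamba_ssm` defaults (`expand = 2`, `dt_rank = "auto"`, i.e. `ceil(d_model / 16)`), that reproduces each shape and parameter count; it is illustrative only, not code from this repository:

```python
# Sketch: derive the expected parameter shapes of one Mamba mixer layer from its
# hyperparameters, assuming the stock mamba_ssm defaults (expand=2, dt_rank="auto").
import math

def mamba_mixer_shapes(d_model=384, d_state=16, d_conv=4, expand=2):
    d_inner = expand * d_model            # 768
    dt_rank = math.ceil(d_model / 16)     # 24
    return {
        "mixer.A_log":           (d_inner, d_state),               # (768, 16)
        "mixer.D":               (d_inner,),                       # (768,)
        "mixer.in_proj.weight":  (2 * d_inner, d_model),           # (1536, 384)
        "mixer.conv1d.weight":   (d_inner, 1, d_conv),             # (768, 1, 4)
        "mixer.conv1d.bias":     (d_inner,),                       # (768,)
        "mixer.x_proj.weight":   (dt_rank + 2 * d_state, d_inner), # (56, 768)
        "mixer.dt_proj.weight":  (d_inner, dt_rank),               # (768, 24)
        "mixer.dt_proj.bias":    (d_inner,),                       # (768,)
        "mixer.out_proj.weight": (d_model, d_inner),               # (384, 768)
        "norm.weight":           (d_model,),                       # (384,)
        "norm.bias":             (d_model,),                       # (384,)
    }

if __name__ == "__main__":
    for name, shape in mamba_mixer_shapes().items():
        print(f"{name:<24} {shape}  ->  {math.prod(shape)} params")
```

Since the logged shapes agree with these values, the backbone definition itself does not appear to be the source of the accuracy gap; the difference more likely lies in the training configuration or environment.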