tinyvision / SOLIDER-REID

MIT License

Out of memory when running test.py on the MSMT17 dataset #19

Open emlssyj opened 6 months ago

emlssyj commented 6 months ago

2024-05-11 09:36:30 transreid INFO: Namespace(config_file='configs/msmt17/swin_small.yml', opts=['TEST.WEIGHT', './log/msmt17/swin_small/transformer_120.pth', 'TEST.RE_RANKING', 'True', 'MODEL.SEMANTIC_WEIGHT', '0.2'])
2024-05-11 09:36:30 transreid INFO: Loaded configuration file configs/msmt17/swin_small.yml
2024-05-11 09:36:30 transreid INFO:
MODEL:
  PRETRAIN_HW_RATIO: 2
  METRIC_LOSS_TYPE: 'triplet'
  IF_LABELSMOOTH: 'off'
  IF_WITH_CENTER: 'no'
  NAME: 'transformer'
  NO_MARGIN: True
  DEVICE_ID: ('0')
  TRANSFORMER_TYPE: 'swin_small_patch4_window7_224'
  STRIDE_SIZE: [16, 16]

INPUT:
  SIZE_TRAIN: [384, 128]
  SIZE_TEST: [384, 128]
  PROB: 0.5 # random horizontal flip
  RE_PROB: 0.5 # random erasing
  PADDING: 10
  PIXEL_MEAN: [0.5, 0.5, 0.5]
  PIXEL_STD: [0.5, 0.5, 0.5]

DATASETS:
  NAMES: ('msmt17')
  ROOT_DIR: ('/TransReID/data')

DATALOADER:
  SAMPLER: 'softmax_triplet'
  NUM_INSTANCE: 4
  NUM_WORKERS: 8

SOLVER:
  OPTIMIZER_NAME: 'SGD'
  MAX_EPOCHS: 120
  BASE_LR: 0.0008
  WARMUP_EPOCHS: 20
  IMS_PER_BATCH: 64
  WARMUP_METHOD: 'cosine'
  LARGE_FC_LR: False
  CHECKPOINT_PERIOD: 120
  LOG_PERIOD: 20
  EVAL_PERIOD: 10
  WEIGHT_DECAY: 1e-4
  WEIGHT_DECAY_BIAS: 1e-4
  BIAS_LR_FACTOR: 2

TEST:
  EVAL: True
  IMS_PER_BATCH: 256
  RE_RANKING: False
  WEIGHT: ''
  NECK_FEAT: 'before'
  FEAT_NORM: 'yes'

OUTPUT_DIR: './log/msmt17/swin_small'

2024-05-11 09:36:30 transreid INFO: Running with config:
DATALOADER:
  NUM_INSTANCE: 4
  NUM_WORKERS: 8
  REMOVE_TAIL: 0
  SAMPLER: softmax_triplet
DATASETS:
  NAMES: msmt17
  ROOT_DIR: /media/lab/Disk1/TransReID/data
  ROOT_TRAIN_DIR: ../data
  ROOT_VAL_DIR: ../data
INPUT:
  PADDING: 10
  PIXEL_MEAN: [0.5, 0.5, 0.5]
  PIXEL_STD: [0.5, 0.5, 0.5]
  PROB: 0.5
  RE_PROB: 0.5
  SIZE_TEST: [384, 128]
  SIZE_TRAIN: [384, 128]
MODEL:
  ATT_DROP_RATE: 0.0
  COS_LAYER: False
  DEVICE: cuda
  DEVICE_ID: 0
  DEVIDE_LENGTH: 4
  DIST_TRAIN: False
  DROPOUT_RATE: 0.0
  DROP_OUT: 0.0
  DROP_PATH: 0.1
  FEAT_DIM: 512
  GEM_POOLING: False
  ID_LOSS_TYPE: softmax
  ID_LOSS_WEIGHT: 1.0
  IF_LABELSMOOTH: off
  IF_WITH_CENTER: no
  JPM: False
  LAST_STRIDE: 1
  METRIC_LOSS_TYPE: triplet
  NAME: transformer
  NECK: bnneck
  NO_MARGIN: True
  PRETRAIN_CHOICE: imagenet
  PRETRAIN_HW_RATIO: 2
  PRETRAIN_PATH:
  REDUCE_FEAT_DIM: False
  RE_ARRANGE: True
  SEMANTIC_WEIGHT: 0.2
  SHIFT_NUM: 5
  SHUFFLE_GROUP: 2
  SIE_CAMERA: False
  SIE_COE: 3.0
  SIE_VIEW: False
  STEM_CONV: False
  STRIDE_SIZE: [16, 16]
  TRANSFORMER_TYPE: swin_small_patch4_window7_224
  TRIPLET_LOSS_WEIGHT: 1.0
OUTPUT_DIR: ./log/msmt17/swin_small
SOLVER:
  BASE_LR: 0.0008
  BIAS_LR_FACTOR: 2
  CENTER_LOSS_WEIGHT: 0.0005
  CENTER_LR: 0.5
  CHECKPOINT_PERIOD: 120
  COSINE_MARGIN: 0.5
  COSINE_SCALE: 30
  EVAL_PERIOD: 10
  GAMMA: 0.1
  IMS_PER_BATCH: 64
  LARGE_FC_LR: False
  LOG_PERIOD: 20
  MARGIN: 0.3
  MAX_EPOCHS: 120
  MOMENTUM: 0.9
  OPTIMIZER_NAME: SGD
  SEED: 1234
  STEPS: (40, 70)
  TRP_L2: False
  WARMUP_EPOCHS: 20
  WARMUP_FACTOR: 0.01
  WARMUP_METHOD: cosine
  WEIGHT_DECAY: 0.0001
  WEIGHT_DECAY_BIAS: 0.0001
TEST:
  DIST_MAT: dist_mat.npy
  EVAL: True
  FEAT_NORM: yes
  IMS_PER_BATCH: 256
  NECK_FEAT: before
  RE_RANKING: True
  WEIGHT: ./log/msmt17/swin_small/transformer_120.pth
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15} cam_container
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15} cam_container
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15} cam_container
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15} cam_container
=> MSMT17 loaded
2024-05-11 09:36:31 transreid.check INFO: Dataset statistics:
2024-05-11 09:36:31 transreid.check INFO:   ----------------------------------------
2024-05-11 09:36:31 transreid.check INFO:   subset   | # ids | # images | # cameras
2024-05-11 09:36:31 transreid.check INFO:   ----------------------------------------
2024-05-11 09:36:31 transreid.check INFO:   train    |  1041 |    32621 |        15
2024-05-11 09:36:31 transreid.check INFO:   query    |  3060 |    11659 |        15
2024-05-11 09:36:31 transreid.check INFO:   gallery  |  3060 |    82161 |        15
2024-05-11 09:36:31 transreid.check INFO:   ----------------------------------------
using img_triplet sampler
using Transformer_type: swin_small_patch4_window7_224 as a backbone
/media/lab/Disk1/SOLIDER-REID/model/backbones/swin_transformer.py:1159: UserWarning: DeprecationWarning: pretrained is deprecated, please use "init_cfg" instead
  warnings.warn('DeprecationWarning: pretrained is deprecated, '
===========building transformer===========
Loading pretrained model from ./log/msmt17/swin_small/transformer120.pth
2024-05-11 09:36:37 transreid.test INFO: Enter inferencing
The test feature is normalized
=> Enter reranking
/media/lab/Disk1/SOLIDER-REID/utils/reranking.py:40: UserWarning: This overload of addmm is deprecated:
	addmm(Number beta, Number alpha, Tensor mat1, Tensor mat2)
Consider using one of the following signatures instead:
	addmm(Tensor mat1, Tensor mat2, *, Number beta, Number alpha) (Triggered internally at ../torch/csrc/utils/python_argparser.cpp:1630.)
  distmat.addmm(1, -2, feat, feat.t())
Killed

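There may be a memory leak that causes the out-of-memory condition and gets the process Killed. The test machine has 128 GB of RAM. Could you point out what is going wrong? Thanks!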
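As a rough sanity check on whether this is a leak or simply the size of the problem, the arithmetic below estimates how large a single dense pairwise matrix would be at this scale. It is only a sketch under assumptions not confirmed by the log: that `feat` in `utils/reranking.py` holds the query and gallery features concatenated, so that `feat @ feat.t()` is an N x N matrix with N = 11659 + 82161, and that the re-ranking step keeps such matrices as dense float32 values or int64 indices. The helper name `dense_matrix_gib` is hypothetical and exists only for this illustration.

```python
# Rough memory estimate for dense N x N matrices during re-ranking.
# Assumption (not confirmed by the log): `feat` stacks query + gallery
# features, so the pairwise distance matrix has shape (N, N).

GIB = 2**30  # bytes per GiB


def dense_matrix_gib(n_rows: int, n_cols: int, bytes_per_element: int) -> float:
    """Size in GiB of a dense matrix with the given shape and element width."""
    return n_rows * n_cols * bytes_per_element / GIB


# Image counts taken from the "Dataset statistics" table in the log.
num_query = 11659
num_gallery = 82161
n = num_query + num_gallery

float32_gib = dense_matrix_gib(n, n, 4)  # e.g. a float32 distance matrix
int64_gib = dense_matrix_gib(n, n, 8)    # e.g. an int64 index matrix

print(f"N = {n}")
print(f"N x N float32 matrix: {float32_gib:.1f} GiB")
print(f"N x N int64 matrix:   {int64_gib:.1f} GiB")
```

If those assumptions hold, one float32 matrix is already around 33 GiB and one int64 matrix around 66 GiB, so a few simultaneous copies could approach or exceed 128 GB without any leak. Running the same test with `TEST.RE_RANKING` set to `False` (the value in the YAML file) would be one way to check whether the re-ranking step is what gets the process killed.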