Jessespace opened this issue 6 years ago
I ran out of memory when training but fixed the problem with a smaller batch size.
Hi @Jessespace,
As @valeriechen pointed out, reducing your batch size should fix issues with memory during training. For reference, we typically train these models using several GPUs, up to 6 at times.
During testing in `test_models.py`, the batch size is one, so I would recommend setting `test_crops` to 1. If you are using one of the pretrained models we released (as with the code you provided), `test_segments` should still be 8 to match the number of segments in the pretrained model.
Hope this helps! Alex
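Putting those settings together, the test invocation would look something like the following. This is a sketch based on the command the original poster shares later in this thread; the checkpoint path and dataset name are theirs, so adjust them to your own setup:

```shell
python test_models.py something RGB \
    model/TRN_something_RGB_BNInception_TRNmultiscale_segment8_best.pth.tar \
    --arch BNInception --crop_fusion_type TRNmultiscale \
    --test_segments 8 --test_crops 1
```

Note that `--test_segments 8` matches the `segment8` checkpoint, while `--test_crops 1` keeps memory usage down.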
Thanks @alexandonian !
Following up on setting `test_crops` to 1 in the `test_video` script, change:

```python
transforms.GroupOverSample(net.input_size, net.scale_size),
```

to:

```python
transforms.GroupScale(net.scale_size),
transforms.GroupCenterCrop(net.input_size),
```
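As a rough sketch of why this swap helps: `GroupOverSample` typically produces 10 crops per video (4 corners plus the center, each with a horizontal flip), while `GroupScale` + `GroupCenterCrop` produce a single crop, so the test-time input tensor shrinks by 10x. A back-of-the-envelope element count, assuming the usual 224x224 BNInception input size (the exact crop count and resolution depend on your model, so treat the numbers as illustrative):

```python
def input_elements(crops, segments=8, channels=3, height=224, width=224):
    """Number of input elements fed to the network for one test video."""
    return crops * segments * channels * height * width

oversample = input_elements(crops=10)  # GroupOverSample: 10 crops per video
center = input_elements(crops=1)       # GroupScale + GroupCenterCrop: 1 crop
print(oversample // center)            # -> 10
```

Activation memory inside the network scales roughly with this input count, which is why dropping `test_crops` from 10 to 1 is usually enough to fit testing on a single GPU.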
I have changed `test_segments` and `test_crops` to 2, but I still get the "out of memory" error. Hope you can help me, thanks!
Hardware configuration: GPU: GTX 1080 Ti (11 GB), RAM: 16 GB
```
Initializing TSN with base model: BNInception.
TSN Configurations:
    input_modality:     RGB
    num_segments:       2
    new_length:         1
    consensus_module:   TRNmultiscale
    dropout_ratio:      0.8
    img_feature_dim:    256

/home/jesse/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py:482: UserWarning: src is not broadcastable to dst, but they have the same number of elements. Falling back to deprecated pointwise behavior. own_state[name].copy_(param)
('Multi-Scale Temporal Relation Network Module in use', ['2-frame relation'])
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1513363039688/work/torch/lib/THC/generic/THCStorage.c line=82 error=2 : out of memory
./test_rgb_something.sh: line 2: 16617 Segmentation fault (core dumped) python test_models.py something RGB model/TRN_something_RGB_BNInception_TRNmultiscale_segment8_best.pth.tar --arch BNInception --crop_fusion_type TRNmultiscale --test_segments 2 --test_crops 2
```