zhoubolei / TRN-pytorch

Temporal Relation Networks
http://relation.csail.mit.edu/

out of memory when training or testing #8

Open Jessespace opened 6 years ago

Jessespace commented 6 years ago

I have changed test_segments and test_crops to 2, but I still get the error "out of memory". Hope you can help me, thanks!

Hardware configuration: GPU: GTX 1080 Ti (11 GB), RAM: 16 GB

```
Initializing TSN with base model: BNInception.
TSN Configurations:
    input_modality:     RGB
    num_segments:       2
    new_length:         1
    consensus_module:   TRNmultiscale
    dropout_ratio:      0.8
    img_feature_dim:    256

/home/jesse/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py:482: UserWarning: src is not broadcastable to dst, but they have the same number of elements. Falling back to deprecated pointwise behavior.
  own_state[name].copy_(param)
('Multi-Scale Temporal Relation Network Module in use', ['2-frame relation'])
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1513363039688/work/torch/lib/THC/generic/THCStorage.c line=82 error=2 : out of memory
./test_rgb_something.sh: line 2: 16617 Segmentation fault (core dumped) python test_models.py something RGB model/TRN_something_RGB_BNInception_TRNmultiscale_segment8_best.pth.tar --arch BNInception --crop_fusion_type TRNmultiscale --test_segments 2 --test_crops 2
```

valeriechen commented 6 years ago

I ran out of memory when training but fixed the problem with a smaller batch size.
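To see why a smaller batch helps, here is a rough back-of-the-envelope sketch of how the network's input (and, roughly in proportion, its intermediate activations) scales with batch size. The 224x224 RGB input and 8 segments are assumptions based on the BNInception/TRN training setup; the batch sizes below are purely illustrative.

```python
# Count of input floats per forward pass, assuming a BNInception-style
# 224x224 RGB input and 8 temporal segments. GPU memory use is
# dominated by intermediate activations, which grow roughly linearly
# with this number.
def input_floats(batch_size, num_segments=8, channels=3, height=224, width=224):
    return batch_size * num_segments * channels * height * width

full = input_floats(64)  # a hypothetical batch of 64
half = input_floats(32)  # halving the batch halves the footprint
assert half * 2 == full
```

So if training dies with OOM, halving the batch size is the first knob to turn; gradients can be accumulated over two half-batches if the effective batch size matters for convergence.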

alexandonian commented 6 years ago

Hi @Jessespace,

As @valeriechen pointed out, reducing your batch size should fix issues with memory during training. For reference, we typically train these models using several GPUs, up to 6 at times.

During testing in test_models.py, the batch size is one, so I would recommend setting test_crops to 1. If you are using one of the pretrained models we released (as in the command you provided), test_segments should still be 8 to match the number of segments the model was trained with.
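Even with a batch size of one, the crop and segment dimensions multiply the effective number of frames pushed through the network in a single test-time forward pass. A small sketch, assuming TSN-style 10-crop oversampling (5 crops x 2 flips) when test_crops is 10:

```python
# Effective number of frame crops per test forward pass:
# test_crops x test_segments (the batch dimension is 1 in test_models.py).
def effective_clips(test_crops, test_segments):
    return test_crops * test_segments

oversample = effective_clips(10, 8)  # 10-crop testing with 8 segments
center = effective_clips(1, 8)       # center crop only, same 8 segments
assert oversample == 10 * center     # 10x the memory for the same video
```

This is why dropping test_crops to 1 fixes test-time OOM without touching test_segments.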

Hope this helps! Alex

dukebrah commented 5 years ago

Thanks @alexandonian !

Following up on setting test_crops to 1 in the test_video script, change:

```python
transforms.GroupOverSample(net.input_size, net.scale_size),
```

to:

```python
transforms.GroupScale(net.scale_size),
transforms.GroupCenterCrop(net.input_size),
```
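The replacement keeps the same spatial resolution but takes a single centered window instead of oversampling. A minimal sketch of the center-crop geometry involved, with a hypothetical `center_crop_box` helper (the real `transforms.GroupCenterCrop` applies this crop to a PIL image for every frame in the group):

```python
# Compute the centered input_size x input_size window inside a scaled
# scale_size x scale_size frame, as a PIL-style (left, top, right,
# bottom) box. Illustrative only; the actual transform lives in the
# repo's transforms.py.
def center_crop_box(scale_size, input_size):
    offset = (scale_size - input_size) // 2
    return (offset, offset, offset + input_size, offset + input_size)

# BNInception-style sizes: scale to 256, crop to 224
box = center_crop_box(256, 224)
assert box == (16, 16, 240, 240)
```

One crop per frame instead of ten is exactly the memory saving Alex described above.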