cuikf opened this issue 2 years ago
@cuikf This is most likely a bottleneck at the data-providing stage. The throughput of the data provider is usually limited by the CPU and memory of the training machine (that's why training with a large batch size often requires a high-performance machine). You can try reducing the batch size to alleviate this issue.
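To check whether data loading really is the bottleneck, one option is to time the loader in isolation. Below is a minimal sketch in plain PyTorch (not tied to this repo); the `TensorDataset` and its shape are placeholders standing in for the real SiamFC++ training data. If the samples/s measured here is far below what the GPU consumes per second, the pipeline is CPU-bound:

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset; swap in the actual training dataset object.
dataset = TensorDataset(torch.randn(4096, 3, 127, 127))

for batch_size in (32, 64, 128):
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=8)
    start = time.time()
    n_batches = 0
    for _ in loader:  # iterate once to measure pure data-loading speed
        n_batches += 1
    elapsed = time.time() - start
    print(f"batch_size={batch_size}: "
          f"{n_batches / elapsed:.1f} batches/s, "
          f"{n_batches * batch_size / elapsed:.0f} samples/s")
```

If throughput stays roughly constant in samples/s regardless of batch size, the CPU-side pipeline is saturated, which matches the periodic GPU-Util drops described below.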
@MARMOTatZJU Thank u!I'll try it!
When I run `python ./main/train.py --config 'experiments/siamfcpp/train/lasot/siamfcpp_alexnet-trn.yaml'`, GPU-Util repeatedly climbs to about 89%, drops to 0% a few seconds later, then rises back to 89% again.
In the .yaml:

```yaml
num_processes: 2
minibatch: &MINIBATCH 128
num_workers: 64
```
I'm very confused.