When I use four GPUs to train this model, memory usage on the first GPU is about 10761MB/12196MB, while each of the other GPUs sits at 3651MB/12196MB, which wastes a lot of memory. Why don't you use a GPU load-balancing approach (like DataParallelModel in Integral Loss)? Or could you look into this situation?
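For scale, here is a rough back-of-the-envelope calculation using only the numbers above. It assumes the total footprint would stay about the same if it were spread evenly across the four GPUs. That is only an approximation, since balancing usually also removes the extra outputs gathered on GPU 0.

```python
# Reported per-GPU memory usage in MB (numbers from the report above).
CAPACITY_MB = 12196
USED_MB = [10761, 3651, 3651, 3651]  # GPU 0 first, then the other three


def memory_summary(used, capacity):
    """Return (idle MB per GPU, total idle MB, busiest-minus-idlest gap, even split per GPU)."""
    idle = [capacity - u for u in used]
    gap = max(used) - min(used)
    # Rough assumption: the same total footprint divided evenly over all GPUs.
    even_split = sum(used) / len(used)
    return idle, sum(idle), gap, even_split


idle, total_idle, gap, even = memory_summary(USED_MB, CAPACITY_MB)
print(idle, total_idle, gap, even)
```

With these numbers, GPU 0 uses about 7110 MB more than each of the others. Roughly 27 GB sits idle across the four cards, and an even split would come to about 5429 MB per GPU. Because the batch size is capped by whichever GPU fills up first, that imbalance limits the usable batch size.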
When i use four gpu to train this model, one the first gpu, Memory Usage is about 10761MB/12196M, while the other GPU all 3651M/12196M, which waste too much memory usage. Why do not you use GPU balance function (like DataParallelModel in Integral LOSS)? or can you account for the same situation?