tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

tensor2tensor breaks GPU limit in a container environment #1012

Closed EdwardZhang88 closed 6 years ago

EdwardZhang88 commented 6 years ago

Description

I am running tensor2tensor in a Kubernetes container environment. I find that no matter how many GPUs I allocate to the container, tensor2tensor always uses every GPU on the node. For example, when I assign only one GPU to the container and also set --worker_gpu=1 in my training script, all 4 GPUs on the node are visible and the transformer model variables fill up the memory of all 4 of them. What is tricky is that the actual computation only happens on one GPU. I think this is most likely an issue with tensor2tensor, as the GPU limit is honored if I switch to a plain tensorflow container.
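For reference, a quick way to confirm which devices TensorFlow actually sees from inside the container (a minimal sketch, assuming the TF 1.x environment listed below, and run separately from the trainer):

```python
# Minimal sketch: list the GPUs TensorFlow can see inside the container.
# Note: enumerating devices initializes every visible GPU, so run this in a
# throwaway process rather than inside the training job.
from tensorflow.python.client import device_lib

gpus = [d.name for d in device_lib.list_local_devices()
        if d.device_type == "GPU"]
print(gpus)  # with the behavior described above, all 4 GPUs show up here
```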

Below is the training script I submitted:

```
python -u t2t-trainer \
  --model=transformer \
  --hparams_set=$HPARAMS \
  --problems=$PROBLEM \
  --t2t_usr_dir=$t2t_usr_dir \
  --data_dir=$data_dir \
  --output_dir=$model_dir \
  --save_checkpoints_secs 1800 \
  --train_steps 45000 \
  --worker_gpu=1
```

Below is the output of nvidia-smi on the GPU node while the tensor2tensor container is running:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 381.04                 Driver Version: 381.04                     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40m          Off  | 0000:03:00.0    Off  |                    0 |
| N/A   54C    P0   148W / 235W | 10999MiB / 11439MiB  |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40m          Off  | 0000:04:00.0    Off  |                    0 |
| N/A   35C    P0    63W / 235W | 10953MiB / 11439MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K40m          Off  | 0000:82:00.0    Off  |                    0 |
| N/A   37C    P0    61W / 235W | 10953MiB / 11439MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K40m          Off  | 0000:83:00.0    Off  |                    0 |
| N/A   37C    P0    63W / 235W | 10953MiB / 11439MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     71703     C  python                                       10982MiB |
|    1     71703     C  python                                       10936MiB |
|    2     71703     C  python                                       10936MiB |
|    3     71703     C  python                                       10936MiB |
+-----------------------------------------------------------------------------+
```

Environment information

tensor2tensor v1.2.9

OS: CentOS 7.2
Container OS: Ubuntu 16.04

$ pip freeze | grep tensor
tensorboard==1.9.0
tensorflow-gpu==1.4.1
tensorflow-tensorboard==0.4.0

$ python -V
Python 2.7.12
EdwardZhang88 commented 6 years ago

I have upgraded tensor2tensor and tensorflow to v1.8 and v1.10 respectively, but the problem persists. I am wondering if anybody else has run into the same issue. It easily causes CUDA out-of-memory errors, and I don't see a better option than allocating all 4 GPUs to tensor2tensor at a time.
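For completeness, one common way to keep TensorFlow from even seeing the other GPUs is to mask them at the CUDA level before TensorFlow is imported (a sketch only; the device index is an example, and the next comment explains why this was not the approach ultimately taken):

```python
# Sketch of a CUDA-level workaround: expose only one GPU to the process.
# The environment variable must be set before TensorFlow is imported.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # example: expose only GPU 0

import tensorflow as tf  # now only the masked-in GPU is visible
```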

EdwardZhang88 commented 6 years ago

Finally figured it out by updating the gpu_options in t2t-trainer.py, e.g. config.gpu_options.visible_device_list = str(my_rank). Apparently, this is a better approach than using CUDA_VISIBLE_DEVICES, since the latter usually prohibits CUDA IPC.
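For context, a minimal sketch of what that change looks like with the TF 1.x session config (my_rank is a placeholder for the GPU index assigned to the worker; where exactly the config is built inside t2t-trainer.py may differ):

```python
# Sketch of the fix: restrict this process to a single physical GPU via the
# TF 1.x session configuration instead of CUDA_VISIBLE_DEVICES.
import tensorflow as tf

my_rank = 0  # placeholder: the GPU index granted to this container/worker

config = tf.ConfigProto()
# Only the listed device is visible to this TensorFlow process.
config.gpu_options.visible_device_list = str(my_rank)
# Optionally avoid pre-allocating all GPU memory up front.
config.gpu_options.allow_growth = True

sess = tf.Session(config=config)
```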