simon-ging / coot-videotext

COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Apache License 2.0

[BUG] multi gpu training without --single_gpu #19

Open menatallh opened 3 years ago

menatallh commented 3 years ago

Describe the bug: Multi-GPU training fails when I remove the --single_gpu flag.

Expected behavior: it detects all available GPUs.
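Since the original screenshot is gone, here is a minimal sketch of that check in plain PyTorch (standard torch.cuda calls, nothing specific to this repository):

```python
import os
import torch

# Expected behavior: PyTorch should see every GPU on the machine
# when --single_gpu is not passed.
print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES", "<not set>"))
print("CUDA available:", torch.cuda.is_available())
print("GPUs visible to PyTorch:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"  cuda:{i} -> {torch.cuda.get_device_name(i)}")
```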

Screenshots: (screenshot, not recoverable)

System Info: (not provided)


simon-ging commented 3 years ago

If you have solved it, please consider posting your fix for others.

menggehe commented 3 years ago

Did you solve this problem?

simon-ging commented 3 years ago

Does it still happen? If yes, please post a complete bug report: the exact command you ran, the complete error message, the output of the system command "nvidia-smi", and your OS / Python / PyTorch versions. Then I will look into it.
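For reference, the Python/PyTorch part of that information can be printed with standard library calls (this snippet is illustrative and not part of the repository):

```python
import platform
import sys

import torch

# Print the environment details requested above.
print("OS:", platform.platform())
print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA build:", torch.version.cuda)
print("GPU count:", torch.cuda.device_count())
```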

menggehe commented 3 years ago

Command: (screenshot)

Error message: (screenshots)

Output of "nvidia-smi": (screenshot)

System Info:
- OS: Ubuntu 18.04
- Python: 3.8.5
- PyTorch: 1.8.1

menggehe commented 3 years ago

I changed some code in utils_torch.py (the before/after screenshots are no longer available).

But the model still uses only one GPU, device 0.
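The exact change is unknown since the screenshots are lost, but for context, the standard way to spread a model over all visible GPUs in plain PyTorch is torch.nn.DataParallel. The sketch below is a generic illustration, not the repository's actual utils_torch.py code:

```python
import torch
from torch import nn

# Generic multi-GPU sketch (not the actual coot-videotext code):
# nn.DataParallel replicates the module on every visible GPU and splits the
# input batch along dim 0, so each GPU gets batch_size / num_gpus samples.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)       # replicas on cuda:0 .. cuda:N-1
model = model.to("cuda")                 # parameters are kept on cuda:0

x = torch.randn(64, 512, device="cuda")  # the full batch starts on cuda:0
out = model(x)                           # forward pass is split across GPUs
print(out.shape, "-", torch.cuda.device_count(), "GPU(s) visible")
```

If only device 0 shows activity after a change like this, common causes are CUDA_VISIBLE_DEVICES limiting visibility, the wrapper never being applied before the .to("cuda") call, or a batch size too small for the split to show up in utilization.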

simon-ging commented 3 years ago

I will check this problem; it should be possible to train on multiple GPUs. That said, unless you increase the model size or batch size, a single 12 GB GPU is more than enough to train the retrieval models.