jayleicn / TVRetrieval

[ECCV 2020] PyTorch code for XML on TVRetrieval dataset - TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
https://tvr.cs.unc.edu
MIT License

How to set up multi-GPU training? #7

Closed minjoong507 closed 3 years ago

minjoong507 commented 3 years ago

Hi there.

I was trying to use multiple GPUs for training, so I put the GPU IDs in `--device_ids` in `baselines/crossmodal_moment_localization/config.py`.

I fixed the code like below:

```python
# before
if opt.train_span_start_epoch != -1 and epoch_i >= opt.train_span_start_epoch:
    model.set_train_st_ed(opt.lw_st_ed)

# after (the model is wrapped, so the method is reached via .module)
if opt.train_span_start_epoch != -1 and epoch_i >= opt.train_span_start_epoch:
    model.module.set_train_st_ed(opt.lw_st_ed)
```
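(For context, a minimal sketch of why `.module` is needed once a model is wrapped in `nn.DataParallel`; `ToyModel` and its method body are hypothetical stand-ins for the repo's XML model:)

```python
import torch.nn as nn

# Hypothetical stand-in for the repo's XML model.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 2)
        self.lw_st_ed = 0.0

    def set_train_st_ed(self, lw_st_ed):
        # toy version: just record the span loss weight
        self.lw_st_ed = lw_st_ed

    def forward(self, x):
        return self.fc(x)

model = nn.DataParallel(ToyModel().cuda(), device_ids=[0, 1])

# DataParallel only forwards __call__/forward to the replicas; custom
# methods such as set_train_st_ed live on the wrapped model, so they
# must be reached through .module.
model.module.set_train_st_ed(0.01)
```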

Then I added the following at the top of my script:

```python
import os

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5"
```
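(As a sanity check, and assuming the variables are set before the first CUDA call, PyTorch should then report all six devices:)

```python
import torch

# CUDA_VISIBLE_DEVICES is only honored if set before CUDA is initialized.
print(torch.cuda.device_count())  # expected: 6
```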

But it is still not working. What should I do?

I have used

Thank you for your time.

jayleicn commented 3 years ago

Hi @minjoong507,

This code does not support multi-GPU training, and there is normally no need for it, since the whole training takes only 4 hours on a single 2080 Ti at full precision. If you want to use multiple GPUs, please refer to PyTorch's official tutorials.
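(For reference, a minimal single-node sketch along the lines of those tutorials, using `DistributedDataParallel`; the `nn.Linear` here is a stand-in for the real model:)

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Launch with e.g.: torchrun --nproc_per_node=6 train_ddp.py
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(8, 2).to(local_rank)  # stand-in for the real model
    model = DDP(model, device_ids=[local_rank])

    x = torch.randn(4, 8, device=f"cuda:{local_rank}")
    loss = model(x).sum()
    loss.backward()  # gradients are all-reduced across processes here

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```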

Best, Jie

minjoong507 commented 3 years ago

Thank you for quick reply!

Is there any difference in performance between multi-GPU and single-GPU training?

Recently, I saw an issue about the performance of the HERO model (link: https://github.com/linjieli222/HERO/issues/14).

Thank you for your answer.

jayleicn commented 3 years ago

Hi @minjoong507,

Most likely yes, but it depends on how you implement it. For example, if hard negatives are mined across GPUs as in https://github.com/linjieli222/HERO/issues/14, you can expect some improvement, since better negatives are used (mined from a larger effective mini-batch). Likewise, as the 3090 has much more memory than the 2080 Ti, you can increase the mini-batch size to get better negatives and thus some improvement.
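(To make the mini-batch point concrete, here is a generic in-batch-negatives sketch, not this repo's exact loss: with batch size B, each query is scored against the other B-1 items as negatives, so a larger batch enlarges the negative pool.)

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb, video_emb, temperature=0.07):
    """Each query's positive is its own video; the other B-1 videos
    in the batch act as negatives, so negative quality grows with B."""
    query_emb = F.normalize(query_emb, dim=-1)
    video_emb = F.normalize(video_emb, dim=-1)
    logits = query_emb @ video_emb.t() / temperature  # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)

# Doubling the batch doubles the pool of negatives per query.
q, v = torch.randn(32, 256), torch.randn(32, 256)
print(in_batch_contrastive_loss(q, v))
```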

Jie

minjoong507 commented 3 years ago

I got it! Thanks :)