Closed · kdg1993 closed this 1 year ago
Looks good to me! I think it would be helpful to have this as soon as possible! 👍
By the way, if you use a device code like the changed one, could you change the code from `with torch.autocast(device_type=str(device).split(":")[0]):` to `with torch.autocast(device_type=str(device)):`?
When I implemented it, the device printed as "cuda:0", so I wrote the code with the `split`!
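For reference, a minimal sketch of the two variants (assuming a plain PyTorch setup; this is not the repo's actual code):

```python
import torch

# torch.autocast expects the device *type* ("cuda" or "cpu"),
# not an indexed device string such as "cuda:0".
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Variant needed when str(device) can be "cuda:0": strip the index first.
with torch.autocast(device_type=str(device).split(":")[0]):
    pass  # mixed-precision region

# Simpler variant, valid when the device string carries no index ("cuda"/"cpu").
# device.type is used here so the snippet runs either way.
with torch.autocast(device_type=device.type):
    pass  # mixed-precision region
```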
Thanks for your quick response and sharp questions 👍
I am sorry for forgetting to add the tested result link 😭 (https://wandb.ai/snuh_interns/multi_gpu_test?workspace=user-snuh_interns)
It works well for all settings (at least, all 4 settings show distributed assignment across the GPUs).
It automatically adapts to the number of GPUs; a sketch is shown below. For example, if there is only one GPU, it behaves the same as not using nn.DataParallel. If there are multiple GPUs but we want to use only one or some of them, nn.DataParallel supports selecting specific devices. However, that needs more analysis because of Ray Tune's resource assignment, so implementing the selection option will take more time.
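A minimal sketch of that adaptive behavior (the model and names here are placeholders, not this repo's code):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 2)  # placeholder model

if torch.cuda.device_count() > 1:
    # Splits each batch across all visible GPUs; passing device_ids
    # (e.g. device_ids=[0, 1]) would restrict it to a subset later on.
    model = nn.DataParallel(model)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)  # with a single GPU (or CPU), this is equivalent to not wrapping at all
```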
The commands used for the test results in answer no. 1 are below.
Single run w/o ray

```bash
python main.py \
    Dataset.train_size=3000 \
    use_amp=False \
    project_name='multi_gpu_test' \
    logging.setup.name='single_wo_ray' \
    hparams_search=none
```

Single run w/ ray

```bash
python main.py \
    Dataset.train_size=3000 \
    use_amp=False \
    project_name='multi_gpu_test' \
    logging.setup.name='single_w_ray' \
    hparams_search=raytune
```

Multi run w/o ray

```bash
python main.py --multirun \
    Dataset=CheXpert,MIMIC \
    Dataset.train_size=3000 \
    use_amp=False \
    project_name='multi_gpu_test' \
    logging.setup.name='multi_wo_ray' \
    hparams_search=none
```

Multi run w/ ray

```bash
python main.py --multirun \
    Dataset=CheXpert,MIMIC \
    Dataset.train_size=3000 \
    use_amp=False \
    project_name='multi_gpu_test' \
    logging.setup.name='multi_w_ray' \
    hparams_search=raytune
```
Thanks, @seoulsky-field, for pointing out something I could not think to change!
I think I understand it thanks to your precise description & I will change it without a PR.
Motivation 🤔
We will soon need multi-GPU support for experiments. Also, it is not certain that Ray Tune automatically supports parallel GPU usage. Thus, we need to implement multi-GPU support and test that it functions correctly. (For a detailed description of the motivation and implementation direction, please see #54.)
Key Changes 🔑
To Reviewers 🙏
resolves: #54
references: