seoulsky-field / CXRAIL-dev


Partially resolved: Feature/#54 #59

Closed · kdg1993 closed this 1 year ago

kdg1993 commented 1 year ago

Motivation 🤔

Soon, we will need multiple GPUs for experiments. Also, it is not certain that Ray Tune automatically supports parallel GPU usage. Thus, we need to implement multi-GPU support and test whether it functions correctly. (For a detailed description of the motivation and implementation direction, please see #54.)

Key Changes 🔑

To Reviewers 🙏

resolves: #54
references:

seoulsky-field commented 1 year ago

Looks good to me! I think it will be helpful to merge this as soon as possible! 👍 By the way, if you use the device code as changed here, could you change `with torch.autocast(device_type=str(device).split(":")[0]):` to `with torch.autocast(device_type=str(device)):`? When I implemented it, the device printed as "cuda:0", so I wrote it with the split!
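For reference, a minimal, self-contained sketch of the behavior under discussion (not code from this PR; it assumes a standard PyTorch setup): `str(device)` yields `"cuda:0"` for an indexed device, while `torch.autocast` accepts only the bare device type such as `"cuda"` or `"cpu"`, which is why the `split(":")` was needed.

```python
import torch

# On a GPU machine, str(device) is "cuda:0", but torch.autocast only
# accepts a bare device type such as "cuda" or "cpu".
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

with torch.autocast(device_type=str(device).split(":")[0]):
    x = torch.randn(8, 8, device=device)
    y = x @ x  # the matmul runs in reduced precision inside the autocast region
```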

kdg1993 commented 1 year ago

Thanks for your quick response and sharp questions 👍

  1. I am sorry for forgetting to add the link to the tested results 😭 (https://wandb.ai/snuh_interns/multi_gpu_test?workspace=user-snuh_interns)
    It works well for all settings (at least, all four settings show work being distributed across the GPUs).

  2. It automatically adapts to the number of GPUs. For example, if there is only one GPU, the behavior is the same as not using nn.DataParallel. If there are multiple GPUs but we want to use only one or some of them, nn.DataParallel supports selecting specific devices (see the sketch below). However, that needs more analysis because of Ray Tune's resource assignment, so implementing the selection option will take more time.
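As a reference for the selection option mentioned in point 2, a minimal sketch (the model here is a hypothetical stand-in, not the code in this PR) of how `nn.DataParallel` can restrict itself to a subset of GPUs:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)  # hypothetical stand-in for the project's model

if torch.cuda.device_count() > 1:
    # device_ids limits DataParallel to the listed GPUs;
    # leaving it out replicates across all visible GPUs.
    model = nn.DataParallel(model, device_ids=[0, 1])

# The wrapped model (and its inputs) live on the first listed device.
model = model.to("cuda:0" if torch.cuda.is_available() else "cpu")
```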

The commands used for the tested results in answer no. 1 are below.


Single run w/o ray

```bash
python main.py \
  Dataset.train_size=3000 \
  use_amp=False \
  project_name='multi_gpu_test' \
  logging.setup.name='single_wo_ray' \
  hparams_search=none
```

Single run w/ ray

```bash
python main.py \
  Dataset.train_size=3000 \
  use_amp=False \
  project_name='multi_gpu_test' \
  logging.setup.name='single_w_ray' \
  hparams_search=raytune
```

Multi run w/o ray

```bash
python main.py --multirun \
  Dataset=CheXpert,MIMIC \
  Dataset.train_size=3000 \
  use_amp=False \
  project_name='multi_gpu_test' \
  logging.setup.name='multi_wo_ray' \
  hparams_search=none
```

Multi run w/ ray

```bash
python main.py --multirun \
  Dataset=CheXpert,MIMIC \
  Dataset.train_size=3000 \
  use_amp=False \
  project_name='multi_gpu_test' \
  logging.setup.name='multi_w_ray' \
  hparams_search=raytune
```
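For context on the Ray Tune resource assignment mentioned in answer no. 2, a minimal sketch (the trainable and search space are hypothetical, not this repo's code) of how each trial requests GPUs; Ray sets `CUDA_VISIBLE_DEVICES` per trial based on this:

```python
from ray import tune

def trainable(config):
    # Training loop placeholder; each trial only sees the GPUs
    # Ray assigned to it via CUDA_VISIBLE_DEVICES.
    ...

tune.run(
    trainable,
    config={"lr": tune.loguniform(1e-5, 1e-2)},
    resources_per_trial={"cpu": 4, "gpu": 1},
)
```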

kdg1993 commented 1 year ago

Thanks, @seoulsky-field, for pointing out what I could not think of changing!

I think I understand it thanks to your precise description, and I will change it without a separate PR.