liyues / PatRecon

Patient-specific reconstruction of volumetric computed tomography images from few-view projections via deep learning
105 stars 35 forks

training with multiple GPU #6

Open gokceay opened 3 years ago

gokceay commented 3 years ago

Hello Mrs. Liyues,

I would like to use more than one GPU; how can I achieve this? In the trainer, at line 25: `self.model = nn.DataParallel(self.model).cuda()`, I should add GPU ids inside `DataParallel` besides `self.model`, right? Should I also increase the batch size to more than the number of GPUs used? Also, did you try a batch size greater than 1? If so, what was the result? Thanks in advance.

yinyin-llll commented 1 year ago

Hi, I also want to retrain the model, but it needs a CSV file with annotations. Do you know what this file contains or how to create it? Thank you very much.

quocbao2772004 commented 1 month ago

@yinyin-llll hi bro, have you successfully retrained the model?

yinyin-llll commented 1 month ago

No  


liyues commented 1 month ago

Hi all, thanks for your interest in this work and code. This codebase is from quite a few years ago, and many changes have happened since, so I will try to answer these questions from memory as best I can; apologies if anything is unclear. The csv file is just a file that stores the paths to the data files where your data is kept. You may want to change this file so it points to your own data paths, so the images can be loaded. Hope this is helpful, and let me know if there are any questions.
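A hedged sketch of what such an annotation CSV might look like: one row per case, each column holding a path to a data file. The column names (`projection_path`, `volume_path`) and the file paths are illustrative assumptions, not taken from the original repository.

```python
import csv
import io

# Illustrative rows: each record points to one case's data files on disk.
rows = [
    {"projection_path": "data/case001/proj.npy", "volume_path": "data/case001/vol.npy"},
    {"projection_path": "data/case002/proj.npy", "volume_path": "data/case002/vol.npy"},
]

# Write the CSV to an in-memory buffer (a real script would use a file path).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["projection_path", "volume_path"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()

# Reading it back, as a dataset loader might, to collect the volume paths.
paths = [r["volume_path"] for r in csv.DictReader(io.StringIO(csv_text))]
print(paths)  # ['data/case001/vol.npy', 'data/case002/vol.npy']
```

The loader in the repository would then open each listed file; only the path layout needs to match your own data directory.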

liyues commented 1 month ago

For multi-GPU training, I think you are right: it is possible to assign a different batch of data to each GPU so you can increase the total training batch size. That is, this is data-parallel training. The PyTorch version used here may be quite old now, so you may want to check the documentation on data-parallel training in newer PyTorch versions. Hope this is helpful.
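A minimal sketch of the data-parallel wrapping discussed above, assuming PyTorch's `nn.DataParallel`. The helper name `wrap_for_multi_gpu` and the `device_ids` values are illustrative, not from the repository; newer PyTorch generally recommends `DistributedDataParallel` instead.

```python
import torch
import torch.nn as nn

def wrap_for_multi_gpu(model, device_ids=None):
    """Wrap a model with nn.DataParallel when more than one GPU is available.

    device_ids: e.g. [0, 1] to use the first two GPUs; None uses all GPUs.
    On a CPU-only or single-GPU machine the model is returned unchanged.
    """
    if torch.cuda.is_available() and torch.cuda.device_count() > 1:
        model = nn.DataParallel(model, device_ids=device_ids).cuda()
    return model

# Toy example: DataParallel splits each batch across the GPUs, so the batch
# size is usually chosen as a multiple of the number of GPUs used.
model = wrap_for_multi_gpu(nn.Linear(8, 4))
x = torch.randn(6, 8)  # batch of 6 samples
if torch.cuda.is_available():
    x = x.cuda()
out = model(x)
print(out.shape)  # torch.Size([6, 4])
```

With two GPUs and `device_ids=[0, 1]`, a batch of 6 would be split into two slices of 3, one per replica, and the outputs gathered back on the default device.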