Training related questions

nadeemlab / CIR

Clinically-Interpretable Radiomics [MICCAI'22, CMPB'21]


Training related questions #14

Closed kingjames1155 closed 2 years ago

kingjames1155 commented 2 years ago

I trained on a 3090 (24 GB). GPU memory usage was only about 7 GB and CPU usage was very low, and one epoch took about 10 minutes. I read your paper and saw that you used four A6000s. How can I increase GPU memory and CPU utilization to speed up training? Also, how do I load a previously trained model to continue training?

taznux commented 2 years ago

That's a limitation of voxel2mesh: it does not support batch sizes larger than one because of the mesh decoding. Batching could be implemented if you can bound the maximum mesh size after adaptive unpooling, but that was difficult and inefficient, so instead I ran multiple training runs with different parameters on a single GPU.
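To illustrate why a known maximum mesh size matters: a common way to batch variable-sized data is to pad every sample to a fixed upper bound and keep a mask of the real entries. The sketch below is plain Python for illustration only, not voxel2mesh or CIR code. `pad_meshes` and `max_vertices` are hypothetical names, and it assumes the bound is already known, which is the hard part described above. Faces would need similar handling.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]


def pad_meshes(meshes: List[List[Vertex]], max_vertices: int):
    """Pad each vertex list to max_vertices; return (batch, mask).

    mask[i][j] is True where batch[i][j] is a real vertex and False
    where it is padding. Hypothetical helper, not part of voxel2mesh.
    """
    batch, mask = [], []
    for verts in meshes:
        if len(verts) > max_vertices:
            raise ValueError("mesh exceeds the assumed maximum size")
        n_pad = max_vertices - len(verts)
        batch.append(list(verts) + [(0.0, 0.0, 0.0)] * n_pad)
        mask.append([True] * len(verts) + [False] * n_pad)
    return batch, mask


# Two meshes with different vertex counts (3 and 5).
meshes = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
     (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)],
]
batch, mask = pad_meshes(meshes, max_vertices=6)
```

Every sample in `batch` now has the same length, so it could be stacked into one tensor. The cost is wasted computation on padded vertices, which grows if the bound has to be set conservatively.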


kingjames1155 commented 2 years ago


Thank you for the quick reply. I would also like to ask how to load a previously trained model to continue training; I can't find the corresponding setting in main.py.

taznux commented 2 years ago

If you set the experiment ID (cfg.experiment_idx) and trial ID (cfg.trial_id) in config.py to the same values as the pre-trained model, training starts from that pre-trained model. By default, trial_id is None, so a unique ID is generated each time training starts.

Mesh Only model is available here

def load_config():
    cfg = Config()
    ''' Experiment '''
    cfg.experiment_idx = 1  # experiment ID of the Mesh Only model
    cfg.trial_id = 1        # fixed trial ID instead of the default None

Mesh+Encoder model is available here

def load_config():
    cfg = Config()
    ''' Experiment '''
    cfg.experiment_idx = 2  # experiment ID of the Mesh+Encoder model
    cfg.trial_id = 1        # fixed trial ID instead of the default None

Then, run python main.py.
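For anyone unsure why fixing both IDs resumes training, here is a minimal sketch of the general pattern, assuming the IDs map to a per-trial folder that is checked for a saved checkpoint. This is not the CIR implementation: `resolve_trial_dir`, `find_checkpoint`, the folder layout, and the `checkpoint.pth` file name are all hypothetical.

```python
import tempfile
import time
from pathlib import Path
from typing import Optional


def resolve_trial_dir(root: Path, experiment_idx: int,
                      trial_id: Optional[int]) -> Path:
    """Map (experiment_idx, trial_id) to a folder (hypothetical layout)."""
    if trial_id is None:
        # No trial ID given: generate a new unique one, so a fresh
        # folder is used and nothing is resumed.
        trial_id = time.time_ns()
    return root / f"experiment_{experiment_idx:03d}" / f"trial_{trial_id}"


def find_checkpoint(trial_dir: Path) -> Optional[Path]:
    """Return the checkpoint path if one was saved in this trial folder."""
    ckpt = trial_dir / "checkpoint.pth"  # hypothetical file name
    return ckpt if ckpt.exists() else None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Pretend an earlier run with experiment 1, trial 1 saved a checkpoint.
    earlier = resolve_trial_dir(root, experiment_idx=1, trial_id=1)
    earlier.mkdir(parents=True)
    (earlier / "checkpoint.pth").write_bytes(b"weights")

    # Same IDs: the existing checkpoint is found, so training can resume.
    resumed = find_checkpoint(resolve_trial_dir(root, 1, 1))

    # trial_id=None: a new unique folder, so there is nothing to resume.
    fresh = find_checkpoint(resolve_trial_dir(root, 1, None))
```

Here `resumed` points at the saved file while `fresh` is `None`, which mirrors the behaviour described above: a fixed trial ID reuses the earlier run, and the default `None` always starts a new one.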