tianbaochou / NasUnet

170 stars 45 forks source link

多卡训练和搜索均会卡死 #19

Closed jackson08 closed 4 years ago

jackson08 commented 5 years ago

您好,我下载了您的代码,Nasunet在Promise12数据集上单卡均正常,而使用多卡时就会在0%处卡死。 我使用三张1060 设置如下: model: arch: nasunet data: dataset: promise12 train_split: train_aug split: val img_rows: 'same' img_cols: 'same' searching: init_channels: 16 depth: 4 epoch: 300 batch_size: 6 report_freq: 20 n_workers: 2 alpha_begin: 10 max_patience: 40 gpu: True multi_gpus: True sharing_normal: True double_down_channel: False meta_node_num: 3 grad_clip: 5 train_portion: 0.5 model_optimizer: name: 'sgd' lr: 0.025 weight_decay: 3.0e-4 arch_optimizer: name: 'adam' lr: 3.0e-4 weight_decay: 5.0e-3 loss: name: 'dice_loss' size_average: False aux_weight: resume: training: geno_type: NASUNET init_channels: 32 depth: 5 epoch: 200 batch_size: 6 report_freq: 10 n_workers: 2 multi_gpus: True double_down_channel: False grad_clip: 3 max_patience: 100 model_optimizer: name: 'adam' lr: 1.0e-3 weight_decay: 5.0e-4 loss: name: 'dice_loss' aux_weight: 0 backbone: lr_schedule: name: 'cos' T_max: 100 resume:

jackson08 commented 5 years ago

在6700K+1080ti2下正常 但是在皓龙62122 + 1060*3 就会卡主 玄学问题

tianbaochou commented 5 years ago

在6700K+1080ti_2下正常 但是在皓龙6212_2 + 1060*3 就会卡主 玄学问题

我从来没有遇到这个问题哈。可能是dataloader那边有问题