Open danirisdiandita opened 3 years ago
With the following environment:
torch==1.4.0 torchvision==0.5.0 pycocotools==2.0.0
I ran
python setup.py install
in the working directory, and then ran
python train_dist.py --dataset coco --model deeplab --backbone resnest50 --aux
which produced the log below:
3 Namespace(aux=True, aux_weight=0.2, backbone='resnest50', base_size=520, batch_size=2, checkname='default', crop_size=480, dataset='coco', dist_backend='nccl', dist_url='tcp://localhost:12321', epochs=30, eval=False, export=None, ft=False, lr=0.0005, lr_scheduler='poly', model='deeplab', model_zoo=None, momentum=0.9, rank=0, rectify=False, rectify_avg=False, resume=None, se_loss=False, se_weight=0.2, seed=1, start_epoch=0, test_batch_size=16, test_folder=None, test_val=False, train_split='train', weight_decay=0.0001, workers=8, world_size=1) 3 rank: 0 / 3 BaseDataset: base_size 520, crop_size 480 train set loading annotations into memory... Done (t=27.11s) creating index... index created! val set loading annotations into memory... Done (t=2.57s) creating index... index created! DeepLabV3( (pretrained): ResNet( (conv1): Sequential( (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (1): DistSyncBatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU(inplace=True) (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (4): DistSyncBatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (5): ReLU(inplace=True) (6): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (bn1): DistSyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) (layer1): Sequential( (0): Bottleneck( (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): DistSyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): SplAtConv2d( (conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2, bias=False) (bn0): DistSyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (fc1): Conv2d(64, 32, 
kernel_size=(1, 1), stride=(1, 1)) (bn1): DistSyncBatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (fc2): Conv2d(32, 128, kernel_size=(1, 1), stride=(1, 1)) (rsoftmax): rSoftMax() ) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (downsample): Sequential( (0): AvgPool2d(kernel_size=1, stride=1, padding=0) (1): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (2): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): DistSyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): SplAtConv2d( (conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2, bias=False) (bn0): DistSyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (fc1): Conv2d(64, 32, kernel_size=(1, 1), stride=(1, 1)) (bn1): DistSyncBatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (fc2): Conv2d(32, 128, kernel_size=(1, 1), stride=(1, 1)) (rsoftmax): rSoftMax() ) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): DistSyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): SplAtConv2d( (conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2, bias=False) (bn0): DistSyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (fc1): Conv2d(64, 32, kernel_size=(1, 1), stride=(1, 1)) (bn1): 
DistSyncBatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (fc2): Conv2d(32, 128, kernel_size=(1, 1), stride=(1, 1)) (rsoftmax): rSoftMax() ) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) ) (layer2): Sequential( (0): Bottleneck( (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): DistSyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (avd_layer): AvgPool2d(kernel_size=3, stride=2, padding=1) (conv2): SplAtConv2d( (conv): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2, bias=False) (bn0): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (fc1): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1)) (bn1): DistSyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (fc2): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1)) (rsoftmax): rSoftMax() ) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): DistSyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (downsample): Sequential( (0): AvgPool2d(kernel_size=2, stride=2, padding=0) (1): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (2): DistSyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): DistSyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): SplAtConv2d( (conv): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2, bias=False) (bn0): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (fc1): Conv2d(128, 64, 
kernel_size=(1, 1), stride=(1, 1)) (bn1): DistSyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (fc2): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1)) (rsoftmax): rSoftMax() ) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): DistSyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): DistSyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): SplAtConv2d( (conv): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2, bias=False) (bn0): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (fc1): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1)) (bn1): DistSyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (fc2): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1)) (rsoftmax): rSoftMax() ) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): DistSyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (3): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): DistSyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): SplAtConv2d( (conv): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2, bias=False) (bn0): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (fc1): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1)) (bn1): DistSyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (fc2): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1)) (rsoftmax): rSoftMax() ) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) 
(bn3): DistSyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) ) (layer3): Sequential( (0): Bottleneck( (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (avd_layer): AvgPool2d(kernel_size=3, stride=1, padding=1) (conv2): SplAtConv2d( (conv): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2, bias=False) (bn0): DistSyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (fc1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1)) (bn1): DistSyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (fc2): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1)) (rsoftmax): rSoftMax() ) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): DistSyncBatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (downsample): Sequential( (0): AvgPool2d(kernel_size=1, stride=1, padding=0) (1): Conv2d(512, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (2): DistSyncBatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): SplAtConv2d( (conv): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), groups=2, bias=False) (bn0): DistSyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (fc1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1)) (bn1): DistSyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (fc2): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1)) (rsoftmax): rSoftMax() ) (conv3): 
Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): DistSyncBatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): SplAtConv2d( (conv): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), groups=2, bias=False) (bn0): DistSyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (fc1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1)) (bn1): DistSyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (fc2): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1)) (rsoftmax): rSoftMax() ) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): DistSyncBatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (3): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): SplAtConv2d( (conv): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), groups=2, bias=False) (bn0): DistSyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (fc1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1)) (bn1): DistSyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (fc2): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1)) (rsoftmax): rSoftMax() ) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): DistSyncBatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (4): Bottleneck( (conv1): Conv2d(1024, 256, 
kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): SplAtConv2d( (conv): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), groups=2, bias=False) (bn0): DistSyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (fc1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1)) (bn1): DistSyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (fc2): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1)) (rsoftmax): rSoftMax() ) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): DistSyncBatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (5): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): SplAtConv2d( (conv): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), groups=2, bias=False) (bn0): DistSyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (fc1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1)) (bn1): DistSyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (fc2): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1)) (rsoftmax): rSoftMax() ) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): DistSyncBatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) ) (layer4): Sequential( (0): Bottleneck( (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): DistSyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (avd_layer): AvgPool2d(kernel_size=3, stride=1, padding=1) (conv2): 
SplAtConv2d( (conv): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), groups=2, bias=False) (bn0): DistSyncBatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (fc1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1)) (bn1): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (fc2): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1)) (rsoftmax): rSoftMax() ) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): DistSyncBatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (downsample): Sequential( (0): AvgPool2d(kernel_size=1, stride=1, padding=0) (1): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (2): DistSyncBatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): DistSyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): SplAtConv2d( (conv): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(4, 4), dilation=(4, 4), groups=2, bias=False) (bn0): DistSyncBatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (fc1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1)) (bn1): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (fc2): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1)) (rsoftmax): rSoftMax() ) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): DistSyncBatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): DistSyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) 
(conv2): SplAtConv2d( (conv): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(4, 4), dilation=(4, 4), groups=2, bias=False) (bn0): DistSyncBatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (fc1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1)) (bn1): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (fc2): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1)) (rsoftmax): rSoftMax() ) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): DistSyncBatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) ) (avgpool): GlobalAvgPool2d() (fc): None ) (head): DeepLabV3Head( (aspp): ASPP_Module( (b0): Sequential( (0): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU(inplace=True) ) (b1): Sequential( (0): Conv2d(2048, 256, kernel_size=(3, 3), stride=(1, 1), padding=(12, 12), dilation=(12, 12), bias=False) (1): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU(inplace=True) ) (b2): Sequential( (0): Conv2d(2048, 256, kernel_size=(3, 3), stride=(1, 1), padding=(24, 24), dilation=(24, 24), bias=False) (1): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU(inplace=True) ) (b3): Sequential( (0): Conv2d(2048, 256, kernel_size=(3, 3), stride=(1, 1), padding=(36, 36), dilation=(36, 36), bias=False) (1): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU(inplace=True) ) (b4): AsppPooling( (gap): Sequential( (0): AdaptiveAvgPool2d(output_size=1) (1): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (2): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (3): ReLU(inplace=True) ) ) (project): 
Sequential( (0): Conv2d(1280, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU(inplace=True) (3): Dropout2d(p=0.5, inplace=False) ) ) (block): Sequential( (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (1): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU(inplace=True) (3): Dropout(p=0.1, inplace=False) (4): Conv2d(256, 21, kernel_size=(1, 1), stride=(1, 1)) ) ) (auxlayer): FCNHead( (conv5): Sequential( (0): Conv2d(1024, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (1): DistSyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU() (3): Dropout(p=0.1, inplace=False) (4): Conv2d(256, 21, kernel_size=(1, 1), stride=(1, 1)) ) ) ) Using poly LR scheduler with warm-up epochs of 0! Starting Epoch: 0 Total Epoches: 30 3 rank: 2 / 3 BaseDataset: base_size 520, crop_size 480 train set loading annotations into memory... Done (t=26.99s) creating index... index created! val set loading annotations into memory... Done (t=2.61s) creating index... index created! Using poly LR scheduler with warm-up epochs of 0! 3 rank: 1 / 3 BaseDataset: base_size 520, crop_size 480 train set loading annotations into memory... Done (t=27.07s) creating index... index created! val set loading annotations into memory... Done (t=2.73s) creating index... index created! Using poly LR scheduler with warm-up epochs of 0! 
Traceback (most recent call last):
  File "train_dist.py", line 320, in <module>
    main()
  File "train_dist.py", line 148, in main
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
  File "/home/user/miniconda3/envs/pytorch_encoding/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
    while not spawn_context.join():
  File "/home/user/miniconda3/envs/pytorch_encoding/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 2 terminated with the following error:
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/pytorch_encoding/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/data2/shared/research/img_segmentation/fresh_pytorch_encoding/experiments/segmentation/train_dist.py", line 309, in main_worker
    training(epoch)
  File "/data2/shared/research/img_segmentation/fresh_pytorch_encoding/experiments/segmentation/train_dist.py", line 242, in training
    for i, (image, target) in enumerate(trainloader):
  File "/home/user/miniconda3/envs/pytorch_encoding/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/user/miniconda3/envs/pytorch_encoding/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 719, in __init__
    w.start()
  File "/home/user/miniconda3/envs/pytorch_encoding/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/home/user/miniconda3/envs/pytorch_encoding/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/user/miniconda3/envs/pytorch_encoding/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/user/miniconda3/envs/pytorch_encoding/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/user/miniconda3/envs/pytorch_encoding/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/user/miniconda3/envs/pytorch_encoding/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/home/user/miniconda3/envs/pytorch_encoding/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle module objects
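For anyone trying to narrow this down: the last frames of the traceback show the DataLoader worker being started through popen_spawn_posix, which pickles the worker process object before launching it. The snippet below is only a minimal sketch of that class of failure outside of PyTorch. It is not taken from train_dist.py, and the two SimpleNamespace objects are hypothetical stand-ins for a dataset that does or does not keep a module as an attribute. Whether the COCO dataset in this repository actually holds such a reference is an assumption I have not verified.

```python
import math
import pickle
import types

# Hypothetical stand-in for a dataset object that stores a module as an
# attribute (an assumption for illustration, not code from this repository).
holds_module = types.SimpleNamespace(backend=math)

# Same idea, but only the module's name is stored, so nothing unpicklable
# is attached to the object.
holds_name = types.SimpleNamespace(backend_name="math")


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False if pickling fails."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        # Module objects are rejected by pickle with a TypeError; the exact
        # message text differs between Python versions.
        return False


module_holder_ok = can_pickle(holds_module)
plain_ok = can_pickle(holds_name)
print("object holding a module pickles:", module_holder_ok)
print("object holding only a name pickles:", plain_ok)
```

If the dataset or one of its attributes turns out to hold a module in the same way, calling pickle.dumps on the dataset instance directly should reproduce the error without starting any workers, which may help locate the offending attribute.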