scaelles / DEXTR-PyTorch

Deep Extreme Cut http://www.vision.ee.ethz.ch/~cvlsegmentation/dextr
GNU General Public License v3.0

BrokenPipeError: [Errno 32] Broken pipe #6

Closed: Jakaria08 closed this issue 6 years ago

Jakaria08 commented 6 years ago

Hi,

I am using Windows 10, and when I tried to run train_pascal.py, the following error occurred. Could you please tell me if there is any solution for this?

Output:

runfile('C:/Users/jakaria/DEXTR-PyTorch/train_pascal.py', wdir='C:/Users/jakaria/DEXTR-PyTorch')
Reloaded modules: dataloaders.pascal, dataloaders.helpers, dataloaders.combine_dbs, networks.deeplab_resnet, layers, dataloaders.custom_transforms, layers.loss, dataloaders.sbd, dataloaders, networks, mypath
Using GPU: 0
Constructing ResNet model...
Dilations: (2, 4)
Number of classes: 1
Number of Input Channels: 4
Initializing classifier: PSP
Skipping Conv layer with size: torch.Size([512, 2048, 1, 1]) and target size: torch.Size([1, 2048, 3, 3])
Initializing from pretrained Deeplab-v2 model
Preprocessing of PASCAL VOC dataset, this will take long, but it will be done only once.
Preprocessing finished
Number of images: 1464
Number of objects: 3507
Preprocessing of PASCAL VOC dataset, this will take long, but it will be done only once.
Preprocessing finished
Number of images: 1449
Number of objects: 3427
Training Network
Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('C:/Users/jakaria/DEXTR-PyTorch/train_pascal.py', wdir='C:/Users/jakaria/DEXTR-PyTorch')
  File "C:\Users\jakaria\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "C:\Users\jakaria\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/jakaria/DEXTR-PyTorch/train_pascal.py", line 139, in <module>
    for ii, sample_batched in enumerate(trainloader):
  File "C:\Users\jakaria\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\torch\utils\data\dataloader.py", line 417, in __iter__
    return DataLoaderIter(self)
  File "C:\Users\jakaria\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\torch\utils\data\dataloader.py", line 234, in __init__
    w.start()
  File "C:\Users\jakaria\AppData\Local\conda\conda\envs\tensorflow\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Users\jakaria\AppData\Local\conda\conda\envs\tensorflow\lib\multiprocessing\context.py", line 212, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\jakaria\AppData\Local\conda\conda\envs\tensorflow\lib\multiprocessing\context.py", line 313, in _Popen
    return Popen(process_obj)
  File "C:\Users\jakaria\AppData\Local\conda\conda\envs\tensorflow\lib\multiprocessing\popen_spawn_win32.py", line 66, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\jakaria\AppData\Local\conda\conda\envs\tensorflow\lib\multiprocessing\reduction.py", line 59, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
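A commonly reported cause of this BrokenPipeError on Windows is that Python's multiprocessing starts DataLoader workers with the spawn method: each worker re-imports the main script, so training code that runs at module level (or from an interactive console such as Spyder's runfile) cannot be started safely in the child processes, and the parent sees the broken pipe while pickling the worker. Two usual workarounds are loading data in the main process (num_workers=0) or moving the training code behind an `if __name__ == '__main__':` guard and running the script from a terminal. The sketch below shows both ideas under stated assumptions: `pick_num_workers` and `main` are hypothetical names, not part of DEXTR-PyTorch, the requested worker count of 2 is arbitrary, and the DataLoader call is left as a comment so the sketch runs without PyTorch.

```python
import sys


def pick_num_workers(requested: int, platform: str = sys.platform) -> int:
    """Hypothetical helper: fall back to in-process data loading on Windows.

    With num_workers=0, torch.utils.data.DataLoader loads batches in the
    main process, so no worker processes have to be spawned and pickled.
    """
    if platform.startswith("win"):
        return 0
    return requested


def main() -> None:
    num_workers = pick_num_workers(requested=2)  # 2 is an arbitrary example value
    # In a training script this value would be passed to the DataLoader, e.g.
    # trainloader = DataLoader(voc_train, batch_size=..., shuffle=True,
    #                          num_workers=num_workers)
    print(f"Using {num_workers} DataLoader worker(s) on {sys.platform}")


# The guard keeps spawned child processes (which re-import this file on
# Windows) from re-running the training code when they import it.
if __name__ == "__main__":
    main()
```

Of the two, the guard keeps parallel loading available, while num_workers=0 is the simplest change when running inside an IDE console.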


Jakaria08 commented 6 years ago

Hi,

After setting the number of workers to 0, I got a new error:

"ValueError: Expected more than 1 value per channel when training, got input size [1, 512, 1, 1]"

Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('C:/Users/jakaria/DEXTR-PyTorch/train_pascal.py', wdir='C:/Users/jakaria/DEXTR-PyTorch')
  File "C:\Users\jakaria\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "C:\Users\jakaria\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/jakaria/DEXTR-PyTorch/train_pascal.py", line 148, in <module>
    output = net.forward(inputs)
  File "C:\Users\jakaria\DEXTR-PyTorch\networks\deeplab_resnet.py", line 196, in forward
    x = self.layer5(x)
  File "C:\Users\jakaria\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\torch\nn\modules\module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\jakaria\DEXTR-PyTorch\networks\deeplab_resnet.py", line 118, in forward
    priors = [F.upsample(input=stage(feats), size=(h, w), mode='bilinear') for stage in self.stages]
  File "C:\Users\jakaria\DEXTR-PyTorch\networks\deeplab_resnet.py", line 118, in <listcomp>
    priors = [F.upsample(input=stage(feats), size=(h, w), mode='bilinear') for stage in self.stages]
  File "C:\Users\jakaria\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\torch\nn\modules\module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\jakaria\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\torch\nn\modules\container.py", line 67, in forward
    input = module(input)
  File "C:\Users\jakaria\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\torch\nn\modules\module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\jakaria\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\torch\nn\modules\batchnorm.py", line 37, in forward
    self.training, self.momentum, self.eps)
  File "C:\Users\jakaria\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\torch\nn\functional.py", line 1011, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size [1, 512, 1, 1]
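This error comes from batch normalization in training mode: each channel is normalized with the mean and variance of that channel over the batch and spatial dimensions, so an input of shape [N, C, H, W] gives N·H·W values per channel. The failing input here is [1, 512, 1, 1], which appears to come from one of the pooled `self.stages` branches in the traceback, so a single sample leaves exactly one value per channel. The sketch below re-implements that condition in plain Python for illustration only; `values_per_channel` and `check_batchnorm_input` are hypothetical names, and the real check is performed inside PyTorch's `batch_norm`, as the traceback shows.

```python
from math import prod
from typing import Sequence


def values_per_channel(shape: Sequence[int]) -> int:
    """Values BatchNorm averages over per channel: N times all spatial sizes."""
    batch, _channels, *spatial = shape
    return batch * prod(spatial)


def check_batchnorm_input(shape: Sequence[int]) -> None:
    """Hypothetical re-implementation of the training-mode condition.

    Raises ValueError with the same wording as the traceback when each
    channel would be normalized using a single value.
    """
    if values_per_channel(shape) == 1:
        raise ValueError(
            "Expected more than 1 value per channel when training, "
            f"got input size {list(shape)}"
        )


# A single sample through a 1x1 pooled branch fails ...
try:
    check_batchnorm_input([1, 512, 1, 1])
except ValueError as err:
    print(err)

# ... while two samples give two values per channel and pass.
check_batchnorm_input([2, 512, 1, 1])
print(values_per_channel([2, 512, 1, 1]))
```

Because the pooled branch is already 1×1 spatially, only the batch dimension can supply more than one value per channel, which is why the batch size matters here.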

Jakaria08 commented 6 years ago

I have a 4 GB NVIDIA GTX 1050 GPU. Is it OK to train with that? Can I reduce the batch size to 1?

scaelles commented 6 years ago

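Hello, you cannot use a batch size of 1 when training (refer to this issue), because batch normalization needs more than one value per channel to compute its statistics in training mode. Therefore, the batch size has to be at least 2. Moreover, if len(train_set) % batch_size == 1, the last batch of each epoch contains a single sample and you will encounter the same error. Just add the following at line 142 of train_pascal.py: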

# Skip single-sample batches: BatchNorm cannot train on one value per channel
if inputs.shape[0] == 1:
    continue

Let us know if that fixes your problem.
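Whether a given batch size produces a single-sample final batch can be checked in advance with a little arithmetic. The sketch below assumes the DataLoader splits the N training samples into consecutive batches of the chosen size with a possibly smaller last batch (the default, drop_last=False), and it assumes N is the "Number of objects: 3507" printed for the training set in the log, i.e. that training samples are indexed per object. `batch_sizes` and `single_sample_batches` are hypothetical helper names.

```python
from typing import List


def batch_sizes(num_samples: int, batch_size: int, drop_last: bool = False) -> List[int]:
    """Sizes of consecutive batches, like a DataLoader with a fixed batch size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    full, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_last:
        sizes.append(remainder)  # incomplete final batch
    return sizes


def single_sample_batches(num_samples: int, batch_size: int) -> int:
    """How many batches the `inputs.shape[0] == 1` check would skip."""
    return sum(1 for size in batch_sizes(num_samples, batch_size) if size == 1)


num_objects = 3507  # assumption: one training sample per object in the log
for bs in (1, 2, 3, 5):
    print(f"batch_size={bs}: remainder={num_objects % bs}, "
          f"skipped={single_sample_batches(num_objects, bs)}")
```

With batch size 1 every batch would be skipped, which is why the batch size has to be at least 2; with batch size 2 only the last batch is affected. As an alternative to the in-loop check, torch.utils.data.DataLoader accepts drop_last=True, which discards the incomplete final batch instead.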