Open dbrivio opened 5 years ago
Add the following lines to the end of the imports section, right after `import torchvision.utils as vutils`:

```python
if __name__ == '__main__':
    torch.multiprocessing.set_start_method('spawn')
```

Ideally the script needs to be refactored to push everything into a `main()` function (that's the original problem).
I have the same error. With a different file but about the same.
I am seeing this when running it on Windows 10, it is solved when I set num_workers=0 for the DataLoader()
@soumith Hello, I have the same issue. I have tried setting the multiprocessing start method to spawn, but it makes no difference and the error persists.
Could you please suggest another way to solve it?
> I am seeing this when running it on Windows 10, it is solved when I set num_workers=0 for the DataLoader()
Perfect solution, but what is the specific reason for it?
@soumith Can you elaborate on the issue here? The common factor between my code and this code is LMDB, and it produces the exact same error. Does this have something to do with trouble pickling the lmdb instance?
The issue is that you cannot pickle LMDB `Environment` objects. Setting num_workers=0 avoids the need to pickle anything, since the original object in the main process handles retrieving the data.
The real solution is to store the environment in a class with custom `__getstate__()` and `__setstate__()` methods that delete the LMDB environment from the pickled state and then regenerate it when the object is loaded in the worker process.
> I am seeing this when running it on Windows 10, it is solved when I set num_workers=0 for the DataLoader()
you saved me, man!! thanks.
I found some GitHub repos that use both LMDB and num_workers > 0 and work successfully, but I don't know why. You can find an example here: stylegan2 dataset
@jgoodson @neillbyrne @ruotianluo @airsplay the solution
> I am seeing this when running it on Windows 10, it is solved when I set num_workers=0 for the DataLoader()
> Perfect solution, but what is the specific reason for it?
As described in the PyTorch documentation (torch.utils.data: Platform-specific behaviors — PyTorch 2.3 documentation), Python multiprocessing relies on different primitives on different platforms: `fork()` on Unix, but `spawn()` on Windows and macOS. I guess that when `num_workers` is nonzero (that is, when multiprocessing is used), the worker processes on Windows are started via `spawn()`, which must pickle the dataset to send it to the workers, causing this error on Windows systems. One way to solve it is to wrap the code, including the dataloader iteration, under `if __name__ == '__main__':`.
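The guard recommended above can be illustrated with plain `multiprocessing` (no PyTorch needed); this is a minimal sketch showing why top-level process-creation code must sit under `if __name__ == '__main__':` when the start method is spawn:

```python
import multiprocessing as mp


def worker(q):
    # Runs in a child process. Under "spawn" the child re-imports the
    # main module, which is why unguarded top-level spawning code would
    # recurse (or crash) on Windows.
    q.put("data from worker")


def main():
    # "spawn" is the only start method available on Windows, and it
    # pickles everything it sends to the child process.
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    result = q.get()
    p.join()
    return result


if __name__ == "__main__":
    # Without this guard, the spawned child would re-execute the
    # process-creation code when it re-imports the module.
    print(main())
```

A DataLoader with `num_workers > 0` hits the same machinery: on Windows each worker is spawned, so the dataset object (including any LMDB environment it holds) must be picklable.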
Hello,
I'm trying to run the dcgan/main.py file to train a GAN. I'm using a Windows 7 system with Python 3.7 (Anaconda).
I run the following line:

```
%run main.py --dataset lsun --dataroot bedroom_train_lmdb/ --niter 1
```

and I get the following:
```
Namespace(batchSize=64, beta1=0.5, cuda=False, dataroot='bedroom_train_lmdb/', dataset='lsun', imageSize=64, lr=0.0002, manualSeed=None, ndf=64, netD='', netG='', ngf=64, ngpu=1, niter=1, nz=100, outf='.', workers=2)
Random Seed: 482
Generator(
  (main): Sequential(
    (0): ConvTranspose2d(100, 512, kernel_size=(4, 4), stride=(1, 1), bias=False)
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace)
    (3): ConvTranspose2d(512, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace)
    (6): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (7): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): ReLU(inplace)
    (9): ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (10): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (11): ReLU(inplace)
    (12): ConvTranspose2d(64, 3, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (13): Tanh()
  )
)
Discriminator(
  (main): Sequential(
    (0): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (1): LeakyReLU(negative_slope=0.2, inplace)
    (2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (4): LeakyReLU(negative_slope=0.2, inplace)
    (5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (6): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): LeakyReLU(negative_slope=0.2, inplace)
    (8): Conv2d(256, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (9): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (10): LeakyReLU(negative_slope=0.2, inplace)
    (11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), bias=False)
    (12): Sigmoid()
  )
)
Traceback (most recent call last):
  File "Y:\Research\Davide\ML\GAN\lsun-master\main.py", line 210, in <module>
    for i, data in enumerate(dataloader, 0):
  File "C:\Users\db396\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 819, in __iter__
    return _DataLoaderIter(self)
  File "C:\Users\db396\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 560, in __init__
    w.start()
  File "C:\Users\db396\AppData\Local\Continuum\anaconda3\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\db396\AppData\Local\Continuum\anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\db396\AppData\Local\Continuum\anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\db396\AppData\Local\Continuum\anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\db396\AppData\Local\Continuum\anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle Environment objects
```
It must be something related to Windows. Any suggestions on how to solve this issue? Thanks