juntang-zhuang / LadderNet


RuntimeErrorreduction when running training script #8

Closed YrgkenKoutsi closed 4 years ago

YrgkenKoutsi commented 4 years ago

Hello,

I'm currently conducting comparison research on convolutional neural networks. Due to GPU issues, I thought I would try running the script on the CPU while I wait for my GPU problem to be resolved, which may be related to this error. I'm afraid that might not be the issue, though, and that it is probably caused by a wrong installation step on my part.

The following error is thrown once I run retinaNN_training.py. I tried reducing the number of epochs and the batch size, thinking that might be the issue, but the error persists. The output contains two tracebacks, one from the main process and one from the spawned worker:

```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\Program_Files\Miniconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "D:\Program_Files\Miniconda3\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "D:\Program_Files\Miniconda3\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "D:\Program_Files\Miniconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "D:\Program_Files\Miniconda3\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "D:\Program_Files\Miniconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "D:\Program_Files\Miniconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Assignments\New_2019-2020\Desktop\Assignments\CT6039_Dissertation_30_Credits\Source_Code\Main_Source\Lab_Experiments\Experiment_4\LadderNet\LadderNet-master\src\retinaNN_training.py", line 206, in <module>
    train(epoch)
  File "D:\Assignments\New_2019-2020\Desktop\Assignments\CT6039_Dissertation_30_Credits\Source_Code\Main_Source\Lab_Experiments\Experiment_4\LadderNet\LadderNet-master\src\retinaNN_training.py", line 164, in train
    for batch_idx, (inputs, targets) in enumerate(tqdm(train_loader)):
  File "D:\Program_Files\Miniconda3\lib\site-packages\tqdm\std.py", line 1081, in __iter__
    for obj in iterable:
  File "D:\Program_Files\Miniconda3\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "D:\Program_Files\Miniconda3\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
    w.start()
  File "D:\Program_Files\Miniconda3\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "D:\Program_Files\Miniconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\Program_Files\Miniconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\Program_Files\Miniconda3\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "D:\Program_Files\Miniconda3\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "D:\Program_Files\Miniconda3\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Traceback (most recent call last):
  File "retinaNN_training.py", line 206, in <module>
    train(epoch)
  File "retinaNN_training.py", line 164, in train
    for batch_idx, (inputs, targets) in enumerate(tqdm(train_loader)):
  File "D:\Program_Files\Miniconda3\lib\site-packages\tqdm\std.py", line 1081, in __iter__
    for obj in iterable:
  File "D:\Program_Files\Miniconda3\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "D:\Program_Files\Miniconda3\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
    w.start()
  File "D:\Program_Files\Miniconda3\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "D:\Program_Files\Miniconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\Program_Files\Miniconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\Program_Files\Miniconda3\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\Program_Files\Miniconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
  0%|          | 0/9 [00:00<?, ?it/s]
  0%|          | 0/9 [00:02<?, ?it/s]
```

Any advice around this would be really appreciated.

manvirvirk commented 4 years ago

I'm getting this error too. Any solution to this?

YrgkenKoutsi commented 4 years ago

Hey there, it depends on whether you run on CPU or GPU. I would suggest running on GPU, as a CPU won't cut it for this project. Either way, there are a few things you need to do:

1) In losses.py, change line 7 to: `return x.cuda(True) if torch.cuda.is_available() else x`

2) In retinaNN_predict.py, change num_workers on line 163 to 0 and play with the batch size on line 159. If you get a RAM error, it means the batch size is larger than your machine can handle, so reduce it until the errors stop (see the sketch after this list).

3) In retinaNN_training.py, change num_workers on lines 116 and 120 to 0.

**IMPORTANT 4) In configuration.txt, play with N_subimgs = 190000 and batch_size = 1024. This is important: you will probably get RAM errors, so reduce batch_size until they stop (at least I have to, and I run on an NVIDIA GeForce GTX 1050 Ti). N_epochs = 150 is optional; reduce or increase it as you like.
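Roughly, steps 1-3 boil down to something like this. It's only a sketch: the random tensors stand in for the repo's patch dataset, and the exact variable names in the scripts may differ.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def to_device(x):
    # Step 1 (losses.py): only move tensors to the GPU when CUDA is available,
    # so the same code also runs on a CPU-only machine.
    return x.cuda() if torch.cuda.is_available() else x

# Steps 2-3: on Windows, build the loaders with num_workers=0 so no worker
# processes are spawned, and pick a batch size the machine can actually hold.
patches = TensorDataset(torch.randn(512, 1, 48, 48),
                        torch.randint(0, 2, (512,)))
train_loader = DataLoader(patches, batch_size=256, shuffle=True, num_workers=0)
```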

**ONLY change num_workers if you run on Windows. As far as I know, multiprocessing on CUDA tensors is not supported on Windows, unless there is a way around this that I do not know of. For more about this, follow the link.

https://pytorch.org/docs/stable/notes/windows.html#multiprocessing-error-without-if-clause-protection

Linux should be okay, though I have not tested this.
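For reference, the if-clause protection that the link (and the RuntimeError message above) talks about looks roughly like this. It is a minimal, self-contained sketch with dummy data, not the actual retinaNN_training.py code:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def make_loader():
    # Dummy data stands in for the training patches; num_workers > 0 is what
    # triggers the spawn error on Windows when the __main__ guard is missing.
    data = TensorDataset(torch.randn(32, 1, 48, 48), torch.randint(0, 2, (32,)))
    return DataLoader(data, batch_size=8, shuffle=True, num_workers=2)

def train(epoch, loader):
    for batch_idx, (inputs, targets) in enumerate(loader):
        pass  # the forward/backward pass would go here

if __name__ == '__main__':
    # On Windows, multiprocessing uses "spawn": each DataLoader worker
    # re-imports this module, so anything that starts processes or training
    # must sit under this guard (or num_workers must be set to 0).
    loader = make_loader()
    for epoch in range(2):
        train(epoch, loader)
```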

Hope that helps :)

manvirvirk commented 4 years ago

it works, thanks (y)
