Mersion closed this issue 1 year ago.
Do you have a fuller traceback? ISLP does not import multiprocessing or manage any processes for DataLoader (which is a torch construct).
I suspect this has something to do with torch.
Are you using the versions specified here: https://github.com/intro-stat-learning/ISLP_labs/tree/stable ?
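One quick way to compare an environment against those pins is to print the installed versions. This is a minimal sketch using only the standard library's `importlib.metadata`. The package list is an assumption about which libraries matter for the Chapter 10 lab, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: installed version string, or None if absent}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumed list of the libraries involved in the Chapter 10 lab.
    wanted = ["ISLP", "torch", "pytorch-lightning", "torchmetrics"]
    for name, ver in installed_versions(wanted).items():
        print(f"{name}: {ver or 'not installed'}")
```

The printed versions can then be checked by eye against the ones listed on the `stable` branch.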
From: Seiran
Sent: Saturday, October 28, 2023 1:08 PM
To: intro-stat-learning/ISLP_labs
Cc: Subscribed
Subject: [intro-stat-learning/ISLP_labs] RunTimeErrors (Issue #20)
When I try to repeat the Chapter 10 lab on neural networks (Hitters dataset), I run into:

```
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...
RuntimeError: DataLoader worker (pid(s) 7156, 796, 15192, 304) exited unexpectedly.
```
So I assume it has something to do with multiprocessing or the DataLoader, but I'm a total newbie here.
Reply to this email directly or view it on GitHub: https://github.com/intro-stat-learning/ISLP_labs/issues/20
Actually, I think I fixed it. I added this line after the imports, because the error said something about deterministic behaviour:

```python
import os

os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"
```

That alone is not enough to make it work, though; the other change is needed as well. I put `if __name__ == "__main__":` just before creating the `SimpleDataModule`. Does it have to include everything until the end? I don't know, but it works this way:
```python
if __name__ == "__main__":
    # Everything that starts DataLoader worker processes (fit/test) runs only in the parent process.
    hit_dm = SimpleDataModule(hit_train,
                              hit_test,
                              batch_size=32,
                              num_workers=min(4, max_num_workers),
                              validation=hit_test)
    hit_module = SimpleModule.regression(hit_model,
                                         metrics={'mae': MeanAbsoluteError()})
    hit_logger = CSVLogger('logs', name='hitters')
    hit_trainer = Trainer(deterministic=True,
                          max_epochs=50,
                          log_every_n_steps=5,
                          logger=hit_logger,
                          callbacks=[ErrorTracker()])
    hit_trainer.fit(hit_module, datamodule=hit_dm)
    hit_trainer.test(hit_module, datamodule=hit_dm)

    # Plot the MAE curves recorded in the CSV log.
    hit_results = pd.read_csv(hit_logger.experiment.metrics_file_path)
    fig, ax = subplots(1, 1, figsize=(6, 6))
    ax = summary_plot(hit_results, ax, col='mae', ylabel='MAE', valid_legend='Validation (=Test)')
    ax.set_ylim([0, 400])
    ax.set_xticks(np.linspace(0, 50, 11).astype(int))

    # Test-set MAE: a bare expression only displays in a notebook, so print it in a script.
    hit_model.eval()
    preds = hit_module(X_test_t)
    print(torch.abs(Y_test_t - preds).mean())
    plt.show()

    del (Hitters, hit_model, hit_dm, hit_logger, hit_test, hit_train,
         X, Y, X_test, X_train, Y_test, Y_train, X_test_t,
         Y_test_t, hit_trainer, hit_module)
```
But thanks for the fast reply, and thank you as well for making this book!
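Why the guard helps: on Windows, `multiprocessing` can only use the `spawn` start method, so each DataLoader worker starts a fresh interpreter that re-runs the main script (the traceback further down shows `spawn.py` calling `_fixup_main_from_path` and `runpy.run_path` on `nn1.py`). Without the guard, that re-run reaches `hit_trainer.fit(...)` again and tries to start workers from inside a worker. The sketch below uses only the standard library instead of torch and forces `spawn` so it behaves the same on Linux and macOS; `square` and `main` are illustrative names. As far as I can tell, the guard does not have to cover literally everything: imports and function definitions can stay at module level, and only the code that starts worker processes must sit behind it. Anything left at module level does run again in every worker, though, so heavy setup is cheaper inside `main()`.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

# Module-level code: runs in the parent and again in every spawned worker,
# because "spawn" re-imports the main script under the name "__mp_main__".
print(f"module body running as {__name__!r}")


def square(x: int) -> int:
    # Defined at module level so spawned workers can find it by name.
    return x * x


def main() -> list[int]:
    # Force "spawn" (the only start method available on Windows) on every OS.
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
        return list(pool.map(square, range(5)))


if __name__ == "__main__":
    # Only the parent process gets here; the re-imported copy in each worker
    # skips this block, so workers never try to start workers of their own.
    print(main())
```

Run as a script, the `module body running as ...` line should appear once with `'__main__'` and again for each worker with `'__mp_main__'`. Moving `print(main())` out of the guard should make the workers fail with the same "An attempt has been made to start a new process..." error as in the issue.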
Hmm... I'm not really sure of the cause here. Do you have any special hardware (related to CUBLAS)?
Is your `if __name__ == '__main__':` in a notebook or a regular Python script?
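The notebook-versus-script question matters because `spawn` only re-runs the main module when there is something to re-import: a script launched by path, or a module launched with `python -m`. A Jupyter kernel's `__main__` usually has no `__file__`, so that step is skipped there. The helper below is a simplified heuristic (an assumption, not a `multiprocessing` API); the real logic lives in `multiprocessing.spawn.get_preparation_data` and has extra special cases.

```python
import sys


def spawn_will_rerun_main() -> bool:
    """Heuristic: would spawned worker processes re-import the current __main__?"""
    main = sys.modules["__main__"]
    spec = getattr(main, "__spec__", None)
    if spec is not None and getattr(spec, "name", None):
        # Launched as `python -m package.module`: workers re-import it by name.
        return True
    # Launched as `python script.py`: workers re-run the file at this path.
    # Notebooks and the interactive REPL usually have no __file__ on __main__.
    return getattr(main, "__file__", None) is not None


if __name__ == "__main__":
    print("workers will re-run this module:", spawn_will_rerun_main())
```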
It's a regular Python script, but I run it in PyCharm on a laptop with an NVIDIA GeForce RTX 3050 Ti. If I delete this environment variable, I get:
```
RuntimeError: Deterministic behavior was enabled with either
torch.use_deterministic_algorithms(True) or
at::Context::setDeterministicAlgorithms(true), but this operation is not deterministic because it uses CuBLAS and you have CUDA >= 10.2. To enable deterministic behavior in this case, you must set an environment variable before running your PyTorch application: CUBLAS_WORKSPACE_CONFIG=:4096:8 or CUBLAS_WORKSPACE_CONFIG=:16:8. For more information, go to https://docs.nvidia.com/cuda/cublas/index.html#cublasApi_reproducibility
```
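As the message says, the variable has to be in place before PyTorch starts using cuBLAS. A minimal sketch, assuming the script sets it itself rather than in the shell or the PyCharm run configuration: put it at the very top, before `import torch`. `os.environ.setdefault` keeps any value that was already exported. The `try/except ImportError` only lets the sketch run where PyTorch is not installed, and `torch.use_deterministic_algorithms(True)` stands in for the deterministic mode that `Trainer(deterministic=True)` turns on.

```python
import os

# cuBLAS reads this setting when it is first used, so set it before any CUDA
# work; placing it before `import torch` is the conservative choice.
# setdefault leaves an existing value (e.g. one exported in the shell) alone.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":16:8")

try:
    import torch
except ImportError:  # only so this sketch also runs where PyTorch is absent
    torch = None

if torch is not None:
    # The deterministic mode that triggered the error above.
    torch.use_deterministic_algorithms(True)
```

As I understand PyTorch's reproducibility notes, `:16:8` keeps the extra GPU workspace small but may cost some speed, while `:4096:8` uses more workspace memory.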
Here is the earlier, almost complete traceback from before I added `if __name__ == '__main__':` (it keeps repeating in a loop):
`Sanity Checking: | | 0/? [00:00<?, ?it/s]
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 120, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 129, in _main
prepare(preparation_data)
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 240, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 291, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen runpy>", line 291, in run_path
File "<frozen runpy>", line 98, in _run_module_code
File "<frozen runpy>", line 88, in _run_code
File "E:\Some_Directory\PyCharm\Projekt\ISLP\statlearning\NN\nn1.py", line 148, in <module>
hit_trainer.fit(hit_module, datamodule=hit_dm)
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 545, in fit
call._call_and_handle_interrupt(
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 581, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 990, in _run
results = self._run_stage()
^^^^^^^^^^^^^^^^^
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1034, in _run_stage
self._run_sanity_check()
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1063, in _run_sanity_check
val_loop.run()
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\loops\utilities.py", line 181, in _decorator
return loop_run(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\loops\evaluation_loop.py", line 112, in run
self.reset()
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\loops\evaluation_loop.py", line 230, in reset
iter(data_fetcher) # creates the iterator inside the fetcher
^^^^^^^^^^^^^^^^^^
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\loops\fetchers.py", line 99, in __iter__
super().__iter__()
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\loops\fetchers.py", line 48, in __iter__
self.iterator = iter(self.combined_loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\utilities\combined_loader.py", line 335, in __iter__
iter(iterator)
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\utilities\combined_loader.py", line 144, in __iter__
self._load_current_iterator()
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\utilities\combined_loader.py", line 160, in _load_current_iterator
self.iterators = [iter(self.iterables[self._iterator_idx])]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\torch\utils\data\dataloader.py", line 433, in __iter__
self._iterator = self._get_iterator()
^^^^^^^^^^^^^^^^^^^^
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\torch\utils\data\dataloader.py", line 386, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\torch\utils\data\dataloader.py", line 1039, in __init__
w.start()
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\context.py", line 336, in _Popen
return Popen(process_obj)
^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 158, in get_preparation_data
_check_not_importing_main()
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 138, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Traceback (most recent call last):
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\torch\utils\data\dataloader.py", line 1132, in _try_get_data
data = self._data_queue.get(timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\queues.py", line 114, in get
raise Empty
_queue.Empty
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "E:\Some_Directory\PyCharm\Projekt\ISLP\statlearning\NN\nn1.py", line 148, in <module>
hit_trainer.fit(hit_module, datamodule=hit_dm)
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 545, in fit
call._call_and_handle_interrupt(
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 581, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 990, in _run
results = self._run_stage()
^^^^^^^^^^^^^^^^^
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1034, in _run_stage
self._run_sanity_check()
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1063, in _run_sanity_check
val_loop.run()
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\loops\utilities.py", line 181, in _decorator
return loop_run(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\loops\evaluation_loop.py", line 127, in run
batch, batch_idx, dataloader_idx = next(data_fetcher)
^^^^^^^^^^^^^^^^^^
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\loops\fetchers.py", line 127, in __next__
batch = super().__next__()
^^^^^^^^^^^^^^^^^^
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\loops\fetchers.py", line 56, in __next__
batch = next(self.iterator)
^^^^^^^^^^^^^^^^^^^
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\utilities\combined_loader.py", line 326, in __next__
out = next(self._iterator)
^^^^^^^^^^^^^^^^^^^^
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\utilities\combined_loader.py", line 132, in __next__
out = next(self.iterators[0])
^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\torch\utils\data\dataloader.py", line 630, in __next__
data = self._next_data()
^^^^^^^^^^^^^^^^^
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\torch\utils\data\dataloader.py", line 1328, in _next_data
idx, data = self._get_data()
^^^^^^^^^^^^^^^^
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\torch\utils\data\dataloader.py", line 1294, in _get_data
success, data = self._try_get_data()
^^^^^^^^^^^^^^^^^^^^
File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\torch\utils\data\dataloader.py", line 1145, in _try_get_data
raise RuntimeError(f'DataLoader worker (pid(s) {pids_str}) exited unexpectedly') from e
RuntimeError: DataLoader worker (pid(s) 4144, 22000, 13920, 7696) exited unexpectedly`