intro-stat-learning / ISLP_labs

Up-to-date version of labs for ISLP
BSD 2-Clause "Simplified" License

RunTimeErrors #20

Closed. Mersion closed this issue 10 months ago.

Mersion commented 10 months ago

When I try to repeat the Chapter 10 lab on neural networks (the Hitters dataset), I run into:

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

RuntimeError: DataLoader worker (pid(s) 7156, 796, 15192, 304) exited unexpectedly.

So I assume it has something to do with multiprocessing and the data loader, but I'm a total newbie here.

jonathan-taylor commented 10 months ago

Do you have a fuller traceback? ISLP does not import multiprocessing or manage any processes for the DataLoader (which is a torch construct).

I suspect this is something to do with torch.

Are you using the versions specified here: https://github.com/intro-stat-learning/ISLP_labs/tree/stable ?
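(A quick way to compare what's installed against those pinned versions, as a minimal sketch: it assumes a local copy of the `requirements.txt` from that branch sits next to the script, and it only handles simple `name==version` lines.)

```python
# Minimal sketch: compare installed package versions against a pinned
# requirements.txt (assumed to be a local copy from the stable branch).
from importlib.metadata import version, PackageNotFoundError

with open("requirements.txt") as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                              # skip blanks and comments
        name, _, pinned = line.partition("==")    # only handles exact pins
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = "not installed"
        marker = "" if installed == pinned else "   <-- differs"
        print(f"{name}: installed {installed}, pinned {pinned}{marker}")
```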


Mersion commented 10 months ago

Actually, I think I fixed it. I added this line after the imports, because the error said something about deterministic behaviour: `os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"`. That alone isn't enough, though; the other change is needed as well. I put an `if __name__ == "__main__":` guard just before creating the `SimpleDataModule`, but does it have to include everything up to the end? I don't know, but it works this way:

if __name__ == "__main__":
    hit_dm = SimpleDataModule(hit_train, hit_test, batch_size=32, num_workers=min(4, max_num_workers),
                              validation=hit_test
                              )
    hit_module = SimpleModule.regression(hit_model, metrics={'mae': MeanAbsoluteError()})
    hit_logger = CSVLogger('logs', name='hitters')

    hit_trainer = Trainer(deterministic=True,
                          max_epochs=50,
                          log_every_n_steps=5,
                          logger=hit_logger,
                          callbacks=[ErrorTracker()])
    hit_trainer.fit(hit_module, datamodule=hit_dm)
    hit_trainer.test(hit_module, datamodule=hit_dm)
    hit_results = pd.read_csv(hit_logger.experiment.metrics_file_path)

    fig, ax = subplots(1, 1, figsize=(6, 6))
    ax = summary_plot(hit_results, ax, col='mae', ylabel='MAE', valid_legend='Validation (=Test)')
    ax.set_ylim([0, 400])
    ax.set_xticks(np.linspace(0, 50, 11).astype(int))
    hit_model.eval()
    preds = hit_module(X_test_t)
    torch.abs(Y_test_t - preds).mean()
    plt.show()
    del (Hitters, hit_model, hit_dm, hit_logger, hit_test, hit_train, X, Y, X_test, X_train, Y_test, Y_train, X_test_t,
         Y_test_t, hit_trainer, hit_module)

But thanks for the fast reply. And thank you as well for writing this book!
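For context, the more conventional Windows-safe layout puts the work in a function and guards only the call. On Windows, DataLoader workers are started with the spawn method, which re-imports the main module, so anything that (directly or indirectly) starts worker processes must only run under the `__main__` guard. The sketch below is a minimal, generic example with toy tensors and plain torch objects, not the lab code itself:

```python
# Minimal, generic sketch of the Windows-safe layout (not the ISLP lab code).
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Toy stand-ins for the Hitters train/test tensors.
    X, Y = torch.randn(256, 19), torch.randn(256, 1)
    loader = DataLoader(TensorDataset(X, Y),
                        batch_size=32,
                        num_workers=4)  # worker processes are what trigger the spawn issue
    for xb, yb in loader:
        pass  # fitting / evaluation would go here

if __name__ == "__main__":
    # Only this call runs when the script is executed; when spawn re-imports
    # the module to start a worker, main() is not re-entered.
    main()
```

With that layout the whole script does not need to sit under the guard, only the call to `main()`. Alternatively, passing `num_workers=0` keeps data loading in the main process and avoids worker processes (and this error) altogether, at the cost of slower loading.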

jonathan-taylor commented 10 months ago

Hmm... I'm not really sure of the cause here. Do you have any special hardware (related to CUBLAS)?

Is your `if __name__ == '__main__'` in a notebook or a regular Python script?


Mersion commented 10 months ago

A regular Python script, but I run it in PyCharm on a laptop with an NVIDIA GeForce RTX 3050 Ti. If I delete that environment variable, I get:

    RuntimeError: Deterministic behavior was enabled with either torch.use_deterministic_algorithms(True)
    or at::Context::setDeterministicAlgorithms(true), but this operation is not deterministic because it
    uses CuBLAS and you have CUDA >= 10.2. To enable deterministic behavior in this case, you must set an
    environment variable before running your PyTorch application: CUBLAS_WORKSPACE_CONFIG=:4096:8 or
    CUBLAS_WORKSPACE_CONFIG=:16:8. For more information, go to
    https://docs.nvidia.com/cuda/cublas/index.html#cublasApi_reproducibility
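For reference, the workaround that message asks for is just setting the variable before torch does any CUDA work; a minimal sketch (either of the two values it suggests should do):

```python
# Minimal sketch: set the cuBLAS workspace config before torch touches CUDA,
# so that deterministic algorithms (e.g. Trainer(deterministic=True)) are
# permitted on CUDA >= 10.2 GPUs such as the RTX 3050 Ti.
import os
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # or ":16:8", per the error message

import torch  # import torch (and anything CUDA-related) only after setting the variable
```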

Mersion commented 10 months ago

This is the previous, almost full, error trace (it repeats in a loop) from before adding the `if __name__ == '__main__':` guard.

```
Sanity Checking: |          | 0/? [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 120, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 129, in _main
    prepare(preparation_data)
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 240, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 291, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "E:\Some_Directory\PyCharm\Projekt\ISLP\statlearning\NN\nn1.py", line 148, in <module>
    hit_trainer.fit(hit_module, datamodule=hit_dm)
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 545, in fit
    call._call_and_handle_interrupt(
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 581, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 990, in _run
    results = self._run_stage()
              ^^^^^^^^^^^^^^^^^
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1034, in _run_stage
    self._run_sanity_check()
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1063, in _run_sanity_check
    val_loop.run()
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\loops\utilities.py", line 181, in _decorator
    return loop_run(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\loops\evaluation_loop.py", line 112, in run
    self.reset()
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\loops\evaluation_loop.py", line 230, in reset
    iter(data_fetcher)  # creates the iterator inside the fetcher
    ^^^^^^^^^^^^^^^^^^
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\loops\fetchers.py", line 99, in __iter__
    super().__iter__()
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\loops\fetchers.py", line 48, in __iter__
    self.iterator = iter(self.combined_loader)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\utilities\combined_loader.py", line 335, in __iter__
    iter(iterator)
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\utilities\combined_loader.py", line 144, in __iter__
    self._load_current_iterator()
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\utilities\combined_loader.py", line 160, in _load_current_iterator
    self.iterators = [iter(self.iterables[self._iterator_idx])]
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\torch\utils\data\dataloader.py", line 433, in __iter__
    self._iterator = self._get_iterator()
                     ^^^^^^^^^^^^^^^^^^^^
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\torch\utils\data\dataloader.py", line 386, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\torch\utils\data\dataloader.py", line 1039, in __init__
    w.start()
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 158, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 138, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Traceback (most recent call last):
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\torch\utils\data\dataloader.py", line 1132, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\queues.py", line 114, in get
    raise Empty
_queue.Empty

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:\Some_Directory\PyCharm\Projekt\ISLP\statlearning\NN\nn1.py", line 148, in <module>
    hit_trainer.fit(hit_module, datamodule=hit_dm)
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 545, in fit
    call._call_and_handle_interrupt(
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 581, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 990, in _run
    results = self._run_stage()
              ^^^^^^^^^^^^^^^^^
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1034, in _run_stage
    self._run_sanity_check()
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1063, in _run_sanity_check
    val_loop.run()
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\loops\utilities.py", line 181, in _decorator
    return loop_run(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\loops\evaluation_loop.py", line 127, in run
    batch, batch_idx, dataloader_idx = next(data_fetcher)
                                       ^^^^^^^^^^^^^^^^^^
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\loops\fetchers.py", line 127, in __next__
    batch = super().__next__()
            ^^^^^^^^^^^^^^^^^^
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\loops\fetchers.py", line 56, in __next__
    batch = next(self.iterator)
            ^^^^^^^^^^^^^^^^^^^
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\utilities\combined_loader.py", line 326, in __next__
    out = next(self._iterator)
          ^^^^^^^^^^^^^^^^^^^^
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\pytorch_lightning\utilities\combined_loader.py", line 132, in __next__
    out = next(self.iterators[0])
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\torch\utils\data\dataloader.py", line 630, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\torch\utils\data\dataloader.py", line 1328, in _next_data
    idx, data = self._get_data()
                ^^^^^^^^^^^^^^^^
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\torch\utils\data\dataloader.py", line 1294, in _get_data
    success, data = self._try_get_data()
                    ^^^^^^^^^^^^^^^^^^^^
  File "E:\Some_Directory\PyCharm\ISLP\Lib\site-packages\torch\utils\data\dataloader.py", line 1145, in _try_get_data
    raise RuntimeError(f'DataLoader worker (pid(s) {pids_str}) exited unexpectedly') from e
RuntimeError: DataLoader worker (pid(s) 4144, 22000, 13920, 7696) exited unexpectedly
```