naszilla / tabzilla


VIME memory error (OSError) #40

Closed by duncanmcelfresh 1 year ago

duncanmcelfresh commented 2 years ago

This error occurred with algorithm VIME on dataset openml__dionis__189355. It occurs on roughly half of all hyperparameter samples tested:

Traceback (most recent call last):
  File "/home/shared/tabzilla/TabSurvey/tabzilla_experiment.py", line 136, in __call__
    result = cross_validation(model, self.dataset, self.time_limit)
  File "/home/shared/tabzilla/TabSurvey/tabzilla_utils.py", line 237, in cross_validation
    loss_history, val_loss_history = curr_model.fit(
  File "/home/shared/tabzilla/TabSurvey/models/vime.py", line 47, in fit
    self.fit_self(X_unlab, p_m=self.params["p_m"], alpha=self.params["alpha"])
  File "/home/shared/tabzilla/TabSurvey/models/vime.py", line 148, in fit_self
    for batch_X, batch_mask, batch_feat in train_loader:
  File "/opt/conda/envs/torch/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 368, in __iter__
    return self._get_iterator()
  File "/opt/conda/envs/torch/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 314, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/opt/conda/envs/torch/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 927, in __init__
    w.start()
  File "/opt/conda/envs/torch/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/conda/envs/torch/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/conda/envs/torch/lib/python3.10/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/opt/conda/envs/torch/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/conda/envs/torch/lib/python3.10/multiprocessing/popen_fork.py", line 66, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

duncanmcelfresh commented 1 year ago

This is a memory error, which we will ignore for these experiments.
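
For readers who hit the same traceback and cannot simply skip the run: the failure is raised by os.fork() while the DataLoader starts its worker processes, so one possible workaround is to retry with num_workers=0, which loads batches in the main process and forks nothing. The sketch below is an assumption on my part, not what was done in tabzilla (the error was ignored there). The names iter_with_worker_fallback and make_loader are hypothetical, and the helper is written against any factory that returns an iterable, so it does not depend on torch itself.

```python
import errno


def iter_with_worker_fallback(make_loader, num_workers):
    """Start iterating a loader; on ENOMEM, retry without worker processes.

    make_loader is a hypothetical factory: it takes a worker count and
    returns an iterable, e.g. a torch.utils.data.DataLoader.
    """
    loader = make_loader(num_workers)
    try:
        # For a multi-worker DataLoader, worker processes are forked here,
        # which is where the traceback above fails.
        return iter(loader)
    except OSError as exc:
        # Only handle "Cannot allocate memory" from a multi-worker attempt;
        # anything else is re-raised unchanged.
        if exc.errno != errno.ENOMEM or num_workers == 0:
            raise
        # Fall back to single-process loading (no fork, but slower).
        return iter(make_loader(0))
```

With PyTorch, make_loader could be something like lambda n: DataLoader(dataset, batch_size=128, num_workers=n), where dataset and the batch size are placeholders. Note that this only avoids the fork; if the machine is genuinely out of memory, training may still fail later.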