xadrianzetx / optuna-distributed

Distributed hyperparameter optimization made easy
MIT License

Too many open files #89

Closed: ggoupy closed this issue 5 months ago

ggoupy commented 5 months ago

I am using optuna_distributed with optuna.storages.JournalStorage and after ~480 trials I get: OSError: [Errno 24] Too many open files.

  Number of trials:  479 // All
  Number of complete trials:  419 // optuna.trial.TrialState.COMPLETE

Output of ulimit:

open files                      (-n) 1024

I am wondering whether each trial properly closes the logging file after it completes.
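One way to check is to count the open file descriptors of the driver process between trials and see whether the number keeps growing. A minimal sketch (Linux only, reading /proc; the helper name is purely illustrative):

    import os

    def count_open_fds() -> int:
        # Number of file descriptors currently open in this process
        # (Linux only, counted from /proc/self/fd).
        return len(os.listdir("/proc/self/fd"))

    # Print the count before and after batches of trials to see whether it grows.
    print("open fds:", count_open_fds())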

Here is a snippet of my code:

    # Journal storage backed by a single log file in the output directory.
    storage = optuna.storages.JournalStorage(
        optuna.storages.JournalFileStorage(f"{args.output_dir}/{config_name}.log"),
    )

    # Wrap the regular Optuna study so trials can be run across multiple processes.
    study = optuna_distributed.from_study(
        optuna.create_study(
            study_name=config_name,
            direction="maximize",
            storage=storage,
            sampler=sampler,
            load_if_exists=True,
        )
    )

    study.optimize(
        lambda trial: objective(trial=trial, dataset_dir=args.input_dir, search_config=search_config, seed=0),
        n_trials=int(args.n_trials),
        n_jobs=int(args.n_proc),
    )

Here is the full error message:


    study.optimize(
  File "/home/ggoupy/.local/lib/python3.8/site-packages/optuna_distributed/study.py", line 192, in optimize
    event_loop.run(terminal, timeout, catch)
  File "/home/ggoupy/.local/lib/python3.8/site-packages/optuna_distributed/eventloop.py", line 65, in run
    self.manager.after_message(self)
  File "/home/ggoupy/.local/lib/python3.8/site-packages/optuna_distributed/managers/local.py", line 92, in after_message
    self.create_futures(event_loop.study, event_loop.objective)
  File "/home/ggoupy/.local/lib/python3.8/site-packages/optuna_distributed/managers/local.py", line 61, in create_futures
    p.start()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 69, in _launch
    child_r, parent_w = os.pipe()
OSError: [Errno 24] Too many open files
xadrianzetx commented 5 months ago

Hi, thanks for reporting this!

This is most likely related to a bug in my process management logic, not the storage you're using. I will investigate it further (hopefully) next week and let you know once the patch is out.
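For context (this is only an illustrative sketch, not the actual optuna-distributed internals): the traceback shows os.pipe() failing while a new worker process is being started, which is the typical symptom of the parent accumulating a few descriptors per trial, for example by never closing its ends of per-trial pipes or never releasing finished Process objects. A pattern that avoids that kind of leak looks roughly like this:

    import multiprocessing

    def _worker(conn) -> None:
        # Hypothetical per-trial worker: send a result back and exit.
        conn.send("done")
        conn.close()

    def run_one_trial() -> str:
        parent_conn, child_conn = multiprocessing.Pipe()
        p = multiprocessing.Process(target=_worker, args=(child_conn,))
        p.start()
        result = parent_conn.recv()
        p.join()
        # Closing both pipe ends and the Process object releases the
        # descriptors held by the parent; skipping this leaks a handful
        # of fds per trial until the open-file limit is reached.
        parent_conn.close()
        child_conn.close()
        p.close()
        return result

    if __name__ == "__main__":
        print(run_one_trial())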

ggoupy commented 5 months ago

Thanks 👍

Alternatively, increasing the limit seemed to circumvent the issue:

    import resource

    # Raise the soft limit on open file descriptors for this process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (8192, hard))
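Note that an unprivileged process cannot raise the soft limit above the hard limit, so clamping the target value keeps setrlimit from raising ValueError on machines with a lower hard limit. A small variation of the snippet above (8192 is an arbitrary target):

    import resource

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Stay at or below the hard limit, which is usually finite for RLIMIT_NOFILE.
    target = 8192 if hard == resource.RLIM_INFINITY else min(8192, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))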
xadrianzetx commented 5 months ago

Hi @ggoupy, this issue has been fixed in the latest release. You can install it by running pip install -U optuna-distributed.

ggoupy commented 5 months ago

Thanks for the quick fix 🙏