sustainable-processes / summit

Optimising chemical reactions using machine learning
https://gosummit.readthedocs.io/en/latest/
MIT License
118 stars 24 forks source link

run.py avoid writing to disk by default #225

Closed ilario closed 1 year ago

ilario commented 1 year ago

Description

The default location for saving stuff is not writable, on my system. And the default behavior is to save stuff on disk. In my opinion, by default, no stuff should be saved, neither temporarily, unless it is actually unavoidable.

This causes this error on my system:

PermissionError                           Traceback (most recent call last)
Cell In[37], line 4
      2 rNMadaptive_20220301 = Runner(strategy=strategyNMadaptive, experiment=emul_constr_20220301, max_iterations=20)
      3 rNMadaptive_20220301.reset()
----> 4 rNMadaptive_20220301.run(prev_res=starting_points_20220301)

File /home/ilario/.venv/lib/python3.10/site-packages/summit/run.py:145, in Runner.run(self, **kwargs)
    143 save_dir = pathlib.Path(save_dir) / "runner" / str(self.uuid_val)
    144 if not os.path.isdir(save_dir):
--> 145     os.makedirs(save_dir)
    146 save_at_end = kwargs.get("save_at_end", True)
    148 n_objs = len(self.experiment.domain.output_variables)

File /usr/lib/python3.10/os.py:215, in makedirs(name, mode, exist_ok)
    213 if head and tail and not path.exists(head):
    214     try:
--> 215         makedirs(head, exist_ok=exist_ok)
    216     except FileExistsError:
    217         # Defeats race condition when another thread created the path
    218         pass

File /usr/lib/python3.10/os.py:215, in makedirs(name, mode, exist_ok)
    213 if head and tail and not path.exists(head):
...
    227     # Cannot rely on checking for EEXIST, since the operating system
    228     # could give priority to other errors like EACCES or EROFS
    229     if not exist_ok or not path.isdir(name):

PermissionError: [Errno 13] Permission denied: '/usr/bin/.summit'

A possible solution is to specify the save_dir path in the run command like:

rNM_20220301.run(prev_res=starting_points_20220301, save_dir="~")

But what I would expect is that if save_freq is unset it does not save any file at all, rather than creating it and deleting it.

In order to do so, the lines to be changed are these ones:

https://github.com/sustainable-processes/summit/blob/a5e4d0ca848bf3b4c794f2975bc1790e21b0343d/summit/run.py#L144-L145

https://github.com/sustainable-processes/summit/blob/a5e4d0ca848bf3b4c794f2975bc1790e21b0343d/summit/run.py#L178-L181

https://github.com/sustainable-processes/summit/blob/a5e4d0ca848bf3b4c794f2975bc1790e21b0343d/summit/run.py#L202-L204

https://github.com/sustainable-processes/summit/blob/a5e4d0ca848bf3b4c794f2975bc1790e21b0343d/summit/run.py#L343-L344

https://github.com/sustainable-processes/summit/blob/a5e4d0ca848bf3b4c794f2975bc1790e21b0343d/summit/run.py#L402-L408

https://github.com/sustainable-processes/summit/blob/a5e4d0ca848bf3b4c794f2975bc1790e21b0343d/summit/run.py#L430-L435

What I Did

python -m venv .venv     
source .venv/bin/activate
pip install -U summit

Open VSCodium and indicate to use .venv as an environment.

Run some summit stuff.

marcosfelt commented 1 year ago

This is definitely a bug.

The temporary solution to is to specify the save_dir:

rNMadaptive_20220301 = Runner(save_dir=".summit", strategy=strategyNMadaptive, experiment=emul_constr_20220301, max_iterations=20)

I think the correct solution would be to change the Runner.run method default from saving to not saving and more clearly explain what the default saving folder is. Does that make sense? https://github.com/sustainable-processes/summit/blob/a5e4d0ca848bf3b4c794f2975bc1790e21b0343d/summit/run.py#L134-L136

ilario commented 1 year ago

I think the correct solution would be to change the Runner.run method default from saving to not saving

Yes, I agree this would help :)

Additionally to that, a check should be added here: https://github.com/sustainable-processes/summit/blob/a5e4d0ca848bf3b4c794f2975bc1790e21b0343d/summit/run.py#L144-L145 and here: https://github.com/sustainable-processes/summit/blob/a5e4d0ca848bf3b4c794f2975bc1790e21b0343d/summit/run.py#L343-L344

But I am not sure about what should be checked, the presence of save_dir?

marcosfelt commented 1 year ago

Hi @ilario! Do you mind checking out this PR and seeing if it works for you?

git clone https://github.com/sustainable-processes/summit.git
git fetch origin pull/227/head:fix_runner
git checkout fix_runner
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
ilario commented 1 year ago

Tested and it works for me (I did test only with and without save_dir, I did not test each possible case) :)

When I installed it with -e I got this error:

$ pip install -e .
Obtaining file:///home/ilario/software/summit
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
ERROR: Project file:///home/ilario/software/summit has a 'pyproject.toml' and its build backend is missing the 'build_editable' hook. Since it does not have a 'setup.py' nor a 'setup.cfg', it cannot be installed in editable mode. Consider using a build backend that supports PEP 660.

but when installing it without -e it just worked :)

marcosfelt commented 1 year ago

Great! I probably should have written more test cases but it passes all the current ones, so I'm happy with that for now.