jansel / opentuner

An extensible framework for program autotuning
http://opentuner.org/
MIT License

checkpoint and restart #161

Open TimHuDi opened 10 months ago

TimHuDi commented 10 months ago

Hi Jansel: I use OpenTuner to tune long-running applications, and I find that the search algorithms need many iterations to reach good results. Sometimes we need to restart the computer, and when we restart OpenTuner we are almost back at the beginning. Could we implement checkpointing, saving the state of OpenTuner to a file so that after a restart we can resume from the interrupted iteration? Do you have any comments on this? Thanks.

jansel commented 10 months ago

Full checkpointing is a bit hard because each search technique has different state.

I'd recommend either using this flag:

  --seed-configuration FILENAME
                        Start search at a given configuration. Can be specified multiple times. Configurations are loaded with ConfigurationManipulator.load_from_file() and file format is detected from extension.

Or use its API equivalent, measurement_interface.seed_configurations().
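As a minimal sketch of preparing a file for `--seed-configuration`, assuming that a `.json` extension is one of the formats `load_from_file()` detects and that a configuration can be stored as a flat dict of parameter names to values (both assumptions, not confirmed above), the best configuration from an earlier run could be written out like this. The parameter names are hypothetical.

```python
import json
from pathlib import Path


def save_seed_configuration(cfg, path):
    """Write a configuration dict as JSON so a later run can be seeded from it.

    Assumes every parameter value is JSON-serializable (ints, floats, strings).
    """
    path = Path(path)
    path.write_text(json.dumps(cfg, indent=2, sort_keys=True))
    return path


def load_seed_configuration(path):
    """Read a saved configuration back, e.g. to inspect it before seeding."""
    return json.loads(Path(path).read_text())


# Hypothetical parameter names; use the ones your tuner's manipulator defines.
best = {"block_size": 64, "opt_level": "-O3", "unroll": 4}
```

After `save_seed_configuration(best, "best_config.json")`, the next run could be started with something like `python mytuner.py --seed-configuration best_config.json`, where `mytuner.py` stands in for your own tuner script.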

Pull requests are welcome to add a more automatic checkpointing feature. The easy thing to do would be to populate measurement_interface.seed_configurations() based on a database query of the best N configs from a prior run. You could also try to checkpoint state inside the search techniques.
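A rough sketch of that "easy thing", under clearly stated assumptions: it does not use OpenTuner's real results database schema. Instead it assumes a hypothetical SQLite table `results(config_json TEXT, time REAL)` where a lower `time` is better, and it returns a list of configuration dicts, which is assumed to be the shape `seed_configurations()` should return.

```python
import json
import sqlite3


def best_n_configs(conn, n):
    """Return the n fastest configurations recorded by a previous run.

    Assumes a hypothetical table results(config_json TEXT, time REAL);
    OpenTuner's own results database uses a different schema.
    """
    rows = conn.execute(
        "SELECT config_json FROM results ORDER BY time ASC LIMIT ?", (n,)
    ).fetchall()
    return [json.loads(config_json) for (config_json,) in rows]


def demo_db():
    """Build an in-memory database standing in for a prior run's results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (config_json TEXT, time REAL)")
    samples = [({"unroll": 1}, 3.2), ({"unroll": 4}, 1.1), ({"unroll": 8}, 1.7)]
    conn.executemany(
        "INSERT INTO results VALUES (?, ?)",
        [(json.dumps(cfg), t) for cfg, t in samples],
    )
    return conn


# Inside a MeasurementInterface subclass, seed_configurations() could then
# return best_n_configs(conn, n) for a connection to the old run's data.
```

Seeding with several of the best configurations, rather than only the single best, gives population-based techniques a more varied starting point.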

TimHuDi commented 10 months ago

Hi Jansel: Thanks for your reply. I have used seed_configuration and it helps, but not enough, because, as you say, OpenTuner does not checkpoint the search state. If I want to save the state of UniformGreedyMutation, what state needs to be saved? My question may be a little naive, because I lack expertise in search algorithms, but I am interested in them. Could you recommend some material for learning about genetic algorithms? I use CRIU (https://github.com/checkpoint-restore/criu) to checkpoint/restart Python, and it works, but it relies on the Linux kernel, which brings some limitations. Maybe a Python tool would be more suitable. I also found your paper on DMTCP (DMTCP: Transparent checkpointing for cluster computations and the desktop). Do you think DMTCP could be used to checkpoint OpenTuner?

jansel commented 10 months ago

UniformGreedyMutation doesn't have any state, so just setting seed_configuration should be the same as checkpointing it. UniformGreedyMutation just takes the current best config, randomly mutates it, and repeats. It is the other algorithms which have things like populations, step sizes, etc.
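To illustrate why a seed configuration captures everything such a technique needs, here is a toy greedy-mutation loop. It is not OpenTuner's implementation: it assumes integer parameters with inclusive bounds, a cost to minimize, and treats `rate` as the per-parameter probability of redrawing a value uniformly. The only thing carried from one iteration to the next is the current best configuration and its cost.

```python
import random


def mutate(cfg, bounds, rate, rng):
    """Copy cfg, redrawing each parameter uniformly with probability rate."""
    new = dict(cfg)
    for name, (lo, hi) in bounds.items():
        if rng.random() < rate:
            new[name] = rng.randint(lo, hi)
    return new


def greedy_mutation(cost, bounds, start, rate, iterations, rng):
    """Repeatedly mutate the best config so far; keep a mutant only if better."""
    best, best_cost = dict(start), cost(start)
    for _ in range(iterations):
        candidate = mutate(best, bounds, rate, rng)
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost


def toy_cost(cfg):
    """Hypothetical objective: squared distance from a hidden optimum (7, 3)."""
    return (cfg["x"] - 7) ** 2 + (cfg["y"] - 3) ** 2


BOUNDS = {"x": (0, 20), "y": (0, 20)}
START = {"x": 0, "y": 0}
```

Restarting from the returned best configuration loses nothing: with the same random number stream, 100 iterations, a restart from the result, and 100 more iterations produce exactly what 200 uninterrupted iterations would.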

Generic checkpoint tools like DMTCP may also work. Or even something simple like pickle if you manually recreate the database connection.
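A minimal sketch of the pickle approach, assuming a hypothetical search-state object that holds plain Python state (a population and a step size) plus an open database connection. Connections cannot be pickled, so `__getstate__` drops the connection and `__setstate__` reopens it from a stored path. The class and attribute names are illustrative, not OpenTuner's.

```python
import pickle
import sqlite3


class SearchState:
    """Hypothetical technique state: plain data plus a live DB connection."""

    def __init__(self, db_path, population, step_size):
        self.db_path = db_path
        self.population = population
        self.step_size = step_size
        self.conn = sqlite3.connect(db_path)

    def __getstate__(self):
        # Copy everything except the connection, which pickle cannot handle.
        state = self.__dict__.copy()
        del state["conn"]
        return state

    def __setstate__(self, state):
        # Restore plain attributes, then recreate the connection manually.
        self.__dict__.update(state)
        self.conn = sqlite3.connect(self.db_path)


def checkpoint(obj, path):
    """Save a checkpoint to disk."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def restore(path):
    """Load a checkpoint; the connection is reopened by __setstate__."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

With a file-backed database the reopened connection sees the same data as before; an in-memory (`":memory:"`) database would come back empty.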

TimHuDi commented 10 months ago
Yes, I see that UniformGreedyMutation is a randomized algorithm. I am curious how to compare UniformGreedyMutation05/10/20 to find the most suitable search technique for a specific problem. The experiments are not reproducible, so it's hard to say which is better.
Perhaps checkpointing only looked better to me by chance, since in theory it's the same as setting seed_configuration. Thanks for your time and patience.
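One way to make such a comparison less dependent on luck is to run each variant several times with fixed, different random seeds and compare a summary such as the median final cost. The sketch below does this on a toy problem. It assumes the numeric suffixes correspond to per-parameter mutation rates of 5%, 10% and 20%, and it reimplements a simplified greedy mutation loop rather than calling OpenTuner; real measurements also add timing noise that this toy ignores.

```python
import random
import statistics


def greedy_mutation(cost, bounds, start, rate, iterations, rng):
    """Toy greedy mutation: redraw each parameter with probability rate, keep improvements."""
    best, best_cost = dict(start), cost(start)
    for _ in range(iterations):
        candidate = {
            name: rng.randint(lo, hi) if rng.random() < rate else best[name]
            for name, (lo, hi) in bounds.items()
        }
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best_cost


def compare_rates(cost, bounds, start, rates, iterations, seeds):
    """Median final cost per mutation rate over independent, seeded runs."""
    summary = {}
    for rate in rates:
        finals = [
            greedy_mutation(cost, bounds, start, rate, iterations, random.Random(seed))
            for seed in seeds
        ]
        summary[rate] = statistics.median(finals)
    return summary
```

For example, `compare_rates(my_cost, my_bounds, my_start, [0.05, 0.10, 0.20], 200, range(10))` (with hypothetical `my_*` inputs) runs each rate on the same ten seeds, so the whole experiment can be repeated exactly, and the medians say more than any single run.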