lagom / executor / driver refactor

amacati commented 3 years ago

Refactor the experiment function, executors and drivers.

This PR seeks to disentangle the code for hyperparameter optimization, ablation studies and distributed training. As more features are added to Maggy, the code becomes confusing if it is not split into distinct units. As a first step, the lagom function and associated functionality were rewritten. Since changing the lagom function affects the whole module, the changes are substantial. To make the changes easier to trace, I give a short summary of each file with the respective changes. It will be easiest to check the files of the PR in the order they are given in this summary.

Disentangling the lagom function

maggy/experiment.py: Experiment now acts as a simple wrapper around the different lagom functionalities. It sets up basic info such as the IDs and still contains the error handling. It now takes a config object instead of the previous parameters, since some keywords make no sense in the distributed setting but are necessary for optimization.
maggy/experiment_config.py: The config classes for lagom.
maggy/core/lagom/: Contains lagom_optimization.py, lagom_ablation.py and lagom_distributed.py. These are the target functions for the dispatcher in experiment.py. Splitting the code introduced some redundancy, but I'd rather have slightly similar scripts than a large, entangled single script. Might be able to reduce redundancies further in future versions.
maggy/distributed/: Since experiment now dispatches for all modes, the special treatment of distributed is no longer necessary and the folder was deleted.
maggy/core/executors/Executor.py: The Executor class previously acted as a dispatcher for the kind of monkey-patching that was to be performed on the training function. Since dispatching has been delegated to experiment.py, the Executor class is no longer necessary.
maggy/core/executors/dist_executor.py: Adapted to the use of EnvSing() introduced by #76, plus minor changes.
maggy/core/executors/trial_executor.py: Minor changes to keep consistency with dist_executor.py.
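
To illustrate the dispatching idea described above, here is a minimal sketch, not the code in this PR. It assumes one config class per mode and uses functools.singledispatch to pick the target function; all class names, fields and return values are hypothetical stand-ins for the real config classes and lagom_* functions.

```python
from dataclasses import dataclass
from functools import singledispatch
from typing import Callable


# Hypothetical config classes, one per experiment mode.
@dataclass
class OptimizationConfig:
    num_trials: int
    optimizer: str = "randomsearch"


@dataclass
class AblationConfig:
    ablator: str = "loco"


@dataclass
class DistributedConfig:
    backend: str = "torch"


@singledispatch
def lagom(config, train_fn: Callable) -> str:
    """Entry point: dispatches on the type of the config object."""
    raise TypeError(f"Unsupported config type: {type(config).__name__}")


@lagom.register
def _(config: OptimizationConfig, train_fn):
    # Stand-in for the hyperparameter optimization target function.
    return f"optimization with {config.num_trials} trials"


@lagom.register
def _(config: AblationConfig, train_fn):
    # Stand-in for the ablation study target function.
    return f"ablation with {config.ablator}"


@lagom.register
def _(config: DistributedConfig, train_fn):
    # Stand-in for the distributed training target function.
    return f"distributed training on {config.backend}"


def train():
    """Placeholder training function."""
    return None


result = lagom(OptimizationConfig(num_trials=3), train)
```

With this layout, mode-specific keywords live only in the config class that needs them, and an unknown config type fails early with a TypeError.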

Refactoring the drivers

maggy/core/experiment_driver/Driver.py: The base class driver previously contained all the message digestion logic for hyperparameter training as well as several functions specific to hp tuning. The new driver instead lets child classes register callbacks for the message types they want to support and starts a general message digestion thread. Also switched to config-based initialization instead of keyword arguments.
maggy/core/experiment_driver/OptimizationDriver.py: Moved most of the message digestion logic into the OptimizationDriver. It now defines and registers a callback for each message type for the digestion thread. Also moved several functions specific to hp tuning to the OptimizationDriver. Switched to config-based initialization.
maggy/core/experiment_driver/AblationDriver.py: Simplified initialization logic and switched to config-based initialization.
maggy/core/experiment_driver/DistributedDriver.py: Removed all code that was necessary to maintain compatibility with the previous version of the base driver. Adopted the callback schema and config-based initialization.
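
The callback schema described above can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not the actual driver code: the method names, the dict-based message format and the "METRIC"/"FINAL" message types are hypothetical.

```python
import queue
import threading


class Driver:
    """Base driver: owns the message queue and the digestion thread."""

    def __init__(self):
        self._messages = queue.Queue()
        self._callbacks = {}
        self._thread = threading.Thread(target=self._digest, daemon=True)

    def register_callback(self, msg_type, callback):
        # Child classes call this for every message type they support.
        self._callbacks[msg_type] = callback

    def start(self):
        self._thread.start()

    def add_message(self, msg):
        self._messages.put(msg)

    def stop(self):
        # A None sentinel tells the digestion thread to exit.
        self._messages.put(None)
        self._thread.join()

    def _digest(self):
        # Generic loop: look up the callback by message type and call it.
        while True:
            msg = self._messages.get()
            if msg is None:
                break
            callback = self._callbacks.get(msg["type"])
            if callback is not None:
                callback(msg)


class OptimizationDriver(Driver):
    """Child driver: registers one callback per supported message type."""

    def __init__(self):
        super().__init__()
        self.metrics = []
        self.finished = []
        self.register_callback("METRIC", self._on_metric)
        self.register_callback("FINAL", self._on_final)

    def _on_metric(self, msg):
        self.metrics.append(msg["data"])

    def _on_final(self, msg):
        self.finished.append(msg["trial_id"])


driver = OptimizationDriver()
driver.start()
driver.add_message({"type": "METRIC", "data": 0.5})
driver.add_message({"type": "FINAL", "trial_id": "t1"})
driver.add_message({"type": "UNKNOWN"})  # no callback registered, ignored
driver.stop()
```

The base class never needs to know which message types exist; a driver that does not care about a message type simply does not register a callback for it.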

Miscellaneous changes

maggy/core/rpc.py: Linting changes to conform with global variable name style.
maggy/util.py: Added two functions to keep hops out of maggy dependencies. Fixed a bug from #88.
examples/notebooks: Adapted notebooks to new lagom syntax.

Known issues

Ablation tests are still ongoing; so far there seems to be an issue with the heartbeat.

RiccardoGrigoletto commented 3 years ago

It looks good to me. I tried it with TensorFlow in my VM and it works.

amacati commented 3 years ago

Why were there still calls to hopsutils and experiment_utils in the code? I'm pretty sure I converted those calls to EnvSing.

RiccardoGrigoletto commented 3 years ago

They were in the first commit, and you changed them in the next one. I didn't notice at first, but once I did, I marked the comments as 'resolved'.

amacati commented 3 years ago

Gridsearch is tested and should work.

logicalclocks / maggy

lagom / executor / driver refactor #89