Add architecture tuning with ENAS & hyperparameter tuning with parameter sharing

nginyc commented 5 years ago

Fixes #101 Fixes #97 Fixes #98 Fixes #50 Fixes #99

API Changes

For model developers:

Parameters returned by BaseModel.dump_parameters() must now strictly conform to type Dict[str, Union[str, int, float, np.ndarray]]**
Dataset utils object rafiki.model.dataset_utils has been relocated into rafiki.model.utils as utils.dataset**
Logger utils object rafiki.model.logger has been relocated into rafiki.model.utils as utils.logger**
test_model_class has been relocated from rafiki.model into rafiki.model.dev**
rafiki.constants.TaskType has been removed in favour of simply using strings (to not hard-code task names in core code)**
New optional argument shared_params for BaseModel.train()**
New optional static method BaseModel.teardown() to do any class-wide teardown logic across trials
New PolicyKnob & ArchKnob knob types (to leverage on new cross-trial parameter sharing & architecture tuning features)
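
To illustrate how the model-side changes above might fit together, here is a minimal sketch. It does not use Rafiki itself: ToyModel is a hypothetical stand-in for a BaseModel subclass, the dataset_path argument and the "weight" parameter are assumptions made for illustration, and np.ndarray values (which the new parameter type also allows) are left out to keep the example dependency-free.

```python
from typing import Dict, Optional, Union

# Parameter values restricted to plain types here; the PR also allows np.ndarray.
Params = Dict[str, Union[str, int, float]]


class ToyModel:
    """Hypothetical stand-in for a BaseModel subclass; not Rafiki's actual code."""

    _open_resources = 0  # class-wide state shared across trials

    def __init__(self, learning_rate: float = 0.1):
        self._learning_rate = learning_rate
        self._weight = 0.0
        ToyModel._open_resources += 1

    def train(self, dataset_path: str, shared_params: Optional[Params] = None) -> None:
        # Warm-start from parameters shared by an earlier trial, if any were passed.
        if shared_params is not None:
            self._weight = float(shared_params["weight"])
        # Placeholder for a real training loop: one fixed update step.
        self._weight += self._learning_rate

    def dump_parameters(self) -> Params:
        # Flat dict with string keys and primitive values only.
        return {"weight": self._weight, "learning_rate": self._learning_rate}

    @staticmethod
    def teardown() -> None:
        # Class-wide cleanup, intended to run once after all trials.
        ToyModel._open_resources = 0


first = ToyModel()
first.train("data.zip")
second = ToyModel()
second.train("data.zip", shared_params=first.dump_parameters())
ToyModel.teardown()
```

In this sketch the second trial starts from the first trial's dumped parameters instead of from zero, which is the kind of cross-trial reuse the shared_params argument is meant to enable.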

For application developers:

New budget option TIME_HOURS for creating train jobs
rafiki.constants.BudgetType has been renamed to rafiki.constants.BudgetOption**

For application users:

Endpoint /predict for inference jobs now supports batch predictions (docs)

** specifies that change is NOT backward compatible

Full Changelist

Major changes:

Add budget option of TIME_HOURS (docs)
Rework interface between advisor & train worker with the concept of "Proposals" for more flexible tuning schemes (code)
Set up & document a decision framework for the advisor to decide which tuning scheme to use for each train job & model (docs)
Allow cross-trial parameter sharing with changes to the model API and leveraging on Redis for cross-worker sharing of parameters (docs)
Add hyperparameter tuning with epsilon greedy parameter sharing as a tuning scheme, with a sample usage script train_densenet.py (code)
Add ENAS as a tuning scheme, introducing PolicyKnob and ArchKnob, with a sample usage script run_enas.py (docs, code)
Add making of batch predictions on inference jobs (docs)
Dynamically deploy a unique advisor for each model during training, instead of having a single static advisor across train jobs
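
As a rough illustration of the epsilon greedy parameter sharing idea listed above, the following sketch picks which parameters (if any) to hand to the next trial. It is not the advisor code from this branch: propose_shared_params, the history format, and the choice to explore by training from scratch are all assumptions made for the example.

```python
import random
from typing import Dict, List, Optional, Tuple

Params = Dict[str, float]


def propose_shared_params(
    history: List[Tuple[float, Params]],
    epsilon: float,
    rng: random.Random,
) -> Optional[Params]:
    """Pick parameters to share with the next trial (hypothetical helper).

    history holds (score, params) pairs from completed trials.
    """
    if not history:
        return None  # nothing to share yet
    if rng.random() < epsilon:
        return None  # explore: assume the trial trains from scratch
    # Exploit: reuse the parameters of the best-scoring trial so far.
    _best_score, best_params = max(history, key=lambda item: item[0])
    return best_params


history = [(0.71, {"weight": 0.3}), (0.84, {"weight": 0.5})]
shared = propose_shared_params(history, epsilon=0.2, rng=random.Random(0))
```

With epsilon set to 0 the helper always reuses the best trial's parameters, and with epsilon set to 1 it never shares; values in between trade off the two.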

Minor changes:

Add example CIFAR-10 dataset loading script (code)
Add sample script for converting directories of images to Rafiki's standard image classification format (code)
Add typing and clean up details in documentation of model interface & Rafiki Client
Add & refactor abstract access layers for Redis & file system (to store datasets & model parameters)
Move more responsibility of reading from the database from train worker to Advisor (prefer "smart" master, "dumb" worker)
Rework service deployment to more accurately reflect statuses of train & inference jobs
Add tests for inference job deployment

This branch's changes are largely reflected in the updated documentation deployed at https://nginyc.github.io/rafiki/docs/0.2.0. In particular, there is a new "How Model Tuning Works" page, and we have 2 new model training scripts, run_enas.py and train_densenet.py.

Sorry for the long commit history - I was experimenting with ENAS for a while and there was a huge departure in code.

nudles commented 5 years ago

This is a big PR. Could you give a list of API changes for existing users? Would there be any issues if they want to run their existing examples after this PR is merged?

nudles commented 5 years ago

Is the documentation on hyper-parameter tuning replaced by model tuning? The model tuning doc is a bit complex. It would be better to start with simple schemes, e.g., just hyper-parameter tuning, and then add advanced stuff, e.g., tuning policies like early stopping, and architecture tuning.

nginyc commented 5 years ago

Hi @nudles, I have updated the details of the PR with API changes, some of which are not backward compatible. I will be creating a full client API changelist when dev is to be merged to master, and it will be in the release notes. Let me know if there's more.

With regard to the new model tuning docs, there was no existing documentation on how model tuning is implemented on Rafiki. Following your feedback, I will be simplifying the new model tuning docs.

nginyc commented 5 years ago

Have resolved the conflicts, improved the model tuning docs slightly, and did some testing. Ready to be merged

nginyc / rafiki

Add architecture tuning with ENAS & hyperparameter tuning with parameter sharing #128
