nginyc / rafiki

Rafiki is a distributed system that supports training and deployment of machine learning models using AutoML, built with ease-of-use in mind.
Apache License 2.0

Learnings from NNI #126

Open nginyc opened 5 years ago

nginyc commented 5 years ago

As our project is similar to NNI by Microsoft, I thought it might be good to study how they're doing things, compare it with how we're doing things, and derive some learnings. Areas compared:

- How a model's hyperparameter search space is defined
- How model developers configure the AutoML algorithm
- How the model interfaces with the AutoML system
- How the AutoML system configures the model's training behaviour
- How the AutoML system supports architecture tuning
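On the first point (how the search space is defined), NNI's primary mechanism is a JSON search-space file mapping each hyperparameter to a sampling type. A minimal sketch of that structure, with a hypothetical sampler standing in for a real tuner (the knob names and ranges are illustrative, not from either project):

```python
import math
import random

# NNI-style search space: each knob maps to a sampling type ("_type") and
# its domain ("_value"), mirroring NNI's search_space.json format.
# The knobs below are illustrative.
SEARCH_SPACE = {
    "learning_rate": {"_type": "loguniform", "_value": [1e-4, 1e-1]},
    "batch_size": {"_type": "choice", "_value": [16, 32, 64]},
}

def sample(space, rng=random):
    """Draw one hyperparameter configuration from an NNI-style space.
    (Hypothetical helper; a real tuner would do this server-side.)"""
    config = {}
    for name, spec in space.items():
        kind, values = spec["_type"], spec["_value"]
        if kind == "choice":
            config[name] = rng.choice(values)
        elif kind == "loguniform":
            # Sample uniformly in log space, then exponentiate
            lo, hi = math.log(values[0]), math.log(values[1])
            config[name] = math.exp(rng.uniform(lo, hi))
        else:
            raise ValueError("unsupported _type: %s" % kind)
    return config

cfg = sample(SEARCH_SPACE)
```

Rafiki instead has the model class declare its knobs in code, so the comparison is essentially declarative JSON file versus in-code knob definitions.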

nudles commented 5 years ago

Thanks for the comparison. I'll list some comments (not in order):

  1. Search space. NNI has another way of defining the search space, which uses annotations. This moves the hyper-parameter definition closer to where it is used, and the Python code can run both with and without hyper-parameter tuning.
  2. Because NNI is a library, local development and debugging are easier, and the running flow is controlled by the model developers. Rafiki instead provides a platform for hyper-parameter search, so Rafiki controls the flow. As in map-reduce, the system controls the flow and the developers fill in the code of map and reduce.
  3. It would be good to decouple the system into modular components: resource management, a filesystem or datastore, hyper-parameter tuning, inference queueing, etc.
  4. We may not be able to unify the architecture tuning algorithms and the hyper-parameter tuning algorithms. E.g., it is difficult to unify even the ENAS and DARTS algorithms.
  5. MLflow and Kubeflow are two other projects with hyper-parameter tuning features.
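Point 1 (NNI's annotation style) can be sketched as follows. NNI annotations are string literals placed next to the variable they tune; without NNI the annotation is inert and the default assignment runs, while under NNI the annotated line is rewritten to draw from the search space. The annotation syntax follows NNI's docs; the variable names and values here are illustrative:

```python
def train():
    # NNI annotation: a bare string literal carrying the search-space hint.
    # Running this file without NNI simply evaluates (and discards) the
    # string, so the default on the next line is used.
    """@nni.variable(nni.choice(0.1, 0.01, 0.001), name=learning_rate)"""
    learning_rate = 0.1  # default used when running without NNI

    """@nni.variable(nni.choice(16, 32, 64), name=batch_size)"""
    batch_size = 32  # default used when running without NNI

    # ... a real training loop would use learning_rate and batch_size here ...
    return learning_rate, batch_size

result = train()
```

This is what makes the same script runnable both standalone and under tuning, which a separate search-space file does not give you for free.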
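The platform-controlled flow in point 2 (the map-reduce analogy) can be sketched as: the developer supplies only a train-and-evaluate callable, and the platform owns the tuning loop. All names below are hypothetical, not Rafiki's or NNI's API, and the grid-search loop is a deliberately simple stand-in for a real tuner:

```python
from itertools import product

def developer_model(params):
    """Developer-supplied code: train a model with the given params and
    return a score. (A toy objective stands in for real training.)"""
    return 1.0 - abs(params["learning_rate"] - 0.01)

def platform_tune(model_fn, space):
    """Platform-owned loop (hypothetical): enumerate configurations,
    call the developer's code, and keep the best one. The developer
    never drives this loop, mirroring how map-reduce owns the flow."""
    best_score, best_params = float("-inf"), None
    keys = sorted(space)
    for values in product(*(space[k] for k in keys)):
        params = dict(zip(keys, values))
        score = model_fn(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

best, score = platform_tune(developer_model,
                            {"learning_rate": [0.1, 0.01, 0.001]})
```

In the library style the roles invert: the developer writes this loop (or any other control flow) and calls the tuner for the next configuration.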