Planning of updating experiment configuration format

The current experiment configuration format is somewhat verbose. The "light template" contains many fields that most users will not care about. And the documentation is soooooo tedious that apparently nobody reads it until we paste a section link into their issue.

So I am planning to update the schema in the v2.0 release.

For backward compatibility, NNI will parse configuration files like this:

if new_schema.validate(config):
    return config
if old_schema.validate(config):
    print_warning()
    return convert_to_new_schema(config)
raise error_message_in_new_schema
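
The fallback logic above could be sketched in Python roughly as follows. This is only an illustration: the validators are stand-ins that check key names, and `validate_new`, `validate_old`, `convert_to_new_schema`, `parse_config`, and the two key sets are hypothetical names, not NNI's actual implementation.

```python
import warnings

# Hypothetical stand-ins for real schema validators: each accepts a config
# only if all of its keys are known to that schema. Real validation would
# also check value types.
NEW_KEYS = {"experimentName", "trialConcurrency", "trainingService"}
OLD_KEYS = {"authorName", "experimentName", "trialConcurrency", "trainingServicePlatform"}

def validate_new(config: dict) -> bool:
    return set(config) <= NEW_KEYS

def validate_old(config: dict) -> bool:
    return set(config) <= OLD_KEYS

def convert_to_new_schema(config: dict) -> dict:
    # Illustrative conversion only: drop authorName and rename one field.
    converted = {k: v for k, v in config.items() if k != "authorName"}
    if "trainingServicePlatform" in converted:
        converted["trainingService"] = converted.pop("trainingServicePlatform")
    return converted

def parse_config(config: dict) -> dict:
    if validate_new(config):
        return config
    if validate_old(config):
        warnings.warn("old config format is deprecated", DeprecationWarning)
        return convert_to_new_schema(config)
    # Report the error in terms of the new schema, since that is the format
    # users are expected to write from now on.
    unknown = sorted(set(config) - NEW_KEYS)
    raise ValueError(f"invalid config, unknown fields: {unknown}")
```

Trying the new schema first means a valid new-style file never triggers the deprecation warning.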

Planned Modifications to Each Field

authorName: str
- remove
- backward compatibility: append it to experimentName, e.g. "MNIST by Alice"
experimentName: str
- make it optional
description: optional str
- remove
- backward compatibility: append it to experimentName, e.g. "Name (Long detailed description)"
trialConcurrency: int
maxExecDuration: optional ( number + d|h|m|s )
maxTrialNum: optional int, defaults to 99999
trainingServicePlatform: str
searchSpacePath: optional path
- add YAML support
- maybe allow it to be embedded in config file?
multiPhase: optional str
- remove (deprecated feature)
multiThread: optional str
- remove (can't find a use case)
- I believe this should be controlled by the tuner implementation, not by the end user
nniManagerIp: optional str
- move it to training service?
logDir: optional path
- remove (not compatible with experiment management refactor)
- If users don't like ~/nni-experiments, we should add a global configuration
debug: optional bool
versionCheck: optional bool
- remove (user should not care about this)
- In the rare case where it matters, use "debug: true"
logLevel: optional trace | debug | info | warning | error | fatal
- remove ("debug: true" should be enough)
logCollection: optional http | none
- remove, or at least remove from doc
- reason 1: If this feature is important, we should improve the performance and set it to always-on
- reason 2: This is specific to certain training services
useAnnotation: bool
- make it optional (default to false)
tuner: ...
advisor: ...
assessor: ...
- TODO (relevant to package management)
trial: ...
- TODO
machineList:
- move to remoteConfig
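
As a rough illustration of the backward-compatibility rules listed above, the sketch below folds `authorName` and `description` into `experimentName` and drops the fields marked "remove". The function name `merge_into_experiment_name`, the combined format when both author and description are present, and the choice to drop removed fields silently are assumptions made for this example.

```python
# Fields the plan marks as "remove" without a replacement.
# Dropping them silently is an assumption of this sketch.
REMOVED_FIELDS = {"multiPhase", "multiThread", "logDir",
                  "versionCheck", "logLevel", "logCollection"}

def merge_into_experiment_name(config: dict) -> dict:
    """Fold authorName/description into experimentName and drop removed fields."""
    new = {k: v for k, v in config.items() if k not in REMOVED_FIELDS}
    author = new.pop("authorName", None)
    description = new.pop("description", None)
    name = new.get("experimentName")
    if name is not None:
        if author:
            name = f"{name} by {author}"        # e.g. "MNIST by Alice"
        if description:
            name = f"{name} ({description})"    # e.g. "Name (Long detailed description)"
        new["experimentName"] = name
    new.setdefault("useAnnotation", False)      # planned default: false
    return new
```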

Other Plans

useActiveGpu should be put into the templates for local and remote mode.
Comment out optional fields in the templates.

Updated schema: (field names will be camelCase in YAML and snake_case in Python)

- experiment name: string | undefined
    Mnemonic name of the experiment. This will be shown in the web UI and nnictl.

- trial concurrency: int
    Specify how many trials should be run concurrently.
    The real concurrency also depends on hardware resources and may be less than this number.

- max experiment duration: string | undefined
    Limit the duration of this experiment if specified.
    When the time runs out, the experiment will stop creating trials but continue to serve the web UI.
    Format: number + "s"/"m"/"h"/"d" (stands for seconds/minutes/hours/days respectively)
    Examples: "10m", "0.5h"

- max trial number: int | undefined
    Limit the number of trials to create if specified.
    When the budget runs out, the experiment will stop creating trials but continue to serve the web UI.

- training service: string
    Supported values: "local", "remote", "openpai"

- search space file: string | undefined
    Path to a JSON file containing the search space.
    If it is a relative path, it is resolved relative to the directory containing this experiment config.
    Search space format is determined by the tuner. The common format for built-in tuners is documented here <LINK>
    Mutually exclusive with `search space`.

- search space: Object | undefined
    Search space object.
    The format is determined by the tuner. The common format for built-in tuners is documented here <LINK>
    Mutually exclusive with `search space file`.

- use annotation: bool = false
    Enable annotation <LINK>
    When using annotation, search space should not be specified manually.

- nni manager ip: string | undefined
    Used by training machines to access NNI manager. Not used in local mode.
    If not specified, this will be the default IPv4 address of the outgoing connection.

- trial command: string | list<string>
    Command(s) to launch trial.
    Bash will be used for Linux and macOS, while PowerShell will be used for Windows.

- trial code directory: string = "."
    Path to the directory containing trial source files.
    All files in this directory will be sent to the training machine, unless excluded by a `.nniignore` file <LINK>
    If it is a relative path, it is resolved relative to the directory containing this experiment config.

- trial gpu number: int | undefined
    Number of GPUs used by each trial.
    If set to zero, trials may not have access to any GPU.
    If not specified, trials will be created as if they require no GPU, but they can still access all GPUs on the training machine.

- reuse mode: bool = false
    Enable reuse mode <LINK>

- tuner gpu indices: list<int> | string | undefined
    Limit the GPUs visible to tuner, assessor, and advisor.
    This will be the `CUDA_VISIBLE_DEVICES` environment variable of tuner process.
    Because tuner, assessor, and advisor run in the same process, this option will affect them all.

- debug: bool = false
    Enable debug mode.
    Logging will be more verbose and some internal validation will be loosened.

- experiment working directory: string | undefined
    (might be hidden)
    Specify the directory to place logs, checkpoints, metadata, and other run-time data.
    `<home>/nni-experiments` will be used by default.
    NNI will create a subdirectory named after the experiment ID, so it is safe to use the same directory for multiple experiments.

- log level: string | undefined  (not documented for normal users)
    (might be hidden)
    Supported values: "trace", "debug", "info", "warning", "error", "fatal"
    This option will affect all modules except trials, including tuner, NNI manager, training service, etc.
    For Python modules, "trace" acts as `logging.DEBUG` and "fatal" acts as `logging.CRITICAL`.

- multi thread dispatcher: bool = false
    (hidden)

- version check: bool | undefined
    (hidden)

- log collection: string | undefined
    (hidden)
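
To make the `max experiment duration` format among the top-level fields above concrete, here is a minimal sketch of a parser that turns strings such as "10m" or "0.5h" into seconds. The function name `parse_duration` and the exact regular expression are assumptions for illustration. The schema text does not say whether whitespace or exponent notation should be accepted, so this sketch rejects both.

```python
import re

# Seconds per unit, following the documented suffixes s/m/h/d.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_duration(text: str) -> float:
    """Convert a duration string such as "10m" or "0.5h" into seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([smhd])", text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    number, unit = match.groups()
    return float(number) * _UNIT_SECONDS[unit]
```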

* tuner / assessor / advisor: Algorithm

  - name: string | undefined
      Name of a built-in algorithm, or an algorithm registered with `nnictl package install` <NEED UPDATE>
      Mutually exclusive with `class`.

  - class: string | undefined
      Qualified class name.
      Mutually exclusive with `name`.
      Example: `hyperopt_tuner.HyperoptTuner`

  - code directory: string | undefined
      Path to directory containing the `class`.
      If not specified, the class will be looked up in the Python module search path <https://docs.python.org/3/tutorial/modules.html#the-module-search-path>
      If it is a relative path, it is resolved relative to the directory containing this experiment config.

  - class arguments: Object = {}
      Arguments of the algorithm.
      See the algorithm's documentation for supported values.
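
On the Python side, the `Algorithm` fields above might be modeled as a dataclass like the one below. This is a sketch under assumptions: the class name `AlgorithmConfig` and the attribute `class_name` (used because `class` is a Python keyword) are hypothetical, and requiring exactly one of `name` and `class` is an interpretation, since the schema only states that the two are mutually exclusive.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class AlgorithmConfig:
    name: Optional[str] = None
    class_name: Optional[str] = None      # stands for the `class` field
    code_directory: Optional[str] = None
    class_arguments: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # `name` and `class` are mutually exclusive; this sketch also
        # assumes that one of them must be present.
        if (self.name is None) == (self.class_name is None):
            raise ValueError("specify exactly one of `name` and `class`")
```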

* local

  - use active gpu: bool
      Specify whether NNI should submit trials to GPUs occupied by other tasks.
      If you are using a desktop system with a GUI, set this to `true`.

  - max trial number per gpu: int = 1
      Specify how many trials can share one GPU.

  - gpu indices: list<int> | string | undefined
      Limit the GPUs visible to trial processes.
      This will be the `CUDA_VISIBLE_DEVICES` environment variable.
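
Since `gpu indices` accepts either a list of integers or a string, a normalization step is probably needed before the value is exported as `CUDA_VISIBLE_DEVICES`. A minimal sketch follows. The helper name `to_cuda_visible_devices` is hypothetical, and the string form is assumed to be comma-separated indices, which the schema does not spell out.

```python
from typing import List, Optional, Union

def to_cuda_visible_devices(gpu_indices: Union[List[int], str, None]) -> Optional[str]:
    """Return the value for CUDA_VISIBLE_DEVICES, or None to leave it unset."""
    if gpu_indices is None:
        return None
    if isinstance(gpu_indices, str):
        # Assumed string form: comma-separated indices such as "0, 1".
        indices = [int(p) for p in gpu_indices.split(",") if p.strip()]
    else:
        indices = list(gpu_indices)
    if any(i < 0 for i in indices):
        raise ValueError("GPU indices must be non-negative")
    return ",".join(str(i) for i in indices)
```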

* remote

  - machine list: list<RemoteMachine>

    - host: string
        IP or hostname of the remote machine.

    - ssh port: int = 22
        SSH service port.

    - user name: string
        Login user name.

    - password: string | undefined
        Login password.
        If not specified, `ssh key file` will be used instead.

    - ssh key file: string = "~/.ssh/id_rsa"
        Path to ssh key file (identity file).
        Only used when `password` is not specified.
        If it is a relative path, it is resolved relative to the directory containing this experiment config.

    - ssh passphrase: string | undefined
        Passphrase of SSH identity file.

    - trial prepare command: string | list<string> | undefined
        Command(s) to run before launching each trial.
        This is useful if preparation steps vary across machines.

    - use active gpu: bool
        Specify whether NNI should submit trials to GPUs occupied by other tasks.

    - max trial number per gpu: int = 1
        Specify how many trials can share one GPU.

    - gpu indices: list<int> | string | undefined
        Limit the GPUs visible to trial processes.
        This will be the `CUDA_VISIBLE_DEVICES` environment variable.
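
The login-related fields of a remote machine, and the rule that the key file is used only when no password is given, could look like this in Python. The class name `RemoteMachine` follows the schema's type name. The `auth_method` helper is a hypothetical addition for illustration, and the GPU-related fields are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RemoteMachine:
    host: str
    user_name: str
    ssh_port: int = 22
    password: Optional[str] = None
    ssh_key_file: str = "~/.ssh/id_rsa"
    ssh_passphrase: Optional[str] = None

    def auth_method(self) -> str:
        # A password takes precedence; the key file is only used without one.
        return "password" if self.password is not None else "key"
```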

* openpai

  - host: string
      Hostname of OpenPAI service.

  - user name: string
      OpenPAI user name.

  - token: string
      OpenPAI user token.
      This can be found in your OpenPAI user settings page.

  - trial cpu number: int = 1
      Number of CPUs used by each trial.

  - trial memory: string
      Memory used by each trial.
      Examples: "1gb", "512mb"

  - docker image: string = "msranni/nni"
      Label for docker image.

  - docker auth file: string | undefined
      Path to user authentication file. Used for private docker registry.
      See OpenPAI's documentation for the auth file format. <LINK>
      If it is a relative path, it is resolved relative to the directory containing this experiment config.

  - shmMB: string | undefined
      See OpenPAI's documentation of `taskRole.shmMB`. <LINK>

  - portList: list<portType> | undefined
      See OpenPAI's documentation of `taskRole.portList`. <LINK>
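
Several path fields in this schema (for example `docker auth file` above, as well as `search space file` and `trial code directory`) are interpreted relative to the directory containing the experiment config. One way this rule might be implemented is sketched below. The function name `resolve_config_path` is hypothetical, and expanding `~` is an assumption suggested by the `~/.ssh/id_rsa` default.

```python
from pathlib import Path

def resolve_config_path(config_file: str, value: str) -> Path:
    """Resolve a path from the config relative to the config file's directory."""
    path = Path(value).expanduser()   # assumption: "~" refers to the home directory
    if path.is_absolute():
        return path
    return Path(config_file).parent / path
```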

* [temporarily removed]

  - tuner.includeIntermediateResults
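
The schema above states that field names will be camelCase in YAML and snake_case in Python. A simple pair of converters, given here only as a sketch with hypothetical function names, could be:

```python
import re

def camel_to_snake(name: str) -> str:
    # Insert "_" before every uppercase letter except at the start, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

def snake_to_camel(name: str) -> str:
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)
```

Note that this naive rule splits every capital letter, so a name such as `shmMB` would become `shm_m_b`. Names with acronyms would need an explicit mapping.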

microsoft / nni

Planning of updating experiment configuration format #3087
