Updated schema (field names will be camelCase in YAML and snake_case in Python; an illustrative sketch follows the field list below):
- experiment name: string | undefined
Mnemonic name of the experiment. This will be shown in web UI and nnictl.
- trial concurrency: int
Specify how many trials should be run concurrently.
The real concurrency also depends on hardware resources and may be less than this number.
- max experiment duration: string | undefined
Limit the duration of this experiment if specified.
When the time runs out, the experiment will stop creating trials but continue to serve web UI.
Format: number + "s"/"m"/"h"/"d" (stands for seconds/minutes/hours/days respectively)
Examples: "10m", "0.5h"
- max trial number: int | undefined
Limit the number of trials to create if specified.
When the budget runs out, the experiment will stop creating trials but continue to serve web UI.
- training service: string
Supported values: "local", "remote", "openpai"
- search space file: string | undefined
Path to a JSON file containing the search space.
If it is a relative path, it is relative to the directory containing this experiment config.
The search space format is determined by the tuner. The common format for built-in tuners is documented here <LINK>
Mutually exclusive with `search space`.
- search space: Object | undefined
Search space object.
The format is determined by the tuner. The common format for built-in tuners is documented here <LINK>
Mutually exclusive with `search space file`.
- use annotation: bool = false
Enable annotation <LINK>
When using annotation, search space should not be specified manually.
- nni manager ip: string | undefined
Used by training machines to access NNI manager. Not used in local mode.
If not specified, the default IPv4 address for outgoing connections will be used.
- trial command: string | list<string>
Command(s) to launch trial.
Bash will be used for Linux and macOS, while PowerShell will be used for Windows.
- trial code directory: string = "."
Path to the directory containing trial source files.
All files in this directory will be sent to the training machine, unless there is a `.nniignore` file <LINK>
If it is a relative path, it is relative to the directory containing this experiment config.
- trial gpu number: int | undefined
Number of GPUs used by each trial.
If set to zero, trials may not have access to any GPU.
If not specified, trials will be created as if they require no GPU, but they can still access all GPUs on the training machine.
- reuse mode: bool = false
Enable reuse mode <LINK>
- tuner gpu indices: list<int> | string | undefined
Limit the GPUs visible to tuner, assessor, and advisor.
This will be the `CUDA_VISIBLE_DEVICES` environment variable of tuner process.
Because the tuner, assessor, and advisor run in the same process, this option affects them all.
- debug: bool = false
Enable debug mode.
Logging will be more verbose and some internal validation will be loosened.
- experiment working directory: string | undefined
(might be hidden)
Specify the directory to place log, checkpoint, metadata, and other run-time stuff.
`<home>/nni-experiments` will be used by default.
NNI will create a subdirectory named after the experiment ID, so it is safe to use the same directory for multiple experiments.
- log level: string | undefined (not documented for normal user)
(might be hidden)
Supported values: "trace", "debug", "info", "warning", "error", "fatal"
This option will affect all modules except trials, including tuner, NNI manager, training service, etc.
For Python modules, "trace" acts as `logging.DEBUG` and "fatal" acts as `logging.CRITICAL`.
- multi thread dispatcher: bool = false
(hidden)
- version check: bool | undefined
(hidden)
- log collection: str | undefined
(hidden)
* tuner / assessor / advisor: Algorithm
- name: string | undefined
Name of a built-in algorithm, or an algorithm registered with `nnictl package install` <NEED UPDATE>
Mutually exclusive with `class`.
- class: string | undefined
Qualified class name.
Mutually exclusive with `name`.
Example: `hyperopt_tuner.HyperoptTuner`
- code directory: string | undefined
Path to the directory containing the `class`.
If not specified, the class will be looked up in the Python module search path <https://docs.python.org/3/tutorial/modules.html#the-module-search-path>
If it is a relative path, it is relative to the directory containing this experiment config.
- class arguments: Object = {}
Arguments of the algorithm.
See algorithm's document for supported value.
* local
- use active gpu: bool
Specify whether NNI should submit trials to GPUs occupied by other tasks.
If you are using a desktop system with a GUI, set this to `true`.
- max trial number per gpu: int = 1
Specify how many trials can share one GPU.
- gpu indices: list<int> | string | undefined
Limit the GPUs visible to trial processes.
This will be the `CUDA_VISIBLE_DEVICES` environment variable.
* remote
- machine list: list<RemoteMachine>
- host: string
IP or hostname of the remote machine.
- ssh port: int = 22
SSH service port.
- user name: string
Login user name.
- password: string | undefined
Login password.
If not specified, `ssh key file` will be used instead.
- ssh key file: string = "~/.ssh/id_rsa"
Path to the SSH key file (identity file).
Only used when `password` is not specified.
If it is a relative path, it is relative to the directory containing this experiment config.
- ssh passphrase: string | undefined
Passphrase of SSH identity file.
- trial prepare command: string | list<string> | undefined
Command(s) to run before launching each trial.
This is useful when the preparation steps differ between machines.
- use active gpu: bool
Specify whether NNI should submit trials to GPUs occupied by other tasks.
- max trial number per gpu: int = 1
Specify how many trials can share one GPU.
- gpu indices: list<int> | string | undefined
Limit the GPUs visible to trial processes.
This will be the `CUDA_VISIBLE_DEVICES` environment variable.
* openpai
- host: string
Hostname of OpenPAI service.
- user name: string
OpenPAI user name.
- token: string
OpenPAI user token.
This can be found in your OpenPAI user settings page.
- trial cpu number: int = 1
Number of CPUs used by each trial.
- trial memory: string
Memory used by each trial.
Examples: "1gb", "512mb"
- docker image: string = "msranni/nni"
Name of the Docker image used to run trials.
- docker auth file: string | undefined
Path to user authentication file. Used for private docker registry.
See OpenPAI's document for auth file format. <LINK>
If it is a relative path, it is relative to the directory containing this experiment config.
- shmMB: string | undefined
See OpenPAI's document of `taskRole.shmMB`. <LINK>
- portList: list<portType> | undefined
See OpenPAI's document of `taskRole.portList`. <LINK>
* [temporarily removed]
- tuner.includeIntermediateResults
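For concreteness, here is a rough sketch of a config file in the proposed schema (local mode with a built-in tuner). The field names follow the list above; the values, the tuner name, and the placement of the local-mode fields are illustrative assumptions, not part of this proposal:

```yaml
# Illustrative sketch of the proposed v2 schema; not a final specification.
experimentName: mnist-example
trialConcurrency: 2
maxExperimentDuration: 1h
maxTrialNumber: 100
trainingService: local
searchSpaceFile: search_space.json   # relative to this config file's directory
trialCommand: python3 mnist.py
trialCodeDirectory: .
trialGpuNumber: 1
tuner:
  name: TPE                          # built-in algorithm name (assumed)
  classArguments:
    optimize_mode: maximize
# Local-mode fields; how they nest in the final schema is not fixed in this proposal.
useActiveGpu: false
maxTrialNumberPerGpu: 1
```

The same fields with snake_case names, roughly as they might be expressed from Python (the exact API is not specified here, so a plain dict is used):

```python
# snake_case counterparts of the YAML fields above; illustrative only.
config = {
    'experiment_name': 'mnist-example',
    'trial_concurrency': 2,
    'max_experiment_duration': '1h',
    'max_trial_number': 100,
    'training_service': 'local',
    'search_space_file': 'search_space.json',
    'trial_command': 'python3 mnist.py',
    'trial_code_directory': '.',
    'trial_gpu_number': 1,
    'tuner': {'name': 'TPE', 'class_arguments': {'optimize_mode': 'maximize'}},
}
```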
@liuzhe-lz, it seems this feature has been finished, so I will close this issue; feel free to reopen it.
The current version of the experiment configuration is somewhat verbose. There are many fields in the "light template" that most users will not care about. And the documentation is so tedious that apparently nobody reads it until we paste a section link in their issue.
So I am planning to update the schema in the v2.0 release.
For backward compatibility, NNI will parse configuration files like this:
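For illustration, a minimal legacy-format file might look like the sketch below. The top-level fields come from the list that follows; the nested `tuner` and `trial` fields (`builtinTunerName`, `classArgs`, `command`, `codeDir`, `gpuNum`) follow the existing v1 format, and the values are made up:

```yaml
# Legacy (v1) format sketch; values are illustrative only.
authorName: default
experimentName: mnist-example
trialConcurrency: 2
maxExecDuration: 1h
maxTrialNum: 100
trainingServicePlatform: local
searchSpacePath: search_space.json
useAnnotation: false
tuner:
  builtinTunerName: TPE
  classArgs:
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 1
```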
Planned Modifications of Each Field
authorName: str
experimentName: str
description: optional str
trialConcurrency: int
maxExecDuration: optional ( number + d|h|m|s )
maxTrialNum: optional int, defaults to 99999
trainingServicePlatform: str
searchSpacePath: optional path
multiPhase: optional str
multiThread: optional str
nniManagerIp: optional str
logDir: optional path
debug: optional bool
versionCheck: optional bool
logLevel: optional trace | debug | info | warning | error | fatal
logCollection: optional http | none
useAnnotation: bool
tuner: ...
advisor: ...
assessor: ...
trial: ...
machineList:
Other Plans
useActiveGpu
should be put into the templates of local and remote mode.