microsoft / nni

An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License

Log mechanism discussion #4499

Open cruiseliu opened 2 years ago

cruiseliu commented 2 years ago

Current Log Behavior

Due to a bug, the experiment log is currently written to dispatcher.log.

| Component | Log behavior | Log level | Detected by | Note |
| --- | --- | --- | --- | --- |
| Experiment management API | Log → stdout & dispatcher.log | Aware debug | Explicit setup | nni.* are colorful |
| nnictl (create, resume, view) | Log → stdout & dispatcher.log | Aware debug | Explicit setup | Same as Experiment |
| nnictl (Other commands) | Log → stdout | INFO | (default) | Never used |
| Trials (local) | Log & stdout → trial.log | INFO | NNI_PLATFORM | |
| Trials (reuse) | Log → stdout | INFO | REUSE_MODE | Same as default |
| Standalone (debug trial) | Log → stdout | INFO | (default) | |
| Tuner | Log → dispatcher.log | Aware logLevel | SDK_PROCESS | |
| NAS main process | Log → stdout | INFO | (default) | |
| NAS multi-trial trials | Same as HPO trials | INFO | NNI_PLATFORM | |
| Compression | Log → stdout | INFO | (default) | |
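The "Detected by" column implies that each process picks its logging behavior from environment variables. The sketch below classifies a process that way; it assumes only that a variable counts as set when it is present and non-empty. The helper name, the check order, and the returned labels are hypothetical, and the experiment management API is left out because it uses explicit setup instead.

```python
from typing import Mapping


def detect_component(env: Mapping[str, str]) -> str:
    """Classify the current process from its environment (hypothetical helper).

    Mirrors the "Detected by" column; the check order and the
    "set and non-empty" rule are assumptions, not NNI's actual code.
    """
    if env.get("SDK_PROCESS"):
        # Tuner / dispatcher process: logs go to dispatcher.log.
        return "tuner"
    if env.get("REUSE_MODE"):
        # Checked before NNI_PLATFORM, assuming reuse trials may set both.
        return "trial (reuse)"
    if env.get("NNI_PLATFORM"):
        # Regular trial launched by a training service: logs go to trial.log.
        return "trial"
    # Nothing set: standalone trial, NAS main process, or compression.
    return "default"
```

In a real process this would be called as detect_component(os.environ); taking a plain mapping keeps the sketch testable.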

Expected Log Behavior

| Component | Log behavior | Log level | Detected by |
| --- | --- | --- | --- |
| Experiment management API | NNI log → stdout & experiment.log | Aware logLevel | Explicit setup |
| NAS main process | Same as experiment | - | - |
| nnictl (create, resume, view) | Same as experiment | - | - |
| nnictl (Other commands) | Does not use logging; don't touch | - | - |
| Trials (local) | All log → trial.log & stdout → trial.stdout | Aware logLevel? | NNI_PLATFORM |
| NAS multi-trial trials | Same as HPO trials | - | - |
| Standalone (debug trial) | NNI log → stdout | INFO | Trial APIs |
| Tuner | All log → dispatcher.log | Aware logLevel | SDK_PROCESS |
| Compression | ? | ? | ? |
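For the experiment management API (and the rows marked "Same as experiment"), the target is NNI log → stdout & experiment.log at the level given by logLevel. A minimal sketch with Python's standard logging module, assuming all of NNI's loggers live under the "nni" namespace; the helper name and the format string are illustrative, not NNI's code.

```python
import logging
import sys
from pathlib import Path


def setup_experiment_logging(log_dir: str, level: str = "info") -> logging.Logger:
    """Send records from "nni" and "nni.*" loggers to stdout and experiment.log.

    Hypothetical helper; `level` must be a standard logging level name.
    """
    nni_logger = logging.getLogger("nni")
    nni_logger.setLevel(level.upper())  # e.g. "debug" -> "DEBUG"
    nni_logger.propagate = False  # keep NNI records out of the user's root handlers

    formatter = logging.Formatter("[%(asctime)s] %(levelname)s (%(name)s) %(message)s")
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    handlers = [
        logging.StreamHandler(sys.stdout),
        logging.FileHandler(Path(log_dir) / "experiment.log", encoding="utf-8"),
    ]
    for handler in handlers:
        handler.setFormatter(formatter)
        nni_logger.addHandler(handler)
    return nni_logger
```

Calling it twice attaches duplicate handlers, so a real implementation would need to track or remove the handlers it installed, especially if one process runs several experiments.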

NNI manager log: ?

QuanluZhang commented 2 years ago
liuzhe-lz commented 2 years ago

Discussion iteration 1 conclusions:

Log files

Each HPO experiment writes 3 files:

Each NAS multi-trial experiment writes 2 files:

Each NAS oneshot experiment writes 1 file:

Auto-compress behaves like HPO; other compression experiments do not write log files.

Log content

A log message should be written to stdout, if and only if:

A log message should be written to "experiment.log", if and only if:

A log message should be written to "dispatcher.log", if and only if:

The above rules apply to all Python modules.

How to access experiment ID

Ideally, a module "inside" an experiment (such as a NAS strategy) should access the experiment ID via experiment.logger; if it does not hold a reference to the experiment, there will be a way to infer the ID automatically.
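One way for experiment.logger to carry the experiment ID is a logging.LoggerAdapter that attaches the ID to every record, so handlers can keep or drop records per experiment. This is a sketch under assumptions: ExperimentSketch stands in for the real experiment class, and ExperimentIdFilter and the experiment_id record attribute are hypothetical names.

```python
import logging


class ExperimentSketch:
    """Stand-in for an experiment object; not NNI's Experiment class."""

    def __init__(self, experiment_id: str) -> None:
        self.id = experiment_id
        # Every record logged through this adapter gets an `experiment_id`
        # attribute, usable in filters or in a format string (%(experiment_id)s).
        self.logger = logging.LoggerAdapter(
            logging.getLogger("nni.experiment"), {"experiment_id": experiment_id}
        )


class ExperimentIdFilter(logging.Filter):
    """Accept only records tagged with one experiment ID (hypothetical)."""

    def __init__(self, experiment_id: str) -> None:
        super().__init__()
        self.experiment_id = experiment_id

    def filter(self, record: logging.LogRecord) -> bool:
        # Records logged without the adapter have no experiment_id and are dropped.
        return getattr(record, "experiment_id", None) == self.experiment_id
```

A strategy holding a reference to its experiment would then simply call exp.logger.info(...), and a per-experiment file handler would carry an ExperimentIdFilter for that ID.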

We assume each thread runs only one experiment at a time. Each thread will have a stack or priority queue that keeps the most recently activated experiments in the current thread. The top of the stack is considered the "current experiment".
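The per-thread "current experiment" can be sketched with threading.local and a plain list used as a stack (the priority-queue variant is not shown). activate and current_experiment_id are hypothetical names, and the sketch assumes activations nest, i.e. the most recently activated experiment is always the first to be deactivated.

```python
import threading
from contextlib import contextmanager
from typing import Iterator, List, Optional

_local = threading.local()


def _stack() -> List[str]:
    # Each thread lazily gets its own list; threads never see each other's stack.
    if not hasattr(_local, "stack"):
        _local.stack = []
    return _local.stack


@contextmanager
def activate(experiment_id: str) -> Iterator[None]:
    """Mark an experiment as active in the current thread while the block runs."""
    _stack().append(experiment_id)
    try:
        yield
    finally:
        # Assumes LIFO nesting: the innermost activation is removed first.
        _stack().pop()


def current_experiment_id() -> Optional[str]:
    """Top of this thread's stack, i.e. the "current experiment", or None."""
    stack = _stack()
    return stack[-1] if stack else None
```

A newly started thread begins with an empty stack, so it sees None until it activates an experiment itself; it never inherits a possibly wrong guess from the thread that spawned it.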

If a strategy (or something similar) is guaranteed to run in a separate thread, the inference should be reliable; otherwise it should be considered a last resort. In short, it depends on the RetiariiExperiment implementation.

Trial log

Each trial has 3 output files:

The log file should only contain records from loggers whose names start with "nni.". The log level has not been discussed yet.
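The "nni." rule for the trial log maps directly onto logging.Filter, which accepts a logger name and all of its dot-separated children. A sketch assuming the handler is attached to the root logger; setup_trial_logging is a hypothetical name, and INFO is only a placeholder because the level is still open.

```python
import logging
from pathlib import Path

LOG_FORMAT = "[%(asctime)s] %(levelname)s (%(name)s) %(message)s"


def setup_trial_logging(trial_dir: str, level: int = logging.INFO) -> logging.Handler:
    """Hypothetical helper: route only "nni" / "nni.*" records to trial.log."""
    handler = logging.FileHandler(Path(trial_dir) / "trial.log", encoding="utf-8")
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    # logging.Filter("nni") accepts the "nni" logger and its children
    # ("nni.runtime", ...) but rejects unrelated names such as "torch" or "nnix".
    handler.addFilter(logging.Filter("nni"))
    # Attached to the root logger, so user loggers also reach this handler
    # and are dropped by the filter above.
    logging.getLogger().addHandler(handler)
    # Placeholder level for NNI's loggers, since the trial log level is undecided.
    logging.getLogger("nni").setLevel(level)
    return handler
```

Attaching the handler directly to the "nni" logger would give the same result without a filter; the root-logger variant is shown because it makes the name rule explicit.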