microsoft / nni

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License

PPO Tuner Unavailable for Classical NAS #3328

Closed Garen-Wang closed 3 years ago

Garen-Wang commented 3 years ago

Environment:

Log message:

What issue did you meet? What's expected?:

PPO Tuner failed to load when creating new classical NAS experiments; this also occurs on version 1.9.

How to reproduce it?:

Manually prepare a CNN model and code for auto-generation, run nnictl ss_gen to auto-generate the search space file, and configure config.yml according to the example. An error occurs when running the nnictl create command.

Additional information:

my config.yml:

authorName: default
experimentName: example_with_cifar_classical_NAS
trialConcurrency: 1
maxExecDuration: 100h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: local
#please use `nnictl ss_gen` to generate search space file first
# searchSpacePath: <the_generated_search_space_path>
searchSpacePath: nni_auto_gen_search_space.json
useAnnotation: false
tuner:
  builtinTunerName: PPOTuner
  classArgs:
    optimize_mode: maximize
trial:
  command: python classical.py
  codeDir: .
  gpuNum: 0

my nni_auto_gen_search_space.json:

{
  "conv1": {
    "_type": "layer_choice",
    "_value": [
      "conv3*3",
      "conv5*5"
    ]
  },
  "mid_conv": {
    "_type": "layer_choice",
    "_value": [
      "0",
      "1"
    ]
  },
  "skip_conv": {
    "_type": "input_choice",
    "_value": {
      "candidates": [
        "",
        ""
      ],
      "n_chosen": 1
    }
  }
}
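For reference, the generated search space only uses the classical-NAS mutable types `layer_choice` and `input_choice`, and its shape can be sanity-checked with the standard library alone. A minimal sketch (the JSON above is inlined as a string rather than read from `nni_auto_gen_search_space.json`):

```python
import json

# The auto-generated search space from the issue, inlined for illustration.
search_space = json.loads("""
{
  "conv1": {"_type": "layer_choice", "_value": ["conv3*3", "conv5*5"]},
  "mid_conv": {"_type": "layer_choice", "_value": ["0", "1"]},
  "skip_conv": {"_type": "input_choice",
                "_value": {"candidates": ["", ""], "n_chosen": 1}}
}
""")

def check(space):
    """Verify each mutable has a recognized _type and consistent choices."""
    for key, spec in space.items():
        assert spec["_type"] in ("layer_choice", "input_choice"), key
        if spec["_type"] == "layer_choice":
            assert len(spec["_value"]) > 0, key
        else:
            v = spec["_value"]
            assert v["n_chosen"] <= len(v["candidates"]), key

check(search_space)
print("search space OK:", sorted(search_space))
# → search space OK: ['conv1', 'mid_conv', 'skip_conv']
```

This catches the common mistakes (an unknown `_type`, or `n_chosen` larger than the candidate list) before handing the file to a tuner.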

my model for NAS:

import torch
import torch.nn as nn
import torch.nn.functional as F
from collections import OrderedDict

from nni.nas.pytorch import mutables


class NeuralNet(nn.Module):
    def __init__(self):
        super(NeuralNet, self).__init__()
        self.conv1 = mutables.LayerChoice(OrderedDict([
            ("conv3*3", nn.Conv2d(3, 8, 3, 1)),
            ("conv5*5", nn.Conv2d(3, 8, 5, 1))
        ]), key='conv1')
        self.mid_conv = mutables.LayerChoice([
            nn.Conv2d(8, 8, 3, 1, padding=1),
            nn.Conv2d(8, 8, 5, 1, padding=2)
        ], key='mid_conv')
        self.conv2 = nn.Conv2d(8, 16, 5, 1)
        self.pool = nn.MaxPool2d(2, 2)
        self.func1 = nn.Linear(16 * 5 * 5, 120)
        self.func2 = nn.Linear(120, 84)
        self.func3 = nn.Linear(84, 10)
        self.input_switch = mutables.InputChoice(n_candidates=2, n_chosen=1, key="skip_conv")

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        old_x = x
        zero_x = torch.zeros_like(old_x)
        skip_x = self.input_switch([zero_x, old_x])
        x = F.relu(self.mid_conv(x))
        x += skip_x
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.func1(x))
        x = F.relu(self.func2(x))
        x = self.func3(x)
        return x
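Conceptually, the tuner's job here is to pick one entry per mutable key (`conv1`, `mid_conv`, `skip_conv`). The sketch below (plain Python, no torch or nni required; this is a simplified illustration, not NNI's actual wire format) shows how a sampled architecture can be viewed as a dict of chosen indices resolved against each mutable's candidates:

```python
# Hypothetical sample a tuner might emit: one index per layer_choice key,
# and a list of indices for the input_choice key (n_chosen may be > 1).
sampled = {"conv1": 0, "mid_conv": 1, "skip_conv": [0]}

candidates = {
    "conv1": ["conv3*3", "conv5*5"],
    "mid_conv": ["0", "1"],
    "skip_conv": ["zero_x", "old_x"],  # the two inputs fed to InputChoice
}

def resolve(sample, cands):
    """Map sampled indices back to the concrete choices they select."""
    resolved = {}
    for key, idx in sample.items():
        if isinstance(idx, list):          # input_choice: list of chosen inputs
            resolved[key] = [cands[key][i] for i in idx]
        else:                              # layer_choice: a single layer
            resolved[key] = cands[key][idx]
    return resolved

print(resolve(sampled, candidates))
# → {'conv1': 'conv3*3', 'mid_conv': '1', 'skip_conv': ['zero_x']}
```

With `skip_conv` resolved to the zero tensor, the skip connection above is effectively disabled for that trial; resolving it to `old_x` enables it.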

QuanluZhang commented 3 years ago

@Garen-Wang thanks for reporting this issue. PPOTuner requires additional packages, so you need to install them with python3 -m pip install nni[PPOTuner]. We will update the doc accordingly.

Garen-Wang commented 3 years ago

@QuanluZhang Thanks a lot for your help. After installing nni[PPOTuner] plus tensorflow, it now works properly.

kvartet commented 3 years ago

@QuanluZhang, it seems that classic NAS has been refactored into Retiarii, so should we update the doc of the tuners used by NAS in the HPO part?

QuanluZhang commented 3 years ago

@kvartet we can remove PPOTuner from the HPO tuners, as PPO is now supported in the Retiarii framework (i.e., nni.retiarii.strategy.PolicyBasedRL).