microsoft / nni

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License

Alternative for using ValueChoice as a boolean #4819

Open SimonHuesgen opened 2 years ago

SimonHuesgen commented 2 years ago

Describe the issue: Hello there,

I am trying to define a search space using nni v2.7 and ran into a problem. Several times I have wanted to define a mutation with a ValueChoice object and then use it in an if-else clause. This throws an error every time, saying this is not the intended use of ValueChoice. A simplified example is the choice of whether to use BatchNormalization. I attempted, in __init__:

bn_choice = nn.ValueChoice([0,1], label="bn_choice")
if bn_choice==1:
    self.bn = nn.BatchNorm2d(128)

and in forward:

if self.bn:
    x = self.bn(x)

Is there an alternative way to define this?

Thanks in advance!

Environment:

Configuration:

Log message:

How to reproduce it?:

ultmaster commented 2 years ago

Does this work?

https://nni.readthedocs.io/en/latest/reference/nas/search_space.html#nni.retiarii.nn.pytorch.ValueChoice.condition

Another solution is to use LayerChoice instead, to choose between BatchNorm and Identity.

SimonHuesgen commented 2 years ago

Thanks for the quick reply! Using the condition to turn a ValueChoice Object to a bool object throws an error: keyword can't be an expression.

Using LayerChoice works for one layer, but it does not seem to work as a single decision shared by multiple layers. What I mean is that I want one binary choice that decides, for multiple layers in the network, whether to use BatchNorm or Identity. So I tried using the same label:

self.layer3 = nn.LayerChoice([nn.BatchNorm2d(64),nn.Identity()], label="bn_choice")
self.layer6 = nn.LayerChoice([nn.BatchNorm2d(128),nn.Identity()], label="bn_choice")

but when I run test experiments, the trials fail immediately...

Any other ideas?

matluster commented 2 years ago

keyword can't be an expression.

I need more traceback on this error.

but when test running experiments the trials fail immediately...

I need stderr and trial's log for the failed trials.

SimonHuesgen commented 2 years ago

Somehow I am not able to reproduce the 'keyword can't be an expression' error right now...

/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py:133: UserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  f"The dataloader, {name}, does not have many workers which may be a bottleneck."
Traceback (most recent call last):
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/trial_entry.py", line 28, in <module>
    engine.trial_execute_graph()
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/execution/base.py", line 146, in trial_execute_graph
    graph_data.evaluator._execute(model_cls)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/evaluator/pytorch/lightning.py", line 119, in _execute
    return self.fit(model_cls)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/evaluator/pytorch/lightning.py", line 148, in fit
    return self.trainer.fit(self.module, self.train_dataloader, self.val_dataloaders)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 741, in fit
    self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
    self._dispatch()
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
    return self._run_train()
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1311, in _run_train
    self._run_sanity_check(self.lightning_module)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1375, in _run_sanity_check
    self._evaluation_loop.run()
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
    dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 122, in advance
    output = self._evaluation_step(batch, batch_idx, dataloader_idx)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 217, in _evaluation_step
    output = self.trainer.accelerator.validation_step(step_kwargs)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 239, in validation_step
    return self.training_type_plugin.validation_step(*step_kwargs.values())
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 219, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/evaluator/pytorch/lightning.py", line 205, in validation_step
    self.log('val_loss', self.criterion(y_hat, y), prog_bar=True)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 1165, in forward
    label_smoothing=self.label_smoothing)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2996, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of dimension: 1


and trial.log looks like this:

[2022-04-28 19:03:48] INFO (torch.distributed.nn.jit.instantiator/MainThread) Created a temporary directory at /var/folders/89/587vv7m93tq_8_d3094mgwnc0000gn/T/tmpqh_utbol
[2022-04-28 19:03:48] INFO (torch.distributed.nn.jit.instantiator/MainThread) Writing /var/folders/89/587vv7m93tq_8_d3094mgwnc0000gn/T/tmpqh_utbol/_remote_module_non_sriptable.py
[2022-04-28 19:03:48] INFO (pytorch_lightning.utilities.distributed/MainThread) GPU available: False, used: False
[2022-04-28 19:03:48] INFO (pytorch_lightning.utilities.distributed/MainThread) TPU available: False, using: 0 TPU cores
[2022-04-28 19:03:48] INFO (pytorch_lightning.utilities.distributed/MainThread) IPU available: False, using: 0 IPUs
[2022-04-28 19:03:49] INFO (pytorch_lightning.callbacks.model_summary/MainThread)
  | Name      | Type             | Params
0 | criterion | CrossEntropyLoss | 0
1 | metrics   | ModuleDict       | 0
2 | model     | _model           | 76.0 K
76.0 K    Trainable params
0         Non-trainable params
76.0 K    Total params
0.304     Total estimated model params size (MB)
[2022-04-28 19:03:49] PRINT Validation sanity check: 0it [00:00, ?it/s]
[2022-04-28 19:03:49] PRINT Validation sanity check: 0%| | 0/2 [00:00<?, ?it/s]



PS: Are there any larger examples (with more variety in possibilities of how to use different mutator functions) for search spaces than the example notebooks?

matluster commented 2 years ago

and later using bool_bn_choice in if-else still throws:

You should never use an if-else branch. ValueChoice.condition works like tf.cond. You need to write something like:

branch_a = nn.Sequential(..., ...)
branch_b = nn.Sequential(..., ..., BN)
self.block = nn.ValueChoice.condition(nn.ValueChoice([False, True], branch_a, branch_b)

I don't know whether putting modules into nn.ValueChoice.condition works, but you can try.
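To illustrate why a plain if-else fails while a deferred condition can work, here is a toy sketch in plain Python. `SymbolicChoice` and its `evaluate` method are hypothetical stand-ins written for this example; they are not nni's ValueChoice implementation. The sketch only assumes that the choice has no concrete value yet when the model's __init__ runs.

```python
import operator


class SymbolicChoice:
    """Toy stand-in for a not-yet-sampled choice (hypothetical, not nni code)."""

    def __init__(self, candidates=None, fn=None, args=()):
        self.candidates = candidates  # set on leaf choices only
        self.fn = fn                  # set on derived expressions only
        self.args = args

    def __eq__(self, other):
        # Comparing does not yield True/False; it records the comparison.
        return SymbolicChoice(fn=operator.eq, args=(self, other))

    def __floordiv__(self, other):
        # Arithmetic is recorded the same way and resolved later.
        return SymbolicChoice(fn=operator.floordiv, args=(self, other))

    def __bool__(self):
        # An if-statement calls bool(), which cannot be answered yet.
        raise RuntimeError("choice has no concrete value yet")

    @staticmethod
    def condition(pred, if_true, if_false):
        # Deferred branch: both options are kept until a value is sampled.
        return SymbolicChoice(fn=lambda p, a, b: a if p else b,
                              args=(pred, if_true, if_false))

    def evaluate(self, sampled):
        """Resolve the expression once the leaf choice has been sampled."""
        if self.fn is None:
            return sampled
        resolved = [a.evaluate(sampled) if isinstance(a, SymbolicChoice) else a
                    for a in self.args]
        return self.fn(*resolved)


use_bn = SymbolicChoice([0, 1])

try:
    if use_bn == 1:  # bool() on a symbolic expression raises
        pass
    branching_failed = False
except RuntimeError:
    branching_failed = True

picked = SymbolicChoice.condition(use_bn == 1, "with_bn", "without_bn")
print(branching_failed, picked.evaluate(1), picked.evaluate(0))
```

The point of the design is that both branches are built up front and the selection happens only after a value has been sampled, which is why the comparison has to go through a helper rather than through Python's own if-statement.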

RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of dimension: 1

Your output is a one-dimensional tensor, but you are trying to use pl.Classification, if my guess is correct. Classification doesn't work like this.
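As a rough aid for reading this error, the sketch below encodes the shape rule documented for PyTorch's cross-entropy loss with class-index targets: the target has the shape of the logits with the class dimension removed. The helper name `expected_target_shape` is made up for this example. Under that rule, a message asking for 3D targets is consistent with the loss receiving 4-D logits, which is an assumption here because the model that produced this traceback is not shown.

```python
def expected_target_shape(logits_shape):
    """Shape of class-index targets for logits of the given shape.

    (C,)           -> ()
    (N, C)         -> (N,)
    (N, C, d1, ..) -> (N, d1, ..)
    """
    if len(logits_shape) == 0:
        raise ValueError("logits need at least a class dimension")
    if len(logits_shape) == 1:
        return ()
    batch, _classes, *spatial = logits_shape
    return (batch, *spatial)


# Ordinary classification: one label per sample.
flat_case = expected_target_shape((128, 10))

# 4-D logits are treated as per-pixel classification, so 3-D targets are expected.
spatial_case = expected_target_shape((128, 10, 8, 8))

print(flat_case, spatial_case)
```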

Are there any larger examples (with more variety in possibilities of how to use different mutator functions) for search spaces than the example notebooks?

There is a space hub here. But it's in a preview state.

SimonHuesgen commented 2 years ago

Thanks for all the help.

Trying to use it like this:

branch_a = nn.Sequential(nn.Conv2d(64, 128, 3, stride=1,padding=1),nn.ReLU(),nn.MaxPool2d(2, 2))
branch_b = nn.Sequential(nn.Conv2d(64, 128, 3, stride=1,padding=1),nn.ReLU(),nn.BatchNorm2d(128),nn.MaxPool2d(2, 2))
self.block = nn.ValueChoice.condition(nn.ValueChoice([False, True], branch_a, branch_b))

throws:


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-8cb293112ce7> in <module>
    119 
    120 
--> 121 model = Net()

~/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/serializer.py in __init__(self, *args, **kwargs)
    123             self._model_namespace = ModelNamespace()
    124             with self._model_namespace:
--> 125                 super().__init__(*args, **kwargs)
    126 
    127     _copy_class_wrapper_attributes(wrapper, reset_wrapper)

~/opt/anaconda3/lib/python3.7/site-packages/nni/common/serializer.py in new_init(self, *args, **kwargs)
    429                     self,
    430                     *[_argument_processor(arg) for arg in args],
--> 431                     **{kw: _argument_processor(arg) for kw, arg in kwargs.items()}
    432                 )
    433                 inject_trace_info(self, base, args, kwargs)

<ipython-input-6-8cb293112ce7> in __init__(self)
    104         branch_a = nn.Sequential(nn.Conv2d(64, 128, 3, stride=1,padding=1),nn.ReLU(),nn.MaxPool2d(2, 2))
    105         branch_b = nn.Sequential(nn.Conv2d(64, 128, 3, stride=1,padding=1),nn.ReLU(),nn.BatchNorm2d(128),nn.MaxPool2d(2, 2))
--> 106         self.block = nn.ValueChoice.condition(nn.ValueChoice([False, True], branch_a, branch_b))
    107         self.conv1 = nn.Conv2d(3, 64, 3, stride=1,padding=1)
    108         self.conv2 = nn.Conv2d(128, 256, 3, stride=1,padding=1)

~/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/nn/pytorch/mutation_utils.py in __new__(cls, *args, **kwargs)
     22 
     23         try:
---> 24             return cls.create_fixed_module(*args, **kwargs)
     25         except NoContextError:
     26             return super().__new__(cls)

TypeError: create_fixed_module() takes 2 positional arguments but 4 were given

Another related question: I am aware that the documentation for ValueChoice specifies that you cannot use it as for i in range(ValueChoice), because it is 'syntax sugar'. Is there an alternative? Specifically, I want the search space to have a set of blocks to choose from (as in ResNets or NasNet) but to use different numbers and sequences of them. One way I thought this could work is for i in range(ValueChoice([1,2,3,4])) and then adding blocks accordingly... Is there a way to achieve this?

PS: Are there any large search spaces already defined for CIFAR-10 I could use (with the TPE strategy)?

SimonHuesgen commented 2 years ago

Also, do you have any idea why this model:


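# Imports assumed from the nni v2.7 Retiarii tutorials; they were not part of the original post.
import torch.nn.functional as F
import nni.retiarii.nn.pytorch as nn
from nni.retiarii import model_wrapper

class ReLUConvBN(nn.Sequential):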
    def __init__(self, in_size, out_size, kernel_size, stride, padding):
        super().__init__(
            nn.ReLU(),
            nn.Conv2d(in_size, out_size, kernel_size, stride=stride,padding=padding),
            nn.BatchNorm2d(out_size))
class ReLUConv(nn.Sequential):
    def __init__(self, in_size, out_size, kernel_size, stride, padding):
        super().__init__(
            nn.ReLU(),
            nn.Conv2d(in_size, out_size, kernel_size, stride=stride,padding=padding))

@model_wrapper
class Net(nn.Module):
    def __init__(self):
        super().__init__()

        self.block = nn.LayerChoice([ReLUConvBN(64,128,3,1,1),ReLUConv(64,128,3,1,1)])
        self.conv1 = nn.Conv2d(3, 64, 3, stride=1,padding=1)
        self.conv2 = nn.Conv2d(128, 256, 3, stride=1,padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(256*8*8,120)
        self.fc2 = nn.Linear(120,10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = self.block(x)
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.fc1(x)
        x = self.fc2(x)
        return x

model = Net()

throws this error in stderr:


/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py:133: UserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  f"The dataloader, {name}, does not have many workers which may be a bottleneck."
Traceback (most recent call last):
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/trial_entry.py", line 28, in <module>
    engine.trial_execute_graph()
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/execution/base.py", line 146, in trial_execute_graph
    graph_data.evaluator._execute(model_cls)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/evaluator/pytorch/lightning.py", line 119, in _execute
    return self.fit(model_cls)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/evaluator/pytorch/lightning.py", line 148, in fit
    return self.trainer.fit(self.module, self.train_dataloader, self.val_dataloaders)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 741, in fit
    self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
    self._dispatch()
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
    return self._run_train()
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1311, in _run_train
    self._run_sanity_check(self.lightning_module)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1375, in _run_sanity_check
    self._evaluation_loop.run()
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
    dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 122, in advance
    output = self._evaluation_step(batch, batch_idx, dataloader_idx)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 217, in _evaluation_step
    output = self.trainer.accelerator.validation_step(step_kwargs)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 239, in validation_step
    return self.training_type_plugin.validation_step(*step_kwargs.values())
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 219, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/evaluator/pytorch/lightning.py", line 196, in validation_step
    y_hat = self(x)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/evaluator/pytorch/lightning.py", line 182, in forward
    y_hat = self.model(x)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/simonhuesgen/thesis/_generated_model/1HV7IN.py", line 73, in forward
    _fc1 = self._fc1(_relu15)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (262144x8 and 16384x120)

I checked multiple times and cannot find out why it should end in different matrix shapes...
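One way to see where the two shapes in the error come from is to trace the sizes by hand. The sketch below does this in plain Python, assuming 32x32 CIFAR-10 inputs and a batch size of 128; neither is stated in the post, but together they reproduce the numbers in the message. The helper names are made up for this example. The forward pass shown above passes a 4-D tensor straight into fc1, and a linear layer multiplies only along the last dimension.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial output size of max pooling along one dimension."""
    return (size - kernel) // stride + 1


batch = 128  # assumption: batch size, not given in the post
side = 32    # assumption: CIFAR-10 image height and width

side = conv2d_out(side, 3, 1, 1)  # conv1: 3 -> 64 channels, 32x32
side = pool_out(side)             # pool: 16x16
side = conv2d_out(side, 3, 1, 1)  # block: 64 -> 128 channels, 16x16
side = pool_out(side)             # pool: 8x8
side = conv2d_out(side, 3, 1, 1)  # conv2: 128 -> 256 channels, 8x8
channels = 256

# Without flattening, fc1 sees a (batch, 256, 8, 8) tensor and treats the
# last dimension (8) as the feature dimension.
rows_unflattened = batch * channels * side  # mat1 rows
cols_unflattened = side                     # mat1 columns

# fc1 was declared with 256*8*8 input features, which is the flattened size.
features_flattened = channels * side * side

print(rows_unflattened, cols_unflattened, features_flattened)
```

Under these assumptions the mismatch goes away if the tensor is flattened to (batch, 16384) before fc1, for example with torch.flatten(x, 1).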

And finally (hopefully): I wanted to define a ValueChoice (in main) for the kernel_size of a conv layer and then adjust the padding (in a block) with:


if kernel_size==1:
    padding = 0
elif kernel_size==3:
    padding = 1

But again if-else is not the way to go. Any alternative solution?

I tried using a LayerChoice in the blocks like:


class Block2(nn.Module):
    def __init__(self, layer_size):
        super().__init__()
        self.conv1 = nn.LayerChoice([nn.Conv2d(3, layer_size, 5, stride=1,padding=2),nn.Conv2d(3, layer_size, 7, stride=1,padding=3)],label="Block1_2_LC")
        self.conv2 = nn.LayerChoice([nn.Conv2d(layer_size,layer_size*2, 5, stride=1,padding=2),nn.Conv2d(layer_size, layer_size*2, 7, stride=1,padding=3)],label="Block1_2_LC")
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        return x

But it throws an AssertionError when running a test experiment:


[2022-04-29 12:43:30] INFO (nni.retiarii.experiment.pytorch/MainThread) Start strategy...
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-6-599a554cf87f> in <module>
----> 1 exp.run(exp_config, 8745)

~/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/experiment/pytorch.py in run(self, config, port, debug)
    314             assert config is not None, 'You are using classic search mode, config cannot be None!'
    315             self.config = config
--> 316             self.start(port, debug)
    317 
    318     def _check_exp_status(self) -> bool:

~/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/experiment/pytorch.py in start(self, port, debug)
    286         exp_status_checker = Thread(target=self._check_exp_status)
    287         exp_status_checker.start()
--> 288         self._start_strategy()
    289         # TODO: the experiment should be completed, when strategy exits and there is no running job
    290         _logger.info('Waiting for experiment to become DONE (you can ctrl+c if there is no running trial jobs)...')

~/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/experiment/pytorch.py in _start_strategy(self)
    210 
    211         _logger.info('Start strategy...')
--> 212         search_space = dry_run_for_formatted_search_space(base_model_ir, self.applied_mutators)
    213         self.update_search_space(search_space)
    214         self.strategy.run(base_model_ir, self.applied_mutators)

~/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/strategy/utils.py in dry_run_for_formatted_search_space(model, mutators)
     31     search_space = collections.OrderedDict()
     32     for mutator in mutators:
---> 33         recorded_candidates, model = mutator.dry_run(model)
     34         if len(recorded_candidates) == 1:
     35             search_space[mutator.label] = {'_type': 'choice', '_value': recorded_candidates[0]}

~/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/mutator.py in dry_run(self, model)
     87         recorder = _RecorderSampler()
     88         self.sampler = recorder
---> 89         new_model = self.apply(model)
     90         self.sampler = sampler_backup
     91         return recorder.recorded_candidates, new_model

~/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/mutator.py in apply(self, model)
     70         self._cur_samples = []
     71         self.sampler.mutation_start(self, copy)
---> 72         self.mutate(copy)
     73         self.sampler.mutation_end(self, copy)
     74         copy.history.append(Mutation(self, self._cur_samples, model, copy))

~/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/nn/pytorch/mutator.py in mutate(self, model)
    123 
    124             # update model with graph mutation primitives
--> 125             target = model.get_node_by_name(node.name)
    126             target.update_operation(target.operation.type, {**target.operation.parameters, argname: result_value})
    127 

~/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/graph.py in get_node_by_name(self, node_name)
    209             nodes = graph.get_nodes_by_name(node_name)
    210             matched_nodes.extend(nodes)
--> 211         assert len(matched_nodes) <= 1
    212         if matched_nodes:
    213             return matched_nodes[0]

AssertionError: 

SimonHuesgen commented 2 years ago

Sorry for the excessive number of questions! I am trying to get used to implementing search spaces with nni and am struggling a bit. I came across another question: is there a way to access the layer chosen by a LayerChoice, e.g. if a later step of the model depends on which layer has been chosen?

ultmaster commented 2 years ago

Your traceback looks awful. I can barely read. Could you wrap it with markdown syntax?

SimonHuesgen commented 2 years ago

Your traceback looks awful. I can barely read. Could you wrap it with markdown syntax?

Hope it is better now

matluster commented 2 years ago

nn.ValueChoice.condition(nn.ValueChoice([False, True], branch_a, branch_b))

Probably it should be nn.ValueChoice.condition(nn.ValueChoice([False, True]), branch_a, branch_b)

if kernel_size==1:
    padding = 0
elif kernel_size==3:
    padding = 1

This is equivalent to:

kernel_size = nn.ValueChoice([1, 3])
padding = kernel_size // 2
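The arithmetic behind this replacement can be checked with ordinary integers. The sketch below, with helper names invented for this example, confirms that kernel_size // 2 matches the if-else table and keeps the spatial size unchanged for odd kernels at stride 1.

```python
def same_padding(kernel_size):
    """Padding that preserves spatial size for odd kernels at stride 1."""
    return kernel_size // 2


def conv_out(size, kernel_size, padding, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


# The two cases from the if-else version, plus the 5 and 7 kernels used elsewhere.
paddings = {k: same_padding(k) for k in (1, 3, 5, 7)}
sizes = {k: conv_out(32, k, paddings[k]) for k in paddings}

print(paddings)
print(sizes)
```

Because the formula is a single expression, it can be applied to the choice object directly and needs no branching on the sampled value.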

for i in range(ValueChoice) way because it is a 'syntax sugar'. Is there an alternative to this?

I think the documentation made it clear that you should use nn.Repeat.

Are there any large search spaces already defined for CIFAR-10 I could use

Have you looked at space hub? Although I had almost no confidence in it...

But it throws an AssertionError when running a test experiment:

It's probably related to the layer_size here. What's your layer_size here?

Is there a way to access the chosen Layer from LayerChoice? e.g. if a future step of the model depends on which layer has been chosen

In the trial, when a layer choice is created, it's just the chosen layer. You can check the layer with isinstance.

sevenactors commented 1 year ago

Maybe you can try ModelParameterChoice, which according to the docs: "It's quite similar to ValueChoice, but unlike ValueChoice, it always returns a fixed value, even at the construction of base model.

This makes it highly flexible (e.g., can be used in for-loop, if-condition, as argument of any function)"