Open SimonHuesgen opened 2 years ago
Does this work?
Another solution: you should probably use LayerChoice instead, to choose between BatchNorm and Identity.
Thanks for the quick reply! Using the condition to turn a ValueChoice Object to a bool object throws an error: keyword can't be an expression.
Using LayerChoice works for one layer, but it does not seem to work as a decision for multiple layers. What I mean is that I want one binary choice that decides, for multiple layers in the network, whether to use BatchNorm or Identity. So I tried using the same label:
self.layer3 = nn.LayerChoice([nn.BatchNorm2d(64),nn.Identity()], label="bn_choice")
self.layer6 = nn.LayerChoice([nn.BatchNorm2d(128),nn.Identity()], label="bn_choice")
but when test running experiments the trials fail immediately...
Any other ideas?
keyword can't be an expression.
I need more traceback on this error.
but when test running experiments the trials fail immediately...
I need stderr and trial's log for the failed trials.
ValueChoice.max(a, b)
to see whether that meets your needs.
Somehow I am not able to recreate the 'keyword can't be an expression' error right now...
/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py:133: UserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
f"The dataloader, {name}, does not have many workers which may be a bottleneck."
Traceback (most recent call last):
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/trial_entry.py", line 28, in <module>
and trial.log look like this:
76.0 K Trainable params
0 Non-trainable params
76.0 K Total params
0.304 Total estimated model params size (MB)
[2022-04-28 19:03:49] PRINT Validation sanity check: 0it [00:00, ?it/s]
[2022-04-28 19:03:49] PRINT Validation sanity check: 0%| | 0/2 [00:00<?, ?it/s]
PS: Are there any larger examples (with more variety in possibilities of how to use different mutator functions) for search spaces than the example notebooks?
and later using bool_bn_choice in if-else still throws:
You should never use if-else branch. ValueChoice.condition works like tf.cond. You need to write something like:
branch_a = nn.Sequential(..., ...)
branch_b = nn.Sequential(..., ..., BN)
self.block = nn.ValueChoice.condition(nn.ValueChoice([False, True], branch_a, branch_b)
I don't know whether putting modules into nn.ValueChoice.condition works, but you can try.
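To illustrate why a plain if-else cannot branch on a choice at definition time, here is a toy sketch in pure Python. It is not NNI's implementation: `ToyChoice` and `ToyCondition` are hypothetical stand-ins. The point is only that the choice has no value while the model is being defined, so both branches have to be recorded and one is picked once a sample is known, which is the tf.cond-like behaviour described above.

```python
class ToyChoice:
    """Hypothetical stand-in for a symbolic choice; not NNI's ValueChoice."""

    def __init__(self, candidates, label):
        self.candidates = list(candidates)
        self.label = label

    def resolve(self, sample):
        # `sample` maps labels to the value picked for one trial.
        value = sample[self.label]
        if value not in self.candidates:
            raise ValueError(f"{value!r} is not a candidate of {self.label}")
        return value

    def __bool__(self):
        # At definition time nothing has been sampled yet, so `if choice:`
        # has nothing to branch on.
        raise TypeError("symbolic choice has no truth value before sampling")


class ToyCondition:
    """Records both branches and picks one only when a sample is known."""

    def __init__(self, pred, if_true, if_false):
        self.pred, self.if_true, self.if_false = pred, if_true, if_false

    def resolve(self, sample):
        return self.if_true if self.pred.resolve(sample) else self.if_false


use_bn = ToyChoice([False, True], label="bn_choice")
block = ToyCondition(use_bn, if_true="conv-relu-bn", if_false="conv-relu")

try:
    if use_bn:  # eager branching fails: there is no value yet
        pass
except TypeError as exc:
    eager_error = str(exc)

with_bn = block.resolve({"bn_choice": True})
without_bn = block.resolve({"bn_choice": False})
```

In this sketch the strings stand in for the two nn.Sequential branches; whether the real nn.ValueChoice.condition accepts modules as branches is left open in the reply above.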
RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of dimension: 1
Your output is a one-dimensional tensor, but you are trying to use pl.Classification, if my guess is correct. Classification doesn't work like this.
Are there any larger examples (with more variety in possibilities of how to use different mutator functions) for search spaces than the example notebooks?
There is a space hub here. But it's in a preview state.
Thanks for all the help.
Trying to use it like this:
branch_a = nn.Sequential(nn.Conv2d(64, 128, 3, stride=1,padding=1),nn.ReLU(),nn.MaxPool2d(2, 2))
branch_b = nn.Sequential(nn.Conv2d(64, 128, 3, stride=1,padding=1),nn.ReLU(),nn.BatchNorm2d(128),nn.MaxPool2d(2, 2))
self.block = nn.ValueChoice.condition(nn.ValueChoice([False, True], branch_a, branch_b))
throws:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-6-8cb293112ce7> in <module>
119
120
--> 121 model = Net()
~/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/serializer.py in __init__(self, *args, **kwargs)
123 self._model_namespace = ModelNamespace()
124 with self._model_namespace:
--> 125 super().__init__(*args, **kwargs)
126
127 _copy_class_wrapper_attributes(wrapper, reset_wrapper)
~/opt/anaconda3/lib/python3.7/site-packages/nni/common/serializer.py in new_init(self, *args, **kwargs)
429 self,
430 *[_argument_processor(arg) for arg in args],
--> 431 **{kw: _argument_processor(arg) for kw, arg in kwargs.items()}
432 )
433 inject_trace_info(self, base, args, kwargs)
<ipython-input-6-8cb293112ce7> in __init__(self)
104 branch_a = nn.Sequential(nn.Conv2d(64, 128, 3, stride=1,padding=1),nn.ReLU(),nn.MaxPool2d(2, 2))
105 branch_b = nn.Sequential(nn.Conv2d(64, 128, 3, stride=1,padding=1),nn.ReLU(),nn.BatchNorm2d(128),nn.MaxPool2d(2, 2))
--> 106 self.block = nn.ValueChoice.condition(nn.ValueChoice([False, True], branch_a, branch_b))
107 self.conv1 = nn.Conv2d(3, 64, 3, stride=1,padding=1)
108 self.conv2 = nn.Conv2d(128, 256, 3, stride=1,padding=1)
~/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/nn/pytorch/mutation_utils.py in __new__(cls, *args, **kwargs)
22
23 try:
---> 24 return cls.create_fixed_module(*args, **kwargs)
25 except NoContextError:
26 return super().__new__(cls)
TypeError: create_fixed_module() takes 2 positional arguments but 4 were given
Another related question: I am aware that in the documentation for ValueChoice it is specified that you can not use it in a for i in range(ValueChoice) way because it is a 'syntax sugar'. Is there an alternative to this? Specifically I want the search space to have a number of blocks to choose from (like in ResNets or NasNet) but use different number and sequences of them. A way I thought I could make that happen is by using for i in range(ValueChoice[1,2,3,4]) and then add blocks accordingly... Is there a way to achieve what I just described?
PS: Are there any large search spaces already defined for CIFAR-10 I could use (with the TPE strategy)?
Also, do you have any idea why this model:
class ReLUConvBN(nn.Sequential):
    def __init__(self, in_size, out_size, kernel_size, stride, padding):
        super().__init__(
            nn.ReLU(),
            nn.Conv2d(in_size, out_size, kernel_size, stride=stride,padding=padding),
            nn.BatchNorm2d(out_size))

class ReLUConv(nn.Sequential):
    def __init__(self, in_size, out_size, kernel_size, stride, padding):
        super().__init__(
            nn.ReLU(),
            nn.Conv2d(in_size, out_size, kernel_size, stride=stride,padding=padding))

@model_wrapper
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.block = nn.LayerChoice([ReLUConvBN(64,128,3,1,1),ReLUConv(64,128,3,1,1)])
        self.conv1 = nn.Conv2d(3, 64, 3, stride=1,padding=1)
        self.conv2 = nn.Conv2d(128, 256, 3, stride=1,padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(256*8*8,120)
        self.fc2 = nn.Linear(120,10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = self.block(x)
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.fc1(x)
        x = self.fc2(x)
        return x

model = Net()
throws this error in stderr:
/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py:133: UserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
f"The dataloader, {name}, does not have many workers which may be a bottleneck."
Traceback (most recent call last):
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/trial_entry.py", line 28, in <module>
engine.trial_execute_graph()
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/execution/base.py", line 146, in trial_execute_graph
graph_data.evaluator._execute(model_cls)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/evaluator/pytorch/lightning.py", line 119, in _execute
return self.fit(model_cls)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/evaluator/pytorch/lightning.py", line 148, in fit
return self.trainer.fit(self.module, self.train_dataloader, self.val_dataloaders)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 741, in fit
self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
self._dispatch()
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
self.training_type_plugin.start_training(self)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
self._results = trainer.run_stage()
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
return self._run_train()
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1311, in _run_train
self._run_sanity_check(self.lightning_module)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1375, in _run_sanity_check
self._evaluation_loop.run()
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 122, in advance
output = self._evaluation_step(batch, batch_idx, dataloader_idx)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 217, in _evaluation_step
output = self.trainer.accelerator.validation_step(step_kwargs)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 239, in validation_step
return self.training_type_plugin.validation_step(*step_kwargs.values())
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 219, in validation_step
return self.model.validation_step(*args, **kwargs)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/evaluator/pytorch/lightning.py", line 196, in validation_step
y_hat = self(x)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/evaluator/pytorch/lightning.py", line 182, in forward
y_hat = self.model(x)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/simonhuesgen/thesis/_generated_model/1HV7IN.py", line 73, in forward
_fc1 = self._fc1(_relu15)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/simonhuesgen/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 103, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (262144x8 and 16384x120)
I checked multiple times and cannot find out why it should end in different matrix shapes...
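A likely cause, judging only from the shapes in the error message: `forward` passes the 4-D output of `conv2` directly into `fc1` without flattening it first. The following pure-Python sketch traces the shapes under two assumptions that are not stated in the thread (32x32 input images and a batch size of 128) and reproduces the `262144x8` from the error.

```python
from math import prod


def window_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumptions (not stated in the thread): 32x32 inputs, batch size 128.
batch, size = 128, 32

size = window_out(size, 3, 1, 1)  # conv1: 64 channels, 32x32
size = window_out(size, 2, 2)     # pool:  16x16
size = window_out(size, 3, 1, 1)  # block: 128 channels, 16x16
size = window_out(size, 2, 2)     # pool:  8x8
size = window_out(size, 3, 1, 1)  # conv2: 256 channels, 8x8
shape = (batch, 256, size, size)

# A linear layer multiplies along the last dimension only, so a 4-D input
# behaves like a matrix with prod(shape[:-1]) rows and shape[-1] columns.
unflattened = (prod(shape[:-1]), shape[-1])

# Flattening everything except the batch dimension yields 256*8*8 features.
flattened = (shape[0], prod(shape[1:]))

print(unflattened)  # (262144, 8)
print(flattened)    # (128, 16384)
```

Under these assumptions, adding `x = torch.flatten(x, 1)` (or `x = x.view(x.size(0), -1)`) before `x = self.fc1(x)` would give `fc1` the 128x16384 input it expects.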
And finally (hopefully): I wanted to define a ValueChoice (in main) for the kernel_size of a conv layer and then adjust the padding (in a block) by:
if kernel_size==1:
    padding = 0
elif kernel_size==3:
    padding = 1
But again if-else is not the way to go. Any alternative solution?
I tried using a LayerChoice in the blocks like:
class Block2(nn.Module):
    def __init__(self, layer_size):
        super().__init__()
        self.conv1 = nn.LayerChoice([nn.Conv2d(3, layer_size, 5, stride=1,padding=2),nn.Conv2d(3, layer_size, 7, stride=1,padding=3)],label="Block1_2_LC")
        self.conv2 = nn.LayerChoice([nn.Conv2d(layer_size,layer_size*2, 5, stride=1,padding=2),nn.Conv2d(layer_size, layer_size*2, 7, stride=1,padding=3)],label="Block1_2_LC")
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        return x
But it throws an AssertionError when running a test experiment:
[2022-04-29 12:43:30] INFO (nni.retiarii.experiment.pytorch/MainThread) Start strategy...
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-6-599a554cf87f> in <module>
----> 1 exp.run(exp_config, 8745)
~/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/experiment/pytorch.py in run(self, config, port, debug)
314 assert config is not None, 'You are using classic search mode, config cannot be None!'
315 self.config = config
--> 316 self.start(port, debug)
317
318 def _check_exp_status(self) -> bool:
~/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/experiment/pytorch.py in start(self, port, debug)
286 exp_status_checker = Thread(target=self._check_exp_status)
287 exp_status_checker.start()
--> 288 self._start_strategy()
289 # TODO: the experiment should be completed, when strategy exits and there is no running job
290 _logger.info('Waiting for experiment to become DONE (you can ctrl+c if there is no running trial jobs)...')
~/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/experiment/pytorch.py in _start_strategy(self)
210
211 _logger.info('Start strategy...')
--> 212 search_space = dry_run_for_formatted_search_space(base_model_ir, self.applied_mutators)
213 self.update_search_space(search_space)
214 self.strategy.run(base_model_ir, self.applied_mutators)
~/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/strategy/utils.py in dry_run_for_formatted_search_space(model, mutators)
31 search_space = collections.OrderedDict()
32 for mutator in mutators:
---> 33 recorded_candidates, model = mutator.dry_run(model)
34 if len(recorded_candidates) == 1:
35 search_space[mutator.label] = {'_type': 'choice', '_value': recorded_candidates[0]}
~/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/mutator.py in dry_run(self, model)
87 recorder = _RecorderSampler()
88 self.sampler = recorder
---> 89 new_model = self.apply(model)
90 self.sampler = sampler_backup
91 return recorder.recorded_candidates, new_model
~/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/mutator.py in apply(self, model)
70 self._cur_samples = []
71 self.sampler.mutation_start(self, copy)
---> 72 self.mutate(copy)
73 self.sampler.mutation_end(self, copy)
74 copy.history.append(Mutation(self, self._cur_samples, model, copy))
~/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/nn/pytorch/mutator.py in mutate(self, model)
123
124 # update model with graph mutation primitives
--> 125 target = model.get_node_by_name(node.name)
126 target.update_operation(target.operation.type, {**target.operation.parameters, argname: result_value})
127
~/opt/anaconda3/lib/python3.7/site-packages/nni/retiarii/graph.py in get_node_by_name(self, node_name)
209 nodes = graph.get_nodes_by_name(node_name)
210 matched_nodes.extend(nodes)
--> 211 assert len(matched_nodes) <= 1
212 if matched_nodes:
213 return matched_nodes[0]
AssertionError:
Sorry for the excessive number of questions! I am trying to get used to implementing search spaces with nni and am struggling a bit. I came across another question: Is there a way to access the chosen Layer from LayerChoice? e.g. if a future step of the model depends on which layer has been chosen
Your traceback looks awful. I can barely read. Could you wrap it with markdown syntax?
Hope it is better now
nn.ValueChoice.condition(nn.ValueChoice([False, True], branch_a, branch_b))
Probably it should be nn.ValueChoice.condition(nn.ValueChoice([False, True]), branch_a, branch_b)
if kernel_size==1:
    padding = 0
elif kernel_size==3:
    padding = 1
This is equivalent to:
kernel_size = nn.ValueChoice([1, 3])
padding = kernel_size // 2
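As a quick check of that equivalence, the sketch below uses plain integers rather than ValueChoice objects (the helper `conv_out` is hypothetical and only applies the standard convolution output-size formula). It confirms that `kernel_size // 2` matches the if-else table and, for odd kernels at stride 1, keeps the spatial size unchanged.

```python
def conv_out(size, kernel, padding, stride=1):
    """Spatial output size of a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


# The if-else table from the question: kernel_size -> padding.
wanted = {1: 0, 3: 1}
matches = all(k // 2 == p for k, p in wanted.items())

# For odd kernels and stride 1, kernel_size // 2 preserves the input size.
sizes = {k: conv_out(32, k, k // 2) for k in (1, 3, 5, 7)}

print(matches)  # True
print(sizes)    # {1: 32, 3: 32, 5: 32, 7: 32}
```

The same arithmetic form is what the reply above applies to the ValueChoice itself, so no Python branching on the choice is needed.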
for i in range(ValueChoice) way because it is a 'syntax sugar'. Is there an alternative to this?
I think the documentation made it clear that you should use nn.Repeat.
Are there any large search spaces already defined for CIFAR-10 I could use
Have you looked at space hub? Although I had almost no confidence in it...
But it throws an AssertionError when running a test experiment:
It's probably related to the layer_size here. What's your layer_size here?
Is there a way to access the chosen Layer from LayerChoice? e.g. if a future step of the model depends on which layer has been chosen
In the trial, when a layer choice is created, it's just the chosen layer. You can check the layer with isinstance.
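A minimal sketch of that isinstance check, using hypothetical stub classes in place of nn.BatchNorm2d and nn.Identity so that it runs without NNI or PyTorch. It assumes, as stated above, that inside a trial the attribute already holds the concrete chosen layer.

```python
class BatchNormStub:
    """Hypothetical stand-in for nn.BatchNorm2d."""


class IdentityStub:
    """Hypothetical stand-in for nn.Identity."""


def describe(layer):
    # In a trial the attribute holds the concrete chosen layer,
    # so an ordinary isinstance check can branch on it.
    if isinstance(layer, BatchNormStub):
        return "batch norm was chosen"
    return "identity was chosen"


print(describe(BatchNormStub()))
print(describe(IdentityStub()))
```

Note that this kind of check only makes sense in the trial; in the base model the attribute is still the unresolved choice.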
Maybe you can try "ModelParameterChoice", which according to the doc: "It’s quite similar to ValueChoice, but unlike ValueChoice, it always returns a fixed value, even at the construction of base model. This makes it highly flexible (e.g., can be used in for-loop, if-condition, as argument of any function)"
Describe the issue: Hello there,
I am trying to define a search space using nni v2.7 and ran into a problem. Several times I wanted to define mutations using a ValueChoice object and subsequently use it in an if-else clause. This throws an error every time, saying this is not the intended use of ValueChoice. One simplified example is the choice of using BatchNormalization. I attempted:
'''
bn_choice = nn.ValueChoice([0,1], label="bn_choice")
if bn_choice==1:
    self.bn = nn.BatchNorm2d(128)
'''
in init and
'''
if self.bn:
    x = self.bn(x)
'''
in forward
Is there an alternative way to define this?
Thanks in advance!
Environment:
Configuration:
Log message:
How to reproduce it?: