I assume that for most of the algorithms, the "ranking" of the architectures is implicit. For DARTS, you might use the multiplication of architecture weights, or the accuracy of sub-nets. For ENAS, you might use the log-probability of the RL agent, or sub-net accuracy, or just directly sample a few architectures as the top-k.
As this is ad-hoc and vague, I'd prefer not to expose such an ability, at least not publicly. If you're interested in the details, you can always hack the strategy and look inside.
Thanks!
> As this is ad-hoc and vague, I'd prefer not to expose such an ability, at least not publicly. If you're interested in the details, you can always hack the strategy and look inside.
Does that mean the only way to get these values, e.g. for DARTS, would be like this? Or how would one get the values of the multiplication of architecture weights, or the accuracy of the sub-nets?
This is another question that is highly dependent on the algorithm.
Taking DARTS as an example: in the DARTS implementation, you will see an alpha, which is a parameter dict of architecture weights. Typically, we select the top-1 architecture by taking the maximum element in each individual dimension (see export). For top-k, I think you can use some beam search or dynamic programming to get those. This is what I mean by "multiplication of architecture weights".
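For illustration, here is a minimal sketch of what such a top-k selection could look like. The layout of `alpha` (a dict mapping each choice name to a 1-D tensor of logits) is an assumption for the example, not NNI's public API, and for simplicity it enumerates all combinations exhaustively instead of using beam search:

```python
import itertools
import torch

def top_k_architectures(alpha: dict, k: int = 5):
    """Rank architectures by the product of their softmax'd architecture
    weights (equivalently, the sum of log-probabilities).

    `alpha` is assumed to map each choice name to a 1-D tensor of logits,
    one entry per candidate op -- a simplification of what a DARTS-style
    strategy keeps internally.
    """
    names = list(alpha.keys())
    log_probs = [torch.log_softmax(alpha[n], dim=-1) for n in names]

    # Exhaustive enumeration is fine for small spaces; for large ones,
    # replace this loop with beam search or dynamic programming.
    candidates = []
    for combo in itertools.product(*(range(len(lp)) for lp in log_probs)):
        score = sum(lp[i].item() for lp, i in zip(log_probs, combo))
        candidates.append((dict(zip(names, combo)), score))

    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:k]

# Example with two choice points, two candidate ops each:
alpha = {'conv1': torch.tensor([0.3, 1.2]), 'conv2': torch.tensor([0.9, 0.1])}
for arch, score in top_k_architectures(alpha, k=3):
    print(arch, round(score, 3))
```

Taking the top-1 of this ranking coincides with the per-dimension argmax that export performs.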
Accuracy/loss of sub-nets is another story: fix one path and run inference (it's like what people do in SPOS), then compute the loss or accuracy. Finding the top-k by accuracy is a bit trickier; maybe we can use some heuristics like evolutionary search (also what people do in SPOS).
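As a sketch of that second approach, assuming a hypothetical helper `fix_architecture(supernet, arch)` that activates a single path (NNI does not expose such a helper publicly; in practice you would hack the strategy to achieve the equivalent):

```python
import torch

@torch.no_grad()
def subnet_accuracy(supernet, arch, val_loader, device='cpu'):
    # `fix_architecture` is hypothetical: it should return the supernet
    # with exactly the candidate ops named in `arch` activated.
    model = fix_architecture(supernet, arch).eval().to(device)
    correct = total = 0
    for inputs, targets in val_loader:
        preds = model(inputs.to(device)).argmax(dim=-1)
        correct += (preds == targets.to(device)).sum().item()
        total += targets.numel()
    return correct / total
```

Scoring every sub-net this way is usually too expensive, which is why SPOS-style pipelines wrap this evaluation inside an evolutionary search rather than enumerating the whole space.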
Again, I think not many people have really done this, in either research or industry. Therefore, anything reasonable could be correct; what I shared above is only my humble opinion.
Thank you for the quick replies and insights!
I also opened another issue (#4671) regarding the output of DARTS, but it has been closed, so I will ask the same questions here:
Running DARTS on the following model (search space):
```python
from nni.retiarii.serializer import model_wrapper
import torch.nn.functional as F
import nni.retiarii.nn.pytorch as nn


class Block1(nn.Module):
    def __init__(self, layer_size):
        super().__init__()
        self.conv1 = nn.Conv2d(3, layer_size, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(layer_size, layer_size * 2, 3, stride=1, padding=1)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        return x


class Block2(nn.Module):
    def __init__(self, layer_size):
        super().__init__()
        self.conv1 = nn.Conv2d(3, layer_size * 2, 3, stride=1, padding=1)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return x


class Block3(nn.Module):
    def __init__(self, layer_size):
        super().__init__()
        self.conv1 = nn.Conv2d(layer_size * 2, layer_size * 4, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(layer_size * 4, layer_size * 8, 3, stride=1, padding=1)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        return x


class Block4(nn.Module):
    def __init__(self, layer_size):
        super().__init__()
        self.conv1 = nn.Conv2d(layer_size * 2, layer_size * 8, 3, stride=1, padding=1)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return x


@model_wrapper
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # One ValueChoice shared by all blocks, plus two LayerChoices.
        rand_var = nn.ValueChoice([32, 64])
        self.conv1 = nn.LayerChoice([Block1(rand_var), Block2(rand_var)])
        self.conv2 = nn.LayerChoice([Block3(rand_var), Block4(rand_var)])
        self.pool = nn.MaxPool2d(2, 2)
        self.conv3 = nn.Conv2d(rand_var * 8, rand_var * 16, 3, stride=1, padding=1)
        self.fc1 = nn.Linear(rand_var * 16 * 8 * 8, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.pool(x)
        x = self.conv2(x)
        x = self.pool(x)
        x = F.relu(self.conv3(x))
        x = x.reshape(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


model = Net()
```
This resulted in the following output:

```
Final architecture: {'model_2': '1', 'model_3': '1'}
```
The model (search space) includes 3 parameters that can vary, but the final architecture only displays 2 choices. Is there something wrong with the way I expressed the model space, or could there be a different issue? Since the result explicitly names 'model_2' and 'model_3', the missing 'model_1' would presumably be the ValueChoice parameter...
Help is highly appreciated!
I've replied in the other issue.
Describe the issue: Regarding the use of DARTS and other one-shot strategies: is it only possible to retrieve the best architecture, or is there a way to get a ranking of the different choices/architectures, or even a WebUI?