microsoft / nni

An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License

one-shot strategy model scores/ WebUI #4795

Closed NotSure2732 closed 2 years ago

NotSure2732 commented 2 years ago

Describe the issue: Regarding the use of DARTS and other one-shot strategies: is it only possible to retrieve the best architecture, or is there a way to get a ranking of the different choices/architectures, or even a WebUI?


matluster commented 2 years ago

I assume that for most of the algorithms, the "ranking" of the architectures is implicit. For DARTS, you might use the multiplication of architecture weights, or the accuracy of sub-nets. For ENAS, you might use the log-probability of the RL agent, or sub-net accuracy, or just directly sample a few architectures as the top-k.
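For instance, here is a minimal sketch of the log-probability idea. The controller_logits dict (one logit vector per choice, as a trained ENAS controller might produce) is a hypothetical stand-in, not NNI's API:

import torch
import torch.nn.functional as F
from itertools import product

# Hypothetical controller outputs: one logit vector per choice.
controller_logits = {
    'conv1': torch.tensor([1.2, 0.3]),
    'conv2': torch.tensor([0.1, 0.9]),
}
log_probs = {name: F.log_softmax(v, dim=-1) for name, v in controller_logits.items()}
names = list(log_probs)

# Score every architecture by the sum of per-choice log-probabilities,
# then keep the three most likely under the controller's policy.
scored = [
    (sum(log_probs[n][i].item() for n, i in zip(names, arch)), dict(zip(names, arch)))
    for arch in product(*(range(len(log_probs[n])) for n in names))
]
for score, arch in sorted(scored, key=lambda t: t[0], reverse=True)[:3]:
    print(f'{score:.3f}', arch)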

As this is ad-hoc and vague, I'd prefer not to expose such an ability, at least not publicly. If you're interested in the details, you can always hack the strategy and look inside.

NotSure2732 commented 2 years ago

Thanks!

> As this is ad-hoc and vague, I'd prefer not to expose such an ability, at least not publicly. If you're interested in the details, you can always hack the strategy and look inside.

Does that mean the only way to get these values, e.g. for DARTS, would be like this? Or how would one get the values of the multiplication of architecture weights, or the accuracy of the sub-nets?

matluster commented 2 years ago

This is another question that is highly algorithm-specific.

Taking DARTS as an example: in the DARTS implementation you would see an alpha, which is a parameter dict of architecture weights. Typically, we select the top-1 by taking the maximum element in each individual dimension (see export). For the top-k, I think you can use some beam search or dynamic programming. This is what I mean by multiplication of architecture weights.
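For concreteness, a minimal sketch of that idea, assuming a hypothetical alpha dict (one logit vector per LayerChoice; this is not taken from NNI's code): score each architecture by the product of its softmaxed weights and recover the top-k with a small beam search.

import heapq
import torch

# Hypothetical architecture weights: one logit vector per LayerChoice.
alpha = {
    'conv1': torch.tensor([0.8, 0.2]),
    'conv2': torch.tensor([0.4, 1.1]),
}
probs = {name: torch.softmax(w, dim=-1) for name, w in alpha.items()}

def beam_search_topk(probs, k=3):
    # Extend partial architectures one choice at a time, keeping only the k
    # highest product-of-weights scores (the beam). Because the score
    # factorizes over independent choices, this recovers the exact top-k.
    beam = [(1.0, {})]
    for name, p in probs.items():
        beam = heapq.nlargest(k, [
            (score * p[i].item(), {**arch, name: i})
            for score, arch in beam
            for i in range(len(p))
        ], key=lambda t: t[0])
    return beam

for score, arch in beam_search_topk(probs):
    print(f'{score:.4f}', arch)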

Accuracy/loss of sub-nets is another story. Fix one path and do the inference (this is like what people do in SPOS), then calculate the loss or accuracy. Finding the top-k by accuracy is a bit trickier; maybe we can use some heuristics like evolutionary search (also what people do in SPOS).
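As a rough illustration (the supernet.activate helper and the search_space dict below are hypothetical stand-ins, not NNI's API), one could sample paths, evaluate each with the shared supernet weights, and rank by validation accuracy:

import random
import torch

def evaluate_subnet(supernet, arch, val_loader, device='cpu'):
    # `supernet.activate(arch)` is a stand-in for whatever mechanism fixes a
    # single path, e.g. setting each LayerChoice to one candidate.
    supernet.activate(arch)
    supernet.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in val_loader:
            pred = supernet(x.to(device)).argmax(dim=-1)
            correct += (pred == y.to(device)).sum().item()
            total += y.numel()
    return correct / total

def sample_topk(supernet, search_space, val_loader, n_samples=50, k=5):
    # Randomly sample architectures and rank them by accuracy; an evolutionary
    # search (mutate the best, re-evaluate) would refine this, as in SPOS.
    samples = [
        {name: random.randrange(n) for name, n in search_space.items()}
        for _ in range(n_samples)
    ]
    scored = [(evaluate_subnet(supernet, a, val_loader), a) for a in samples]
    return sorted(scored, key=lambda t: t[0], reverse=True)[:k]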

Again, I think not many people have really done this, neither in research nor in industry. Therefore, anything reasonable could be correct; what I shared above is only my humble opinion.

NotSure2732 commented 2 years ago

Thank you for the quick replies and insights!

I also made another issue (#4671) regarding the output of DARTS, but it has been closed, so I will ask the same questions here:

Running DARTS on the following model (search space):

from nni.retiarii.serializer import model_wrapper
import torch.nn.functional as F
import nni.retiarii.nn.pytorch as nn

class Block1(nn.Module):
    def __init__(self, layer_size):
        super().__init__()
        self.conv1 = nn.Conv2d(3, layer_size, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(layer_size, layer_size*2, 3, stride=1, padding=1)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        return x

class Block2(nn.Module):
    def __init__(self, layer_size):
        super().__init__()
        self.conv1 = nn.Conv2d(3, layer_size*2, 3, stride=1, padding=1)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return x

class Block3(nn.Module):
    def __init__(self, layer_size):
        super().__init__()
        self.conv1 = nn.Conv2d(layer_size*2, layer_size*4, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(layer_size*4, layer_size*8, 3, stride=1, padding=1)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        return x

class Block4(nn.Module):
    def __init__(self, layer_size):
        super().__init__()
        self.conv1 = nn.Conv2d(layer_size*2, layer_size*8, 3, stride=1, padding=1)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return x

@model_wrapper
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        rand_var = nn.ValueChoice([32, 64])
        self.conv1 = nn.LayerChoice([Block1(rand_var), Block2(rand_var)])
        self.conv2 = nn.LayerChoice([Block3(rand_var), Block4(rand_var)])
        self.pool = nn.MaxPool2d(2, 2)
        self.conv3 = nn.Conv2d(rand_var*8, rand_var*16, 3, stride=1, padding=1)
        self.fc1 = nn.Linear(rand_var*16*8*8, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.pool(x)
        x = self.conv2(x)
        x = self.pool(x)
        x = F.relu(self.conv3(x))
        x = x.reshape(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = Net()

produced the following output:

Final architecture: {'model_2': '1', 'model_3': '1'}

The model (search space) includes three parameters which can vary, but the final architecture only displays two final choices. Is there something wrong with the way I expressed the model space, or could there be a different issue? Since the result explicitly names 'model_2' and 'model_3' as parameters, 'model_1' would be the ValueChoice parameter...

Help is highly appreciated!

ultmaster commented 2 years ago

I've replied in the other issue.