peteryuX / pcdarts-tf2

PC-DARTS ("PC-DARTS: Partial Channel Connections for Memory-Efficient Differentiable Architecture Search", ICLR 2020) implemented in TensorFlow 2.0+. This is an unofficial implementation.
MIT License

How to derive the final architecture? (question for PyTorch version) #1

Closed: xiangzhluo closed this issue 4 years ago

xiangzhluo commented 4 years ago

Hi Kuan-Yu,

Sorry about bothering you.

After searching on the CIFAR-10 dataset, I get a genotype that is similar to the one reported in the original paper.

However, when I derive the final architecture and compute its FLOPs and latency, the numbers look strange.

For example, I run:

import torch
from ptflops import get_model_complexity_info

from model import NetworkCIFAR as Network
import genotypes

genotype = eval("genotypes.%s" % "PCDARTS")

with torch.cuda.device(0):
    model = Network(36, 1000, 14, True, genotype)
    model.drop_path_prob = 0.3
    model.eval()
    flops, params = get_model_complexity_info(model, (3, 224, 224), as_strings=True, print_per_layer_stat=True)
    print("{:<30}  {:<8}".format("Computational complexity: ", flops))
    print("{:<30}  {:<8}".format("Number of parameters: ", params))

The reported model complexity and number of parameters for the searched genotype (14 layers, ImageNet setting, image size 224x224) are as follows:

Computational complexity:       20.11 GMac
Number of parameters:           4.3 M  

But when I run resnet50 for comparison:

import torch
from ptflops import get_model_complexity_info
from torchvision.models import resnet50

with torch.cuda.device(0):
    model = resnet50(pretrained=False)
    flops, params = get_model_complexity_info(model, (3, 224, 224), as_strings=True,
                                              print_per_layer_stat=True)
    print('{:<30}  {:<8}'.format('Computational complexity: ', flops))
    print('{:<30}  {:<8}'.format('Number of parameters: ', params))

The reported model complexity and the number of parameters for resnet50 are as follows:

Computational complexity:       4.12 GMac
Number of parameters:           25.56 M 

The FLOPs reported in the original paper under the ImageNet setting are only 597M, so it seems something is wrong with my derived final architecture (though I am sure the searched genotype itself is correct). What I mean is: after I obtain the searched model, I want to deploy it on some hardware devices, but the latency of the searched genotype (with 14 layers) is nearly ten times that of resnet50, which is unacceptable.

At your convenience, could you clarify how to derive the final architecture from a searched genotype? In the future, I plan to deploy the searched model on hardware devices and add hardware-aware constraints to optimize the overall design.

Although this question concerns the PyTorch version rather than TensorFlow, the idea is similar. I just want to figure out how to correctly export the final searched model (i.e., the stack of cells built from the searched genotype) and then apply it to other tasks.

Thanks for your time and have a nice day!

peteryuX commented 4 years ago

Did you use ptflops.get_model_complexity_info to calculate the FLOPs? Even the oldest version of it requires PyTorch 0.4.1 or 1.0 and torchvision 0.2.1, but the official PC-DARTS was implemented on PyTorch 0.3...

Is that a problem?

As far as I know, the official PC-DARTS runs out of memory on newer PyTorch versions (see here). Maybe that's related: the operators do not behave identically across versions.

xiangzhluo commented 4 years ago

> Did you use ptflops.get_model_complexity_info to calculate the Flops? I saw even the oldest version, the requirements of it is Pytorch 0.4.1 or 1.0, torchvision 0.2.1 but the official PC-DARTS was implemented on pytorch(0.3)...
>
> Is that a problem?
>
> As I know, the official PC-DARTS will be OOM in the newer PyTorch version in here. Maybe that's the same reason. The operators are not equal mechanism in different versions.

Thanks for your suggestion.

I just checked: it is not caused by the PyTorch version. The OOM happens mainly because newer PyTorch deprecated volatile, so volatile=True no longer has any effect. When you want to run a model without tracking gradients, you should use with torch.no_grad() instead. I had already corrected this at the very beginning.
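
For reference, a minimal sketch of that replacement (assuming a recent PyTorch; the variable names are illustrative):

```python
import torch

x = torch.ones(3, requires_grad=True)

# Old style (PyTorch <= 0.3): Variable(data, volatile=True) -- no longer supported.
# New style: wrap inference in torch.no_grad() so no autograd graph is built.
with torch.no_grad():
    y = x * 2

print(y.requires_grad)  # False
```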

After I get the searched genotype, I want to stack some cells/layers to build a new model (analogous to stacking residual blocks in resnet50). Usually, the aim of neural architecture search is a small, simple model that maintains the same level of accuracy and is suitable for deployment. But the latency (10 times that of resnet50) and model complexity (12000M FLOPs) seem strange under the ImageNet setting (8 cells), since the reported FLOPs under the ImageNet setting are about 597M.

As you can see in the test code below, I set output classes = 1000, init_channels = 36, and cells = 14 with the searched genotype, the same as in the original paper.

import torch
from ptflops import get_model_complexity_info

from model import NetworkCIFAR as Network
import genotypes

genotype = eval("genotypes.%s" % "PCDARTS")

with torch.cuda.device(0):
    model = Network(36, 1000, 14, True, genotype)
    model.drop_path_prob = 0.3
    model.eval()
    flops, params = get_model_complexity_info(model, (3, 224, 224), as_strings=True, print_per_layer_stat=True)
    print("{:<30}  {:<8}".format("Computational complexity: ", flops))
    print("{:<30}  {:<8}".format("Number of parameters: ", params))

I will continue to figure it out.

peteryuX commented 4 years ago

    from model import NetworkImageNet as Network

Did you import the model from the wrong place? You can check it here. The feature-map resolutions of the layers are totally different between the two models. HAHAHA...
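
To see why the import matters so much for FLOPs, here is a self-contained back-of-the-envelope sketch: the cost of a convolution scales with the spatial size of its feature map, so a cell that sees the full 224x224 input (a CIFAR-style model with no downsampling stem) is far more expensive than the same cell after an ImageNet-style stem. The 4x downsampling factor below is an illustrative assumption, not the exact stride of either model:

```python
def conv_macs(c_in, c_out, k, h, w):
    """Multiply-accumulate count of a k x k convolution producing an h x w feature map."""
    return c_in * c_out * k * k * h * w

# The same 3x3 convolution with 36 channels, evaluated at two resolutions:
full_res = conv_macs(36, 36, 3, 224, 224)  # CIFAR-style model: no stem downsampling
stem_res = conv_macs(36, 36, 3, 56, 56)    # after a hypothetical 4x-downsampling stem

print(full_res // stem_res)  # 16: 16x more work per cell at full resolution
```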

xiangzhluo commented 4 years ago

Yes, you are right! I got it!!!

Thanks for your kind help. Have a nice day. Hahahaha