microsoft / nni

An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License
13.99k stars 1.81k forks

Model FLOPs/Parameters Counter Example #3344

Closed: listener17 closed this issue 3 years ago

listener17 commented 3 years ago

The example given here does not work on either Windows or Linux:

https://nni.readthedocs.io/en/stable/Compression/CompressionUtils.html#model-flops-parameters-counter

QuanluZhang commented 3 years ago

@colorjam please help take this issue, thanks

colorjam commented 3 years ago

@listener17 Hi, could you please provide more details about the code you are running? Which model did you pass to count_flops_params?

listener17 commented 3 years ago

Hi @colorjam:

Thanks.

First, I think there is a bug in the documentation. Running the following gives an error:

>>> from nni.compression.pytorch.utils.counter import count_flops_param
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'count_flops_param' from 'nni.compression.pytorch.utils.counter' (C:\Users\abc\Anaconda3\envs\pytorch\lib\site-packages\nni\compression\pytorch\utils\counter.py)

It should be

from nni.compression.pytorch.utils import counter
flops, params, results = counter.count_flops_params(model, (1, 1, 28, 28))

Yesterday I updated to PyTorch 1.7.1, upgraded NNI, and started playing with my old code. Previously everything worked, but now it does not. However, I verified that the latest NNI works for resnet50:

import torch  # needed for torch.randn below
from torchvision.models import resnet50
from thop import profile
model = resnet50()
input = torch.randn(1, 3, 224, 224)

So the problem could be with my model, or it could be something related to the latest PyTorch.

listener17 commented 3 years ago

@colorjam:

I tried my model with the pytorch-OpCounter: https://github.com/Lyken17/pytorch-OpCounter

It works (like before!). But NNI's FLOP counter doesn't work now. Previously, both were working, and I think they both gave the same results.

That is, pytorch-OpCounter now works:

x = torch.randn(1, 1, 16384)
macs, params = profile(G, inputs=(x, ))

NNI doesn't work: flops, params, results = counter.count_flops_params(G, (x,))

I get this error with NNI:

Traceback (most recent call last):
  File "c:\git\seMono\se\models\generator.py", line 756, in <module>
    flops, params, results = counter.count_flops_params(G, (x,)) # tuple of tensor as input
  File "C:\Users\abc\Anaconda3\envs\pytorch\lib\site-packages\nni\compression\pytorch\utils\counter.py", line 279, in count_flops_params
    model(*x)
  File "C:\Users\abc\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\git\seMono\se\models\generator.py", line 284, in forward
    hi, linear_hi = Elayer(hi, True)
  File "C:\Users\abc\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\git\seMono\se\models\modules.py", line 194, in forward
    a = self.conv(x_p)
  File "C:\Users\abc\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 731, in _call_impl
    hook_result = hook(self, input, result)
  File "C:\Users\abc\Anaconda3\envs\pytorch\lib\site-packages\nni\compression\pytorch\utils\counter.py", line 161, in count_module
    result = self.ops[type(m)](m, x, y)
  File "C:\Users\abc\Anaconda3\envs\pytorch\lib\site-packages\nni\compression\pytorch\utils\counter.py", line 89, in _count_convNd
    kernel_ops = m.weight.size()[2] * m.weight.size()[3]
IndexError: tuple index out of range

For your info, my model has 1d convolution, 1d transposed convolution, and LSTMs.
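
The last frame of the traceback indexes `m.weight.size()[2]` and `m.weight.size()[3]`. A 1d convolution weight has only three dimensions (output channels, input channels per group, kernel size), so index 3 is out of range. The following is a minimal sketch of that failure and of a dimension-agnostic alternative. It uses plain tuples in place of tensor sizes, and both function names and the example shapes are hypothetical; this is not NNI's actual code or fix.

```python
from math import prod


def kernel_ops_fixed_2d(weight_shape):
    """Hypothetical helper mirroring the indexing shown in the traceback."""
    # Assumes a 4-D weight: (out_channels, in_channels/groups, kH, kW).
    return weight_shape[2] * weight_shape[3]


def kernel_ops_any_dim(weight_shape):
    """Hypothetical variant: multiply however many kernel dimensions exist."""
    # Everything after the first two entries is a kernel dimension.
    return prod(weight_shape[2:])


# Assumed example shapes, not taken from the model in this thread.
conv1d_weight = (64, 16, 31)     # (out_channels, in_channels/groups, k)
conv2d_weight = (64, 16, 3, 3)   # (out_channels, in_channels/groups, kH, kW)

try:
    kernel_ops_fixed_2d(conv1d_weight)
    failed = False
except IndexError:
    # Same exception type as in the traceback above.
    failed = True

print("2d-only indexing fails on a 1d weight:", failed)
print("1d kernel ops:", kernel_ops_any_dim(conv1d_weight))
print("2d kernel ops:", kernel_ops_any_dim(conv2d_weight))
```

Taking the product over `weight_shape[2:]` gives the same value as the fixed indexing for 2d weights while also covering 1d and 3d kernels.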

I have not yet tested on a Linux GPU server; I am only testing on my Windows laptop. My environment is:

Collecting environment information...
PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Enterprise
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 10.2.89
GPU models and configuration: GPU 0: Quadro T2000
Nvidia driver version: 452.56
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.3
[pip3] torch==1.7.1
[pip3] torchaudio==0.7.2
[pip3] torchvision==0.8.2
[conda] numpy                     1.19.3                   pypi_0    pypi
[conda] torch                     1.7.1                    pypi_0    pypi
[conda] torchaudio                0.7.2                    pypi_0    pypi
[conda] torchvision               0.8.2                    pypi_0    pypi

colorjam commented 3 years ago

@listener17:

Thanks for the valuable feedback! Currently, our counter does not support LSTM layers. We will fix this in version 2.1. Could you please show us the code snippet for G?
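
Until a counter supports LSTM layers, their parameters can be tallied by hand. The sketch below is one way to do that, assuming the parameter layout of `torch.nn.LSTM` without projections: per layer and direction, an input-to-hidden weight, a hidden-to-hidden weight, and two bias vectors, each covering four gates. The function name `lstm_param_count` is hypothetical and is not part of NNI.

```python
def lstm_param_count(input_size, hidden_size, num_layers=1,
                     bidirectional=False, bias=True):
    """Hypothetical helper: parameter count of an LSTM stack (no projections)."""
    num_directions = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        # Layers after the first consume the previous layer's output,
        # which is hidden_size wide per direction.
        layer_input = input_size if layer == 0 else hidden_size * num_directions
        # Four gates, each with an input-to-hidden and a hidden-to-hidden matrix.
        per_direction = 4 * hidden_size * (layer_input + hidden_size)
        if bias:
            # Two bias vectors (input-to-hidden and hidden-to-hidden) of 4 * hidden_size.
            per_direction += 2 * 4 * hidden_size
        total += per_direction * num_directions
    return total


print(lstm_param_count(10, 20))
print(lstm_param_count(10, 20, num_layers=2))
print(lstm_param_count(10, 20, bidirectional=True))
```

This counts parameters only; it makes no claim about how FLOPs for recurrent layers should be defined.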

listener17 commented 3 years ago

Hi @colorjam:

For example, you can simply try running generator.py: https://github.com/santi-pdp/segan_pytorch/blob/master/segan/models/generator.py

Please add at the top: from nni.compression.pytorch.utils import counter

and add as the last line: flops, params, results = counter.count_flops_params(G, (x,))

You will get the same error I reported. It's simply a model with 1d conv and 1d transposed conv with skip connections (as in a U-Net).
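
For a smaller test case than the full generator, a model of the kind described above can be sketched in a few lines. The class below is a hypothetical stand-in, not the SEGAN generator: the name `TinyUNet1d`, the channel counts, and the kernel sizes are all assumptions chosen so that the output length matches the input length used earlier in this thread (16384).

```python
import torch
import torch.nn as nn


class TinyUNet1d(nn.Module):
    """Hypothetical stand-in: 1d conv encoder, 1d transposed conv decoder, one skip."""

    def __init__(self):
        super().__init__()
        # Each encoder layer halves the sequence length.
        self.enc1 = nn.Conv1d(1, 4, kernel_size=31, stride=2, padding=15)
        self.enc2 = nn.Conv1d(4, 8, kernel_size=31, stride=2, padding=15)
        # Each decoder layer doubles the sequence length.
        self.dec1 = nn.ConvTranspose1d(8, 4, kernel_size=32, stride=2, padding=15)
        # 8 input channels: 4 from dec1 plus 4 concatenated from enc1 (the skip).
        self.dec2 = nn.ConvTranspose1d(8, 1, kernel_size=32, stride=2, padding=15)

    def forward(self, x):
        e1 = torch.relu(self.enc1(x))
        e2 = torch.relu(self.enc2(e1))
        d1 = torch.relu(self.dec1(e2))
        d1 = torch.cat([d1, e1], dim=1)  # U-Net style skip connection
        return self.dec2(d1)


model = TinyUNet1d()
x = torch.randn(1, 1, 16384)
with torch.no_grad():
    y = model(x)

print(y.shape)
print(model.enc1.weight.dim())  # number of weight dimensions for a 1d convolution
```

Passing this model to `counter.count_flops_params(model, (x,))`, the call used earlier in this thread, would be one way to check whether an installed NNI version handles 1d weights. That step is left out here so the sketch runs without NNI.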

Thanks for looking into it.