pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

Spectral Normalization can not be applied to Conv{1,2,3}d #99149

Open fabiobrau opened 1 year ago

fabiobrau commented 1 year ago

🐛 Describe the bug

I would like to raise a concern about the spectral_norm parameterization.

I strongly believe that the spectral-norm parametrization introduced several versions ago does not work correctly for Conv{1,2,3}d layers.

The reason is that reshaping the weight into a 2D matrix is not enough: the spectral norm of the reshaped matrix is not the spectral norm of the convolution operator. An easy fix could be obtained by rescaling the parametrized weights by a factor of 1/(k1*k2)**0.5, where k1, k2 are the spatial dimensions of the kernel filters. Note, however, that this only addresses the case stride=1.
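For stride=1, a rough sketch of what that rescaling could look like (illustrative only: the class name below is made up, and the rescaling enforces an upper bound of 1 on the Lipschitz constant rather than making the operator norm exactly 1) is to chain an extra parametrization after spectral_norm:

  import torch
  from torch import nn
  from torch.nn.utils import parametrize
  from torch.nn.utils.parametrizations import spectral_norm

  class KernelRescale(nn.Module):
      # Divide the (already spectrally normalized) kernel by sqrt(k1*k2),
      # so that the operator norm of the stride-1 convolution stays <= 1.
      def forward(self, w):
          k1, k2 = w.shape[-2], w.shape[-1]
          return w / (k1 * k2) ** 0.5

  conv = spectral_norm(nn.Conv2d(200, 200, 7, padding=3, bias=False))
  parametrize.register_parametrization(conv, "weight", KernelRescale())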

By running the following code

  import torch
  from torch.nn.utils.parametrizations import spectral_norm

  original_conv = torch.nn.Conv2d(200, 200, 7, padding=7//2, bias=False)
  lip_conv = spectral_norm(original_conv, n_power_iterations=40)

  with torch.no_grad():
      # Initializing original_conv weight to higher values
      for p in original_conv.parameters():
          p += 10*torch.rand(200, 200, 7, 7)

  # Estimating the Lipschitz constant on several inputs
  lip_cons = 0.
  for _ in range(100):
      dim = torch.randint(7, 100, [1]).item()
      x = 100*torch.rand(1, 200, dim, dim)
      lip_cons = max(lip_cons, lip_conv(x).norm(2)/x.norm(2))

  print(f'Lipschitz constant of lip_conv is greater than 1: {lip_cons}')

  # Power method applied to the weights of the lip_conv
  w = lip_conv.weight
  x = torch.randn(1, 200, 250, 250)
  for _ in range(40):
      x = torch.nn.functional.conv2d(x, w, padding=7//2,)
      sigma = x.norm(2)
      x /= sigma
  print(f'Estimated Lipschitz constant of lip_conv: {sigma} ~ 7')
  print()

you should obtain something like

Lipschitz constant of lip_conv is greater than 1: 5.880033016204834
Estimated Lipschitz constant of lip_conv: 6.902995586395264 ~ 7

Is there something I am doing wrong? If not, I would like to ask the developers to state clearly in the documentation that this feature is not valid for convolutional layers. This would avoid misuse of the implementation.

Thanks for your time, Fabio

Versions

Collecting environment information...
PyTorch version: 2.0.0
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.4 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Libc version: glibc-2.27

Python version: 3.10.10 (main, Mar 21 2023, 18:45:11) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-4.15.0-76-generic-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: 10.1.243
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: Tesla V100-DGXS-32GB
GPU 1: Tesla V100-DGXS-32GB
GPU 2: Tesla V100-DGXS-32GB
GPU 3: Tesla V100-DGXS-32GB

Nvidia driver version: 515.65.01
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
Stepping: 1
CPU MHz: 1485.012
CPU max MHz: 3600,0000
CPU min MHz: 1200,0000
BogoMIPS: 4397.23
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 51200K
NUMA node0 CPU(s): 0-39
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] torch==2.0.0
[pip3] torchvision==0.15.0
[pip3] triton==2.0.0
[conda] blas 1.0 mkl
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py310h7f8727e_0
[conda] mkl_fft 1.3.1 py310hd6ae3a3_0
[conda] mkl_random 1.2.2 py310h00e6091_0
[conda] numpy 1.23.5 py310hd5efca6_0
[conda] numpy-base 1.23.5 py310h8e6c178_0
[conda] pytorch 2.0.0 py3.10_cuda11.7_cudnn8.5.0_0 pytorch
[conda] pytorch-cuda 11.7 h778d358_3 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchtriton 2.0.0 py310 pytorch
[conda] torchvision 0.15.0 py310_cu117 pytorch

cc @albanD @mruberry @jbschlosser @walterddr @mikaylagawarecki

redwrasse commented 4 months ago

Any progress on this? A spectral norm implementation that doesn't support convolutional layers would be odd, given the original paper https://arxiv.org/pdf/1802.05957.pdf.

redwrasse commented 4 months ago

Also, the original spectral norm implementation was based on the GAN paper, but spectral norm has much broader applicability and appears in other research implementations. Here is one for adversarial robustness that uses Fourier methods (another reason the feature should support convolutional operators): https://arxiv.org/pdf/2103.13815.pdf

Without taking sides, from that paper: '... Spectral normalization [2] has gained significant attention because it is algorithm-agnostic and can reduce DNNs' sensitivity to input perturbation. A major challenge with this method is that it provides a trade-off between computation time and numerical accuracy for computing the spectral norm of DNNs' weight matrix. Specifically, it applies one iteration of power method for each individual layer, which achieves poor accuracy in most cases. Moreover, it introduces high computation overhead when dealing with large convolution kernels. In reality, power iteration may not numerically converge to the desired result in some specific scenarios.'

Might be worth revisiting whether the current implementation and its documentation are appropriate for PyTorch as a general-purpose library?

nic-barbara commented 2 months ago

Does anyone know if there's been any progress on this?

nic-barbara commented 1 month ago

If anyone is interested, I ended up writing my own version for 2D convolutions: https://github.com/nic-barbara/Lipschitz-RL-Atari/blob/main/liprl/networks/specnorm_conv2d.py

The code is based on the TensorFlow implementation found here.

redwrasse commented 1 week ago

@nic-barbara do you get different spectral norm results on convolutional layers than with the current PyTorch implementation? Maybe the place to start is agreeing on and adding convolutional-layer test cases that spectral norm needs to pass.
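One possible shape for such a test (just a sketch: the function name, layer sizes, and tolerance are made up, and presumably it would fail against the current implementation, which is the point of this issue) would assert the contraction property on actual inputs rather than on the flattened kernel:

  import torch
  from torch.nn.utils.parametrizations import spectral_norm

  def test_conv2d_spectral_norm_is_contractive():
      # Desired property: after spectral_norm, the conv layer (bias disabled)
      # should satisfy ||conv(x)|| <= ||x|| for every input x.
      torch.manual_seed(0)
      conv = torch.nn.Conv2d(32, 32, 5, padding=2, bias=False)
      with torch.no_grad():
          conv.weight += torch.rand_like(conv.weight)  # make the kernel non-trivial
      conv = spectral_norm(conv, n_power_iterations=50)
      with torch.no_grad():
          for _ in range(10):
              x = torch.randn(1, 32, 64, 64)
              gain = (conv(x).norm() / x.norm()).item()
              assert gain <= 1.0 + 1e-4, f"gain {gain} exceeds 1"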

nic-barbara commented 1 week ago

@redwrasse yep, I do. The method used to calculate the spectral norm for convolutional layers in PyTorch is not correct, so it's not just a code-level bug. The current PyTorch implementation simply flattens the convolution kernel and computes the spectral norm of the resulting 2D array, the same way as for a dense/linear layer. However, it should actually apply the convolution as part of the calculation.

There's a nice TensorFlow implementation for 2D convolutions here; I reckon that's a good place to start: https://github.com/google/edward2/blob/main/edward2/tensorflow/layers/normalization.py#L398
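Roughly, the idea is to run power iteration on the convolution operator itself, using conv_transpose2d as its adjoint, instead of on the flattened kernel. A minimal sketch (assuming stride=1 and zero padding; the helper name and signature are just for illustration, this is not the edward2 code):

  import torch
  import torch.nn.functional as F

  def conv2d_operator_norm(weight, in_shape, padding=0, n_iters=100):
      # Power iteration on the map x -> conv2d(x, weight); conv_transpose2d with
      # the same weight and padding is its adjoint for stride=1.
      x = torch.randn(1, *in_shape)
      x = x / x.norm()
      for _ in range(n_iters):
          y = F.conv2d(x, weight, padding=padding)
          x = F.conv_transpose2d(y, weight, padding=padding)
          x = x / x.norm()
      # With ||x|| = 1 at convergence, ||conv2d(x, weight)|| approximates sigma_max.
      return F.conv2d(x, weight, padding=padding).norm()

An estimate like this could then be compared against the gain measured on real inputs (as in the snippet at the top of this issue) when writing test cases.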

fabiobrau commented 1 week ago

Hello @nic-barbara and @redwrasse,

If you are looking to implement spectral normalization for Lipschitz-bounded neural networks, you might find our repository 1LipschitzLayersCompared useful. The module model.layers contains a nearly complete list of Lipschitz-bounded layers documented in the literature.

My co-author @berndprach and I have conducted a comprehensive comparison of these layers in our recent paper, 1-Lipschitz Layers Compared: Memory, Speed, and Certifiable Robustness, which has been accepted at CVPR 2024. This resource could provide valuable insights and practical implementations for your work.

If you need additional layers or have suggestions, we would be glad to keep the repository updated. Feel free to explore it and reach out if you have any questions or need further information!

Best, Fabio Brau

nic-barbara commented 1 week ago

Hey @fabiobrau, thanks for the links! I have actually read your paper before and found it very insightful. Maybe we can chat elsewhere; it's closely connected to some observations I made in my recent paper, On Robust Reinforcement Learning with Lipschitz Bounded Policy Networks (note that for RL the networks are quite small, so vanishing gradients are not a big problem).

I do have a question regarding your implementation though, which relates to the PyTorch version too: do you use a different approach for convolutions as opposed to dense layers? E.g., how do you take stride into account?