huggingface / pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
https://huggingface.co/docs/timm
Apache License 2.0

torch.jit.trace() caused TracerWarning #943

Closed: MichaelMonashev closed this issue 2 years ago

MichaelMonashev commented 3 years ago

Describe the bug

import torch
import timm

model = timm.create_model('tf_efficientnetv2_b0').cuda().eval()
images = torch.rand(1, 3, 128, 128).cuda()
jit_traced_model = torch.jit.trace(model, (images,))

This produces the following warnings:
/home/xxxxxx/.local/lib/python3.8/site-packages/timm/models/layers/padding.py:19: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return max((math.ceil(x / s) - 1) * s + (k - 1) * d + 1 - x, 0)
/home/xxxxxx/.local/lib/python3.8/site-packages/timm/models/layers/padding.py:19: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return max((math.ceil(x / s) - 1) * s + (k - 1) * d + 1 - x, 0)
/home/xxxxxx/.local/lib/python3.8/site-packages/timm/models/layers/padding.py:31: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if pad_h > 0 or pad_w > 0:
/home/xxxxxx/.local/lib/python3.8/site-packages/timm/models/layers/padding.py:32: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  x = F.pad(x, [pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2], value=value)


rwightman commented 3 years ago

@MichaelMonashev I'm aware of that, but I don't think anything can be done in that case other than not using the models with TF 'same' padding emulation. You'll basically end up with a model whose padding is fixed for one image size, which is unavoidable given the way tracing and the padding 'hack' work. Scripting will work with the padding, but tracing will make the padding constant for the example input you use.
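To make the failure mode concrete, here is a minimal sketch (not from this thread; it assumes a CUDA device and that the default layers script cleanly): the traced graph records the 'same' padding computed for the 128x128 example as constants, so at another resolution its output can drift from the eager model, while a scripted model keeps the dynamic padding logic.

import torch
import timm

model = timm.create_model('tf_efficientnetv2_b0').cuda().eval()

# Trace at one resolution, script the same model.
traced = torch.jit.trace(model, (torch.rand(1, 3, 128, 128).cuda(),))
scripted = torch.jit.script(model)  # create_model(..., scriptable=True) may help if this fails

with torch.no_grad():
    x = torch.rand(1, 3, 201, 201).cuda()  # odd size, so the required 'same' padding differs from 128x128
    ref = model(x)
    print(torch.allclose(ref, scripted(x), atol=1e-5))  # expected: True
    print(torch.allclose(ref, traced(x), atol=1e-5))    # may be False: padding was baked in at trace time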

MichaelMonashev commented 3 years ago

I benchmarked JIT scripting and tracing on various models and input sizes on GPU. The traced model is usually faster than the scripted one, but this warning makes it look unsafe to end users. Maybe adding a comment after return max((math.ceil(x / s) - 1) * s + (k - 1) * d + 1 - x, 0) and if pad_h > 0 or pad_w > 0: would help?
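For reference, a comparison like that can be timed roughly as below (a sketch only, with a hypothetical avg_latency helper and CUDA assumed; the repository's benchmark.py script is the more thorough option).

import time
import torch
import timm

def avg_latency(m, x, iters=100, warmup=10):
    # Crude GPU timing: warm up, then average synchronized forward passes.
    with torch.no_grad():
        for _ in range(warmup):
            m(x)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            m(x)
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

model = timm.create_model('tf_efficientnetv2_b0').cuda().eval()
x = torch.rand(1, 3, 128, 128).cuda()
traced = torch.jit.trace(model, (x,))
scripted = torch.jit.script(model)
print('traced:  ', avg_latency(traced, x))
print('scripted:', avg_latency(scripted, x))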

MichaelMonashev commented 3 years ago

Some benchmark results: https://pastebin.com/raw/yFWJAPqL https://pastebin.com/raw/Z9AbcmAb https://pastebin.com/raw/C5g4Pefc

rwightman commented 2 years ago

There are lots of situations where this warning crops up. Overall I'm not very pleased with the way this is handled in tracing, and I think PyTorch should / could do much better, especially on the floordiv issue.

I don't know how much a comment would help, as the reason is complex, and some of those warnings do point at a real problem: if you trace a model that emits them, you can't expect it to work well at a different input resolution.

albertz commented 1 year ago

I assume the bool-conversion warning comes from the max()? Using torch.max would presumably fix that one. And the other warning comes from math.ceil? Using torch.ceil should fix that one as well.

Also, for the ceildiv, this code would be better anyway:

def ceildiv(a, b):
    # ceiling division expressed via floor division: -(-a // b) == ceil(a / b)
    return -(-a // b)
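To make the suggestion concrete, here is a rough sketch of a tensor-friendly variant of the padding calculation (this is not timm's actual code; get_same_padding is reused purely as an illustrative name, and x is assumed to arrive as an int or a 0-dim tensor during tracing). Keeping the arithmetic in torch ops lets the tracer record the data flow instead of freezing the values as constants.

import torch

def get_same_padding(x, k: int, s: int, d: int):
    # Ceil-divide with integer ops, then clamp at zero instead of Python max().
    x = torch.as_tensor(x)
    pad = (torch.div(x + s - 1, s, rounding_mode='floor') - 1) * s + (k - 1) * d + 1 - x
    return torch.clamp(pad, min=0)

The floordiv deprecation warning from pad_w // 2 can be handled the way the message itself suggests, with torch.div(pad_w, 2, rounding_mode='floor').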

SWHL commented 7 months ago

This has since been addressed; the current implementation is here: https://github.com/huggingface/pytorch-image-models/blob/67b0b3d7c7da3dbd76f30375b086ba4a0656811f/timm/layers/padding.py#L19-L23