
[bug] Device dispatcher can choose CPU path for CUDA tensors. #79202


nikitaved commented 2 years ago

🐛 Describe the bug

It is possible to confuse the device dispatcher into choosing the wrong path when the dispatch keys are specified on a single line in native_functions.yaml.

To reproduce, check out the code from https://github.com/pytorch/pytorch/pull/79201 at https://github.com/pytorch/pytorch/pull/79201/commits/d03cfcfa406a75a5bcd11e0f6adeaa461b338e07, then:

In [1]: import torch

In [2]: ig = torch.rand(3, 3, device='cuda')

In [3]: ic = ig.to('cpu')

In [4]: res = torch.foo(ic, ic, ic)
CPU kernel

In [5]: res = torch.foo(ig, ig, ig)
CPU kernel
Segmentation fault (core dumped)

When the dispatch keys are on separate lines, as in https://github.com/pytorch/pytorch/pull/79201/commits/9caf9267ecc5e56444cd1ad380a783b30159ee22, everything is fine.
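
For reference, the two native_functions.yaml shapes being compared look roughly like this (the signature of foo and the kernel names are assumptions based on the repro above; see the linked commits for the actual entries):

    # Single-line dispatch (first commit) -- triggers the bug above:
    - func: foo(Tensor a, Tensor b, Tensor c) -> Tensor
      dispatch:
        CPU, CUDA: foo

    # Per-backend lines (second commit) -- behaves correctly:
    - func: foo(Tensor a, Tensor b, Tensor c) -> Tensor
      dispatch:
        CPU: foo
        CUDA: foo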

Versions

PyTorch version: 1.13.0a0+git75a01c2
Is debug build: False
CUDA used to build PyTorch: 11.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (GCC) 10.3.0
Clang version: Could not collect
CMake version: version 3.22.3
Libc version: glibc-2.31

Python version: 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:39:04) [GCC 10.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-104-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration: 
GPU 0: NVIDIA GeForce RTX 2060
GPU 1: NVIDIA GeForce RTX 2060

Nvidia driver version: 465.19.01
cuDNN version: Probably one of the following:
/usr/local/cuda-10.1.243/targets/x86_64-linux/lib/libcudnn.so.8
/usr/local/cuda-10.1.243/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
/usr/local/cuda-10.1.243/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
/usr/local/cuda-10.1.243/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
/usr/local/cuda-10.1.243/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
/usr/local/cuda-10.1.243/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
/usr/local/cuda-10.1.243/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
/usr/local/cuda-10.2.89/targets/x86_64-linux/lib/libcudnn.so.8
/usr/local/cuda-10.2.89/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
/usr/local/cuda-10.2.89/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
/usr/local/cuda-10.2.89/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
/usr/local/cuda-10.2.89/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
/usr/local/cuda-10.2.89/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
/usr/local/cuda-10.2.89/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
/usr/local/cuda-11.4.2/targets/x86_64-linux/lib/libcudnn.so.8
/usr/local/cuda-11.4.2/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
/usr/local/cuda-11.4.2/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
/usr/local/cuda-11.4.2/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
/usr/local/cuda-11.4.2/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
/usr/local/cuda-11.4.2/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
/usr/local/cuda-11.4.2/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: False

Versions of relevant libraries:
[pip3] mypy==0.812
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.22.3
[pip3] torch==1.13.0a0+gitd03cfcf
[conda] magma-cuda112             2.5.2                         1    pytorch
[conda] mkl                       2022.0.1           h8d4b97c_803    conda-forge
[conda] mkl-include               2022.0.1           h8d4b97c_803    conda-forge
[conda] numpy                     1.22.3                   pypi_0    pypi
[conda] torch                     1.13.0a0+gitd03cfcf           dev_0    <develop>

cc @malfet @seemethere @ezyang @bhosmer @bdhirsh

bdhirsh commented 2 years ago

When the codegen sees this (from your PR https://github.com/pytorch/pytorch/pull/79201):

  dispatch:
    CPU, CUDA: foo

It interprets that as "there is a single at::native::foo kernel that can be re-used by both the CPU and CUDA backends".

In your PR, though, it looks like you defined at::native::foo twice: once in a .cpp file and again in a .cu file. I'd expect you to get a duplicate definition error? (But it sounds like you're not getting one, which seems strange.)
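
Concretely, the setup being described is roughly the following (file paths, the signature of foo, and the bodies are illustrative assumptions, not the PR's actual code):

    // aten/src/ATen/native/Foo.cpp -- compiled into the CPU library
    #include <ATen/ATen.h>
    #include <iostream>

    namespace at { namespace native {

    Tensor foo(const Tensor& a, const Tensor& b, const Tensor& c) {
      std::cout << "CPU kernel" << std::endl;  // the line printed in the repro above
      // ... CPU implementation ...
      return a.clone();
    }

    }} // namespace at::native

    // aten/src/ATen/native/cuda/Foo.cu -- compiled into the CUDA library
    #include <ATen/ATen.h>

    namespace at { namespace native {

    Tensor foo(const Tensor& a, const Tensor& b, const Tensor& c) {
      // ... CUDA implementation ...
      return a.clone();
    }

    }} // namespace at::native

Both definitions mangle to the same symbol, which is why a multiple-definition error at link time would naively be expected.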

nikitaved commented 2 years ago

Right, it compiles perfectly. And there are existing functions that have their CPU, CUDA dispatch specified on a single line with the same kernel name; angle, for example.

ezyang commented 2 years ago

Perhaps there is no duplicate symbol error because the definitions go into separate libraries and then get hidden visibility. Not sure if we can actually fix this.
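
A minimal standalone sketch of that mechanism, assuming GCC or Clang on Linux (nothing here is PyTorch code; file names and build commands are illustrative):

    // a.cpp -- build: g++ -shared -fPIC -fvisibility=hidden a.cpp -o liba.so
    #include <cstdio>
    void foo() { std::printf("A's foo\n"); }  // hidden: not exported from liba.so
    __attribute__((visibility("default")))
    void call_a() { foo(); }                  // binds to this library's own foo

    // b.cpp -- build: g++ -shared -fPIC -fvisibility=hidden b.cpp -o libb.so
    #include <cstdio>
    void foo() { std::printf("B's foo\n"); }  // same symbol name, also hidden
    __attribute__((visibility("default")))
    void call_b() { foo(); }

    // main.cpp -- build: g++ main.cpp -L. -la -lb -Wl,-rpath,. && ./a.out
    void call_a();
    void call_b();
    int main() {
      call_a();  // prints "A's foo"
      call_b();  // prints "B's foo"
    }            // no duplicate-symbol error: each library resolves foo internally

Statically linking the same two translation units into one binary would instead fail with a multiple-definition error, which is what the static CI build suggested below would surface.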

xsacha commented 2 years ago

Can't you have a static build in CI to trigger the duplicate symbol error?

ezyang commented 2 years ago

I guess... not sure if the CI team would want to maintain it, though.