MNIST import fails: cannot import name 'OpOverloadPacket' from 'torch._ops'

fonnesbeck commented 2 years ago

🐛 Describe the bug

Trying to import the MNIST dataset on Linux as follows:

import torchvision.datasets as datasets
mnist_trainset = datasets.MNIST(root='./data', train=True, download=True, transform=None)

fails with an ImportError:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
/tmp/ipykernel_13067/1857463301.py in <module>
----> 1 import torchvision.datasets as datasets
      2 
      3 mnist_trainset = datasets.MNIST(root='./data', train=True, download=True, transform=None)

~/miniforge3/envs/bios8366/lib/python3.9/site-packages/torchvision/__init__.py in <module>
      5 from torchvision import datasets
      6 from torchvision import io
----> 7 from torchvision import models
      8 from torchvision import ops
      9 from torchvision import transforms

~/miniforge3/envs/bios8366/lib/python3.9/site-packages/torchvision/models/__init__.py in <module>
      1 from .alexnet import *
----> 2 from .convnext import *
      3 from .resnet import *
      4 from .vgg import *
      5 from .squeezenet import *

~/miniforge3/envs/bios8366/lib/python3.9/site-packages/torchvision/models/convnext.py in <module>
      8 from .._internally_replaced_utils import load_state_dict_from_url
      9 from ..ops.misc import ConvNormActivation
---> 10 from ..ops.stochastic_depth import StochasticDepth
     11 from ..utils import _log_api_usage_once
     12 

~/miniforge3/envs/bios8366/lib/python3.9/site-packages/torchvision/ops/__init__.py in <module>
     16 from .giou_loss import generalized_box_iou_loss
     17 from .misc import FrozenBatchNorm2d, SqueezeExcitation
---> 18 from .poolers import MultiScaleRoIAlign
     19 from .ps_roi_align import ps_roi_align, PSRoIAlign
     20 from .ps_roi_pool import ps_roi_pool, PSRoIPool

~/miniforge3/envs/bios8366/lib/python3.9/site-packages/torchvision/ops/poolers.py in <module>
      3 
      4 import torch
----> 5 import torch.fx
      6 import torchvision
      7 from torch import nn, Tensor

~/miniforge3/envs/bios8366/lib/python3.9/site-packages/torch/fx/__init__.py in <module>
     81 '''
     82 
---> 83 from .graph_module import GraphModule
     84 from ._symbolic_trace import symbolic_trace, Tracer, wrap, PH, ProxyableClassMeta
     85 from .graph import Graph

~/miniforge3/envs/bios8366/lib/python3.9/site-packages/torch/fx/graph_module.py in <module>
      6 import linecache
      7 from typing import Type, Dict, List, Any, Union, Optional, Set
----> 8 from .graph import Graph, _is_from_torch, _custom_builtins, PythonCode
      9 from ._compatibility import compatibility
     10 from torch.package import Importer, sys_importer

~/miniforge3/envs/bios8366/lib/python3.9/site-packages/torch/fx/graph.py in <module>
----> 1 from .node import Node, Argument, Target, map_arg, _type_repr, _get_qualified_name
      2 import torch.utils._pytree as pytree
      3 from . import _pytree as fx_pytree
      4 from ._compatibility import compatibility
      5 import contextlib

~/miniforge3/envs/bios8366/lib/python3.9/site-packages/torch/fx/node.py in <module>
      7 import types
      8 import warnings
----> 9 from torch.fx.operator_schemas import normalize_function, normalize_module, ArgsKwargsPair
     10 
     11 if TYPE_CHECKING:

~/miniforge3/envs/bios8366/lib/python3.9/site-packages/torch/fx/operator_schemas.py in <module>
      8 from torch._jit_internal import boolean_dispatched
      9 from ._compatibility import compatibility
---> 10 from torch._ops import OpOverloadPacket
     11 
     12 if TYPE_CHECKING:

ImportError: cannot import name 'OpOverloadPacket' from 'torch._ops' (/home/fonnesbeck/miniforge3/envs/bios8366/lib/python3.9/site-packages/torch/_ops.py)

Versions

PyTorch version: 1.11.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 11 (bullseye) (x86_64)
GCC version: (Debian 10.2.1-6) 10.2.1 20210110
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:41:03) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.21.5
[pip3] numpyro==0.9.1
[pip3] torch==1.11.0
[pip3] torchvision==0.12.0
[conda] Could not collect

pmeier commented 2 years ago

This seems to have nothing to do with the datasets, but rather with torch.fx, which is imported along the way. Could you confirm by running

python -c "import torch.fx"

in your environment?
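
For a slightly more detailed check than the one-liner above, a minimal sketch along these lines may help. The helper name probe is made up for illustration and is not part of torch; it only reports whether a module imports, whether it exposes a given attribute, and which file it was loaded from. The file path is useful because the traceback points at a torch/_ops.py that lacks OpOverloadPacket.

```python
import importlib


def probe(module_name, attr):
    """Report whether `module_name` imports, exposes `attr`, and where it was loaded from.

    `probe` is a hypothetical helper for this thread, not a torch API.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Covers both a missing package and a failing import inside the package.
        return {"importable": False, "has_attr": False, "file": None, "error": str(exc)}
    return {
        "importable": True,
        "has_attr": hasattr(module, attr),
        "file": getattr(module, "__file__", None),
        "error": None,
    }


if __name__ == "__main__":
    # The failing import in the traceback is `from torch._ops import OpOverloadPacket`.
    print(probe("torch._ops", "OpOverloadPacket"))
    # torch.fx is what torchvision.ops.poolers imports along the way.
    print(probe("torch.fx", "symbolic_trace"))
```

If torch._ops is importable but has_attr is False, the reported file shows which torch installation Python actually picked up, which can then be compared against the version pip or conda claims to have installed.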

LukasMut commented 2 years ago

I get the same error. Seems to be an issue with either torch version 1.11.0 or torchvision version 0.12.0. If you install torch 1.10.0 and torchvision 0.11.0 instead, this error is not thrown. Please fix.
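
As a rough illustration of how the version pairs reported in this thread line up, here is a small sketch. The table KNOWN_PAIRS and the helpers minor and is_matched are hypothetical names, and the table contains only the pairs mentioned in this thread (1.10/0.11, 1.11/0.12, 1.13/0.14); it is not an official compatibility matrix.

```python
# torch minor version -> torchvision minor version, taken only from the
# versions reported in this thread. Not an official compatibility matrix.
KNOWN_PAIRS = {"1.10": "0.11", "1.11": "0.12", "1.13": "0.14"}


def minor(version):
    """Reduce a version such as '1.11.0+cu102' to its 'major.minor' part."""
    public = version.split("+")[0]  # drop a local build tag such as '+cu102'
    return ".".join(public.split(".")[:2])


def is_matched(torch_version, torchvision_version):
    """Return True or False for a known torch minor, None if the table has no entry."""
    expected = KNOWN_PAIRS.get(minor(torch_version))
    if expected is None:
        return None
    return expected == minor(torchvision_version)


if __name__ == "__main__":
    print(is_matched("1.11.0+cu102", "0.12.0"))  # pair from the original report
    print(is_matched("1.10.0", "0.11.0"))        # pair reported as working
```

Both pairs above are matched according to this table, so a plain version mismatch between the two reported packages would not by itself explain the error.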

pmeier commented 2 years ago

I can't reproduce. @fonnesbeck @LukasMut does https://github.com/pytorch/vision/issues/5748#issuecomment-1089855905 work for you or does it also fail? Did you install torch / torchvision through pip or through conda?

The only "special" thing in the env is that you are using WSL. That shouldn't be a problem though. @LukasMut could you also post your env? You can run python -m torch.utils.collect_env if your torch installation works. Otherwise, please follow this procedure:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py

XudongLinthu commented 2 years ago

Same problem here. My env is

PyTorch version: 1.11.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Libc version: glibc-2.17
Python version: 3.7.11 (default, Jul 27 2021, 14:32:16)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-4.15.0-175-generic-x86_64-with-debian-buster-sid
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: NVIDIA TITAN RTX
GPU 1: NVIDIA TITAN RTX
GPU 2: NVIDIA TITAN RTX
GPU 3: NVIDIA TITAN RTX
GPU 4: NVIDIA TITAN RTX
GPU 5: NVIDIA TITAN RTX
GPU 6: NVIDIA TITAN RTX
GPU 7: NVIDIA TITAN RTX
Nvidia driver version: 470.103.01
cuDNN version: /usr/local/cuda-10.1/cudnn-7.6.5/libcudnn.so.7.6.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.19.1
[pip3] torch==1.11.0
[pip3] torchaudio==0.11.0
[pip3] torchvision==0.12.0
[conda] _pytorch_select     0.1         cpu_0
[conda] blas                1.0         mkl
[conda] cudatoolkit         10.2.89     hfd86e86_1
[conda] libmklml            2019.0.5    h06a4308_0
[conda] mkl                 2020.2      256
[conda] numpy               1.19.1      pypi_0                          pypi
[conda] pytorch             1.11.0      py3.7_cuda10.2_cudnn7.6.5_0     pytorch
[conda] pytorch-mutex       1.0         cuda                            pytorch
[conda] torch               1.9.0       pypi_0                          pypi
[conda] torchaudio          0.11.0      py37_cu102                      pytorch
[conda] torchvision         0.12.0      py37_cu102                      pytorch

pmeier commented 2 years ago

@XudongLinthu Your environment is broken:

[conda] pytorch             1.11.0      py3.7_cuda10.2_cudnn7.6.5_0     pytorch
...
[conda] torch               1.9.0       pypi_0                          pypi

My guess is that the functionality that we use from torch.fx is not available in torch==1.9.0. Please fix the environment and report back if the error persists.
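
One way to look for this kind of mixed installation from Python itself is sketched below, using the standard library's importlib.metadata. The function name installed_versions is made up for illustration. Whether a conda-installed pytorch package shows up as a torch distribution in Python metadata is an assumption that may not hold in every environment, so an empty or single-entry result does not prove the environment is clean.

```python
import sys
from importlib import metadata


def installed_versions(name, paths=None):
    """Return sorted (version, location) pairs for every distribution called `name`.

    `installed_versions` is a hypothetical helper; `paths` defaults to sys.path.
    """
    search = list(sys.path if paths is None else paths)
    wanted = name.lower().replace("_", "-")
    found = set()
    for dist in metadata.distributions(path=search):
        dist_name = (dist.metadata.get("Name") or "").lower().replace("_", "-")
        if dist_name == wanted:
            found.add((dist.version, str(dist.locate_file(""))))
    return sorted(found)


if __name__ == "__main__":
    hits = installed_versions("torch")
    for version, location in hits:
        print(f"torch {version} at {location}")
    if len({version for version, _ in hits}) > 1:
        print("More than one torch version is visible; consider a clean environment.")
```

In an environment like the one above, two different torch versions on the search path would suggest that the pip-installed torch and the conda-installed pytorch are shadowing each other.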

soid commented 2 years ago

I had the same problem after just installing torch from pip. There was no pytorch listed in the output (see below). And I was getting exactly the same error "ImportError: cannot import name 'OpOverloadPacket' from 'torch._ops'"

I resolved it by going to https://pytorch.org/, selecting CUDA as the "Compute Platform" (this machine has CUDA installed), and reinstalling torch with the command shown there (still via pip, but with the extra option --extra-index-url).

Collecting environment information...
PyTorch version: 1.11.0+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Home
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19041-SP0
Is CUDA available: False
CUDA runtime version: 10.1.243
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.19.5
[pip3] torch==1.11.0
[pip3] torchaudio==0.11.0+cu113
[pip3] torchvision==0.12.0
[conda] Could not collect

pmeier commented 2 years ago

Hm, it might be related to the CPU only binaries on Windows? @soid can you confirm that https://github.com/pytorch/vision/issues/5748#issuecomment-1089855905 is the true source of the error? If yes, could you try the following three scenarios for me and report back?

PyPI

$ pip install torch
$ python -c "import torch.fx"

CPU (same as PyPI, but to make sure)

$ pip install torch --extra-index-url https://download.pytorch.org/whl/cpu
$ python -c "import torch.fx"

CUDA

$ pip install torch --extra-index-url https://download.pytorch.org/whl/cu113
$ python -c "import torch.fx"

Either do that in a clean env every time or at least run pip uninstall -y torch before each scenario.

lugi0 commented 1 year ago

I am running into this same issue when running from torchvision import datasets. This is my env:

PyTorch version: 1.13.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Red Hat Enterprise Linux 8.7 (Ootpa) (x86_64)
GCC version: (GCC) 8.5.0 20210514 (Red Hat 8.5.0-15)
Clang version: Could not collect
CMake version: version 3.20.2
Libc version: glibc-2.28

Python version: 3.8.13 (default, Jun 14 2022, 17:49:07)  [GCC 8.5.0 20210514 (Red Hat 8.5.0-13)] (64-bit runtime)
Python platform: Linux-4.18.0-372.36.1.el8_6.x86_64-x86_64-with-glibc2.2.5
Is CUDA available: True
CUDA runtime version: 11.4.152
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 525.60.13
cuDNN version: Probably one of the following:
/usr/lib64/libcudnn.so.8.2.4
/usr/lib64/libcudnn_adv_infer.so.8.2.4
/usr/lib64/libcudnn_adv_train.so.8.2.4
/usr/lib64/libcudnn_cnn_infer.so.8.2.4
/usr/lib64/libcudnn_cnn_train.so.8.2.4
/usr/lib64/libcudnn_ops_infer.so.8.2.4
/usr/lib64/libcudnn_ops_train.so.8.2.4
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.13.1
[pip3] torchvision==0.14.1
[conda] Could not collect

pmeier commented 1 year ago

@lugi0 Have you tried the steps in https://github.com/pytorch/vision/issues/5748#issuecomment-1133909450? If this is indeed only related to FX, it is probably better to report this to PyTorch core, i.e. https://github.com/pytorch/pytorch directly.

lugi0 commented 1 year ago

@pmeier I've tried the third one (although not with cu113 but instead cu116 IIRC); I can give it another try but I'm unsure if that would fix it either way -- I am able to run the code in a different environment after doing a simple pip install torch torchvision, which might point to some other issue in my env?

pmeier commented 1 year ago

which might point to some other issue in my env?

I guess so. I don't think we ever established the root cause here. But all evidence points towards torch FX as the culprit, and we don't maintain it. The reason you see torchvision in the traceback is that we use FX for some of our models. This is why I suggested trying python -c 'import torch.fx'. If that doesn't work and we are indeed looking at an env issue, you are much better off reporting it to PyTorch core. There the people maintaining FX can help you.
