pytorch / ao

PyTorch native quantization and sparsity for training and inference
BSD 3-Clause "New" or "Revised" License

Linux specific import issue "name 'int32' is not defined" #535

Closed guangy10 closed 1 month ago

guangy10 commented 1 month ago

Env:

In ExecuTorch (https://github.com/pytorch/executorch/pull/4320), I ran into this error while trying to consolidate the setup for llava. The same setup works fine on macOS, so it appears to be a Linux-specific import issue.

See the stacktrace:

/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/passes/_quant_patterns_and_replacements.py:106: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @impl_abstract("quantized_decomposed::embedding_byte.out")
/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/passes/_quant_patterns_and_replacements.py:153: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @impl_abstract("quantized_decomposed::embedding_byte.dtype_out")
/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/passes/_quant_patterns_and_replacements.py:228: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @impl_abstract("quantized_decomposed::embedding_4bit.out")
/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/passes/_quant_patterns_and_replacements.py:281: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @impl_abstract("quantized_decomposed::embedding_4bit.dtype_out")
WARNING:bitsandbytes.cextension:The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
You are using a model of type llava to instantiate a model of type llava_llama. This is not supported for all configurations of models and can yield errors.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  4.40it/s]
INFO:root:Loading custom ops library: /home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/executorch/examples/models/llama2/custom_ops/libcustom_ops_aot_lib.so
/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/torch/export/_unlift.py:59: UserWarning: Attempted to insert a get_attr Node with no underlying reference in the owning GraphModule! Call GraphModule.add_submodule to add the necessary submodule, GraphModule.add_parameter to add the necessary Parameter, or nn.Module.register_buffer to add the necessary buffer
  getattr_node = gm.graph.get_attr(lifted_node)
/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/torch/fx/graph.py:1570: UserWarning: Node lifted_tensor_0 target lifted_tensor_0 lifted_tensor_0 of  does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target
  warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does '
/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/torch/fx/graph.py:1570: UserWarning: Node lifted_tensor_1 target lifted_tensor_1 lifted_tensor_1 of  does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target
  warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does '
/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/torch/fx/graph.py:1570: UserWarning: Node lifted_tensor_2 target lifted_tensor_2 lifted_tensor_2 of  does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target
  warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does '
INFO:root:Applying quantizers: []
Traceback (most recent call last):
  File "/data/users/guangyang/executorch/examples/models/llava/export_llava.py", line 201, in <module>
    main()
  File "/data/users/guangyang/executorch/examples/models/llava/export_llava.py", line 173, in main
    text_model_ep = export_text_model(
  File "/data/users/guangyang/executorch/examples/models/llava/export_llava.py", line 84, in export_text_model
    .source_transform([replace_sdpa_with_custom_op, quant_transform])
  File "/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/executorch/extension/llm/export/builder.py", line 127, in source_transform
    self.model = transform(self.model)
  File "/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/executorch/examples/models/llama2/source_transformation/quantize.py", line 75, in quantize
    from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer
  File "/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/torchao/__init__.py", line 2, in <module>
    from .quantization.quant_api import apply_dynamic_quant
  File "/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/torchao/quantization/__init__.py", line 7, in <module>
    from .smoothquant import *  # noqa: F403
  File "/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/torchao/quantization/smoothquant.py", line 18, in <module>
    import torchao.quantization.quant_api as quant_api
  File "/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/torchao/quantization/quant_api.py", line 22, in <module>
    from .dynamic_quant import DynamicallyPerAxisQuantizedLinear
  File "/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/torchao/quantization/dynamic_quant.py", line 10, in <module>
    from .quant_primitives import (
  File "/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/torchao/quantization/quant_primitives.py", line 13, in <module>
    from torchao.kernel.intmm import int_scaled_matmul
  File "/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/torchao/kernel/intmm.py", line 10, in <module>
    from torchao.kernel import intmm_triton
  File "/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/torchao/kernel/intmm_triton.py", line 165, in <module>
    def scaled_matmul_kernel_with_block_pointers(
  File "/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/triton/runtime/jit.py", line 542, in jit
    return decorator(fn)
  File "/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/triton/runtime/jit.py", line 534, in decorator
    return JITFunction(
  File "/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/triton/runtime/jit.py", line 433, in __init__
    self.run = self._make_launcher()
  File "/home/guangyang/.conda/envs/executorch/lib/python3.10/site-packages/triton/runtime/jit.py", line 400, in _make_launcher
    exec(src, scope)
  File "<string>", line 2, in <module>
NameError: name 'int32' is not defined
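For context on the failure mode: per the traceback, older Triton builds the kernel launcher in `JITFunction._make_launcher` by `exec`-ing generated source, and the `NameError` fires from `"<string>", line 2` when a name referenced in that generated source (here `int32`, used in a parameter annotation) is missing from the exec scope. The snippet below is a minimal sketch of that mechanism, not actual Triton or torchao code; `make_launcher` and the source string are hypothetical stand-ins.

```python
# Minimal sketch of the failure mode (hypothetical, not Triton code):
# exec'ing generated source that references a name ("int32") the scope
# does not define raises NameError from "<string>" -- matching the
# traceback above.

def make_launcher(src: str, scope: dict):
    """Execute generated launcher source in `scope` and return the launcher."""
    exec(src, scope)
    return scope["launcher"]

# Annotations are evaluated at definition time, so "int32" must resolve
# in the exec scope; here it doesn't, just like in the failing launcher.
src = "# header\ndef launcher(x: int32):\n    return x"

try:
    make_launcher(src, {})
except NameError as e:
    print(e)  # name 'int32' is not defined
```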
jerryzh168 commented 1 month ago

@msaroufim can you take a look?

guangy10 commented 1 month ago

@msaroufim Would these lines be the issue? Feels like something isn't imported correctly for OSS.

msaroufim commented 1 month ago

The problem seems to be that you're using torch CPU binaries with torchao GPU binaries; on PyPI we default to CUDA 12.1.

Could you please try the below instead?

pip install torchao --extra-index-url https://download.pytorch.org/whl/cpu --force-reinstall

EDIT: I just tried this on my end and couldn't repro

(ao) [marksaroufim@devgpu003.cco3 ~]$ pip list | grep torch
pytorch-triton           3.0.0+dedb7bdf33
torch                    2.5.0.dev20240716+cpu
torchao                  0.3.1                 /home/marksaroufim/anaconda3/envs/ao/lib/python3.10/site-packages
(ao) [marksaroufim@devgpu003.cco3 ~]$ python
Python 3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from torchao.kernel import intmm_triton
>>> 

It's likely a Linux-specific issue because Triton isn't installed or supported on Mac. Make sure the version of PyTorch is what you expect it to be, and check whether Triton is installed; the specific error you're hitting is probably due to having some old, unsupported version of Triton installed.
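To check which of these packages are actually present in the active environment without `pip list`, a small helper like the following can be used (this is a hypothetical utility, not something from torchao or this thread; it relies on the standard-library `importlib.metadata`):

```python
# Hypothetical helper to report installed versions of the packages
# relevant to this issue; returns None for packages that are absent.
from importlib import metadata

def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

for pkg in ("torch", "torchao", "triton", "pytorch-triton"):
    print(f"{pkg}: {installed_version(pkg)}")
```

On a macOS setup this would typically show `triton: None`, which is consistent with the import only failing on Linux.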

guangy10 commented 1 month ago

> The problem seems to be that you're using torch CPU binaries with torchao GPU binaries; on PyPI we default to CUDA 12.1. [...] The specific error you're hitting is probably due to having some old, unsupported version of Triton installed.

Yeah. OK, so Triton is installed by third-party/llava, and the installed version is 2.1.1, which is old. Let me try updating it to 3.0.0 instead.
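Since the failing import happens deep inside `torchao.quantization`, a defensive import along these lines can turn the bare `NameError` into an actionable message. This is a sketch of a hypothetical guard, not ExecuTorch or torchao code; the function name and message are assumptions:

```python
# Hypothetical guard (not ExecuTorch code): import the Triton-backed
# torchao kernels, and surface a hint about the Triton version instead
# of an opaque NameError if the import blows up.
import importlib

def try_import_triton_kernels(module_name: str = "torchao.kernel.intmm_triton"):
    """Import the Triton kernel module; return it, or None with a hint."""
    try:
        return importlib.import_module(module_name)
    except (ImportError, NameError) as e:
        # NameError covers launcher-codegen failures in old Triton builds.
        print(f"Could not import {module_name}: {e}. "
              "Check that the installed Triton matches your torch build, "
              "e.g. pip install --upgrade 'triton>=3.0.0'.")
        return None
```

Catching `NameError` alongside `ImportError` matters here because the old-Triton failure escapes module import as a `NameError`, not an `ImportError`.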

msaroufim commented 1 month ago

OK, looks like this issue is unblocked. Feel free to reopen if something else comes up.

guangy10 commented 1 month ago

Thank you @msaroufim! The remaining issue in that PR is unrelated. We can close this issue now.