Open mattiadg opened 1 month ago
It looks like `torch._int_mm` is not compiled for CUDA on Windows, at least for sm75. I have no issues using it on Linux with an NVIDIA T4. So is this something for the torch team?
By the way, this is the error on CPU:

```
Traceback (most recent call last):
  File "C:\Users\matti\PycharmProjects\optimum-quanto\examples\speech\speech_recognition\quantize_asr_model.py", line 128, in <module>
    main()
  File "C:\Users\matti\PycharmProjects\optimum-quanto\examples\speech\speech_recognition\quantize_asr_model.py", line 112, in main
    evaluate_model(model, processor, processed_dataset, wer, args.batch_size)
  File "C:\Users\matti\PycharmProjects\optimum-quanto\examples\speech\speech_recognition\quantize_asr_model.py", line 54, in evaluate_model
    result = dataset.map(map_fn, batched=True, batch_size=batch_size)
  File "C:\Users\matti\PycharmProjects\optimum-quanto\.venv\lib\site-packages\datasets\arrow_dataset.py", line 602, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "C:\Users\matti\PycharmProjects\optimum-quanto\.venv\lib\site-packages\datasets\arrow_dataset.py", line 567, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "C:\Users\matti\PycharmProjects\optimum-quanto\.venv\lib\site-packages\datasets\arrow_dataset.py", line 3161, in map
    for rank, done, content in Dataset._map_single(**dataset_kwargs):
  File "C:\Users\matti\PycharmProjects\optimum-quanto\.venv\lib\site-packages\datasets\arrow_dataset.py", line 3552, in _map_single
    batch = apply_function_on_filtered_inputs(
  File "C:\Users\matti\PycharmProjects\optimum-quanto\.venv\lib\site-packages\datasets\arrow_dataset.py", line 3421, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "C:\Users\matti\PycharmProjects\optimum-quanto\examples\speech\speech_recognition\quantize_asr_model.py", line 45, in transcribe_batch
    predicted_ids = model.generate(features.to(model.device))
  File "C:\Users\matti\PycharmProjects\optimum-quanto\.venv\lib\site-packages\transformers\models\whisper\generation_whisper.py", line 587, in generate
    outputs = super().generate(
  File "C:\Users\matti\PycharmProjects\optimum-quanto\.venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\matti\PycharmProjects\optimum-quanto\.venv\lib\site-packages\transformers\generation\utils.py", line 1914, in generate
    result = self._sample(
  File "C:\Users\matti\PycharmProjects\optimum-quanto\.venv\lib\site-packages\transformers\generation\utils.py", line 2666, in _sample
    next_token_scores = logits_processor(input_ids, next_token_logits)
  File "C:\Users\matti\PycharmProjects\optimum-quanto\.venv\lib\site-packages\transformers\generation\logits_process.py", line 98, in __call__
    scores = processor(input_ids, scores)
  File "C:\Users\matti\PycharmProjects\optimum-quanto\.venv\lib\site-packages\transformers\generation\logits_process.py", line 1796, in __call__
    scores_processed = torch.where(suppress_token_mask, -float("inf"), scores)
  File "C:\Users\matti\PycharmProjects\optimum-quanto\optimum\quanto\tensor\qtensor.py", line 93, in __torch_function__
    return func(*args, **kwargs)
  File "C:\Users\matti\PycharmProjects\optimum-quanto\optimum\quanto\tensor\qbytes.py", line 107, in __torch_dispatch__
    return qdispatch(*args, **kwargs)
  File "C:\Users\matti\PycharmProjects\optimum-quanto\optimum\quanto\tensor\qbytes_ops.py", line 314, in where
    raise NotImplementedError
NotImplementedError
```
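For what it's worth, the op the trace dies on works fine with plain float tensors (shapes below are made up); it is only quanto's `QTensor` dispatch that rejects it. This suggests, as an assumption on my side, that dequantizing the logits (or excluding the output head from quantization) before the logits processors run would avoid the failure:

```python
import torch

# Reproduce what SuppressTokensLogitsProcessor does, with ordinary tensors:
# mask out selected token logits by setting them to -inf.
scores = torch.randn(2, 8)                               # (batch, vocab) - made-up shape
suppress_token_mask = torch.zeros(2, 8, dtype=torch.bool)
suppress_token_mask[:, :2] = True                        # pretend tokens 0 and 1 are suppressed
scores_processed = torch.where(suppress_token_mask, -float("inf"), scores)
print(scores_processed[:, :2])                           # masked entries are -inf
```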
With PyTorch 2.4 that error is solved, but some tests now fail with a new error that looks like a problem during install. Do you have suggestions about how to install it on Windows?
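Not an authoritative answer, but for reference: the selector on pytorch.org generates the Windows install command for CUDA-enabled wheels. For CUDA 12.1 wheels (matching the environment below) it is:

```shell
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

Without the `--index-url` flag, pip on Windows installs the CPU-only wheel from PyPI, which would explain CUDA ops being unavailable after an otherwise successful install.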
Hi all, I have encountered this issue while trying to work with models quantized to 8 bits, for instance in #242 and in many tests of this project. The error

RuntimeError: _int_mm_out_cuda not compiled for this platform.

happens when calling `torch._int_mm`. @dacorvo mentioned in https://github.com/pytorch/pytorch/issues/130928 that it may be a problem with the GPU capability.
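A sketch of the capability check that the linked issue points at (my reading, not a confirmed diagnosis): the CUDA path of `_int_mm` needs both a GPU with a suitable compute capability and a wheel whose compiled arch list actually covers that GPU.

```python
import torch

if torch.cuda.is_available():
    # Compute capability of the physical GPU, e.g. (8, 6) for an RTX 3060
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU 0 compute capability: sm_{major}{minor}")
    # Architectures this PyTorch wheel was compiled for; if the GPU's
    # sm_XY is missing here, some kernels will be unavailable.
    print("build arch list:", torch.cuda.get_arch_list())
else:
    print("CUDA not available in this build")
```

Comparing the two printed values on the failing Windows machine would show whether the installed wheel simply lacks kernels for that architecture.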
Versions

PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Home
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.8.10 (tags/v3.8.10:3d8993a, May 3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is CUDA available: True
CUDA runtime version: 12.1.66
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060
Nvidia driver version: 551.83
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2100
DeviceID=CPU0
Family=198
L2CacheSize=12288
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2100
Name=12th Gen Intel(R) Core(TM) i7-12700F
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.4
[pip3] torch==2.3.1+cu121
[pip3] torchaudio==2.3.1+cu121
[pip3] torchvision==0.18.1+cu121
[conda] Could not collect