Hi there 👋,

Thank you so much for the great work! When I try to integrate the OPT model family, the following error occurs. (I also tried LLaMA-2-7b and hit the same issue.)
Traceback (most recent call last):
  File "/home/ruisi/test.py", line 28, in <module>
    outputs = model.generate(**inputs, max_new_tokens=128)
  File "/home/ruisi/miniconda3/envs/awq/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ruisi/miniconda3/envs/awq/lib/python3.10/site-packages/transformers/generation/utils.py", line 1538, in generate
    and torch.sum(inputs_tensor[:, -1] == generation_config.pad_token_id) > 0
  File "/home/ruisi/mx/mx_mapping.py", line 34, in wrapper
    res = func(*args, mx_specs=mx_specs, **kwargs)
TypeError: simd_reduce_sum() missing 1 required positional argument: 'dim'
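For what it's worth, this first failure doesn't seem to need a model at all: generate() calls torch.sum(...) without a dim argument, and once the MX ops are injected, any dim-less call appears to trip the same TypeError. A minimal sketch of what I mean (I haven't carefully trimmed the spec dict; it just needs to be something finalize_mx_specs accepts):

import torch
from mx import finalize_mx_specs, mx_mapping

mx_specs = finalize_mx_specs({'bfloat': 16, 'block_size': 32})
mx_mapping.inject_pyt_ops(mx_specs)

x = torch.randn(4, 4)
print(torch.sum(x, dim=1))  # fine: 'dim' is passed through the wrapper
print(torch.sum(x))         # TypeError: simd_reduce_sum() missing 'dim'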
To reproduce the error, please use the following code:
import torch
import torch.nn.functional as F
import numpy as np
import argparse
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig

from mx import finalize_mx_specs
from mx import mx_mapping

use_mx = True
if use_mx:
    # FP6 (e3m2) weights/activations in blocks of 32, bfloat16
    # element-wise ops, custom CUDA kernels enabled
    mx_specs = {
        'w_elem_format': 'fp6_e3m2',
        'a_elem_format': 'fp6_e3m2',
        'block_size': 32,
        'bfloat': 16,
        'custom_cuda': True,
        'quantize_backprop': False,
    }
    mx_specs = finalize_mx_specs(mx_specs)
    # Replace the relevant PyTorch ops with their MX equivalents
    mx_mapping.inject_pyt_ops(mx_specs)

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m").to("cuda")
print(model)

example_input = "name three types of clouds"
inputs = tokenizer(example_input, padding=True, return_tensors="pt",
                   truncation=True, max_length=100).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

I thought the error comes from this function: https://github.com/microsoft/microxcaling/blob/947417195c5dd44fe7787df92fd29549c54175e1/mx/simd_ops.py#L508. I tried giving dim a default value, like setting dim=1 or dim=0.
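Concretely, my local edit looked roughly like this (the signature is inferred from the traceback, so treat it as a sketch rather than the exact upstream code):

# mx/simd_ops.py, around line 508 -- my local experiment, not upstream code
def simd_reduce_sum(in1, dim=0, keepdim=False, mx_specs=None):
    # unchanged forwarding; only the default value for 'dim' is new
    return SIMDReduceSum.apply(in1, dim, keepdim, mx_specs)

With that change the TypeError goes away, but generation then fails with: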
Traceback (most recent call last):
  File "/home/ruisi/test.py", line 28, in <module>
    outputs = model.generate(**inputs, max_new_tokens=128)
  File "/home/ruisi/miniconda3/envs/awq/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ruisi/miniconda3/envs/awq/lib/python3.10/site-packages/transformers/generation/utils.py", line 1538, in generate
    and torch.sum(inputs_tensor[:, -1] == generation_config.pad_token_id) > 0
  File "/home/ruisi/mx/mx_mapping.py", line 34, in wrapper
    res = func(*args, mx_specs=mx_specs, **kwargs)
  File "/home/ruisi/mx/simd_ops.py", line 514, in simd_reduce_sum
    return SIMDReduceSum.apply(in1, dim, keepdim, mx_specs)
  File "/home/ruisi/miniconda3/envs/awq/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/ruisi/mx/simd_ops.py", line 395, in forward
    in1 = vec_quantize(in1, mx_specs=mx_specs)
  File "/home/ruisi/mx/vector_ops.py", line 39, in vec_quantize
    return quantize_elemwise_op(input, mx_specs=mx_specs,
  File "/home/ruisi/mx/elemwise_ops.py", line 253, in quantize_elemwise_op
    A = _quantize_bfloat(A, bfloat=mx_specs['bfloat'], round=round,
  File "/home/ruisi/mx/elemwise_ops.py", line 206, in _quantize_bfloat
    return _quantize_elemwise_core(
  File "/home/ruisi/mx/elemwise_ops.py", line 120, in _quantize_elemwise_core
    A = custom_extensions.funcs.quantize_elemwise_func_cuda(
RuntimeError: expected scalar type Float but found Bool
Not sure if it's a package version issue; I'm using transformers 4.35.0, torch 2.0.1, and CUDA 11.4.
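In case it helps others, the only workaround that unblocks generation for me is to route non-float or dim-less reductions back to the stock kernel after injection. A monkey-patch sketch (safe_sum, orig_sum, and mx_sum are my own names; I'm assuming the MX path is only meant for floating-point reductions, and this only covers the torch.sum function form seen in the traceback):

import torch
from mx import finalize_mx_specs, mx_mapping

orig_sum = torch.sum  # keep a handle on the stock op before injection

mx_specs = finalize_mx_specs({'w_elem_format': 'fp6_e3m2',
                              'a_elem_format': 'fp6_e3m2',
                              'block_size': 32, 'bfloat': 16,
                              'custom_cuda': True,
                              'quantize_backprop': False})
mx_mapping.inject_pyt_ops(mx_specs)
mx_sum = torch.sum    # the injected MX wrapper around simd_reduce_sum

def safe_sum(input, *args, **kwargs):
    # Bool/int reductions (e.g. the pad-token check inside generate())
    # and calls without an explicit dim fall back to the stock op;
    # everything else still goes through the MX op.
    if torch.is_floating_point(input) and (args or 'dim' in kwargs):
        return mx_sum(input, *args, **kwargs)
    return orig_sum(input, *args, **kwargs)

torch.sum = safe_sum

This sidesteps both errors above, at the cost of skipping MX quantization for those particular reductions, so the underlying issue still looks like the missing dim default plus the unguarded dtype in the quantize kernel.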
Thank you in advance! 🙏