microsoft / microxcaling

PyTorch emulation library for Microscaling (MX)-compatible data formats
MIT License

Inference Error with OPT models #18

Closed ruisizhang123 closed 4 months ago

ruisizhang123 commented 4 months ago

Hi there 👋,

Thank you so much for the great work. When I try to integrate the OPT model family, the following error occurs. (I also tried LLaMA-2-7b and hit the same issue.)

Traceback (most recent call last):
  File "/home/ruisi/test.py", line 28, in <module>
    outputs = model.generate(**inputs, max_new_tokens=128)
  File "/home/ruisi/miniconda3/envs/awq/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ruisi/miniconda3/envs/awq/lib/python3.10/site-packages/transformers/generation/utils.py", line 1538, in generate
    and torch.sum(inputs_tensor[:, -1] == generation_config.pad_token_id) > 0
  File "/home/ruisi/mx/mx_mapping.py", line 34, in wrapper
    res = func(*args, mx_specs=mx_specs, **kwargs)
TypeError: simd_reduce_sum() missing 1 required positional argument: 'dim'

To reproduce the error, please use the following code:

import torch
import torch.nn.functional as F
import numpy as np
import argparse

from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig

from mx import finalize_mx_specs
from mx import mx_mapping

use_mx = True

if use_mx:
    # MX quantization settings: FP6 (e3m2) weights and activations, block size 32,
    # bfloat16 element-wise ops, custom CUDA kernels enabled
    mx_specs = {'w_elem_format': 'fp6_e3m2', 'a_elem_format': 'fp6_e3m2',
            'block_size': 32, 'bfloat': 16, 'custom_cuda': True,
            'quantize_backprop': False,}
    mx_specs = finalize_mx_specs(mx_specs)
    # Replace the standard PyTorch ops with their MX-emulated versions
    mx_mapping.inject_pyt_ops(mx_specs)

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m").to("cuda")

print(model)

example_input = "name three types of clouds"
inputs = tokenizer(example_input, padding=True, return_tensors="pt", truncation=True, max_length=100).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
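
If it helps, here is a smaller sketch of what I believe is happening: generate() internally calls torch.sum on a bool mask with no dim argument, and after inject_pyt_ops that call goes to the MX wrapper instead of the stock torch.sum. (The mx_specs below just mirror the ones above; the bool mask mimics the inputs_tensor[:, -1] == pad_token_id expression from the traceback.)

# Minimal sketch of the failing call, assuming generate() reduces a bool mask
# with torch.sum(...) and no dim, as shown in the traceback.
import torch
from mx import finalize_mx_specs, mx_mapping

mx_specs = finalize_mx_specs({'w_elem_format': 'fp6_e3m2', 'a_elem_format': 'fp6_e3m2',
                              'block_size': 32, 'bfloat': 16, 'custom_cuda': True,
                              'quantize_backprop': False})
mx_mapping.inject_pyt_ops(mx_specs)

pad_token_id = 1
inputs_tensor = torch.tensor([[2, 5, 1]]).to("cuda")
mask = inputs_tensor[:, -1] == pad_token_id   # bool tensor, like the one generate() builds
print(torch.sum(mask) > 0)                    # TypeError: simd_reduce_sum() missing ... 'dim'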

I think the error comes from this function: https://github.com/microsoft/microxcaling/blob/947417195c5dd44fe7787df92fd29549c54175e1/mx/simd_ops.py#L508. I tried giving dim a default value, e.g. dim=0 or dim=1, but then I get the following error:

Traceback (most recent call last):
  File "/home/ruisi/test.py", line 28, in <module>
    outputs = model.generate(**inputs, max_new_tokens=128)
  File "/home/ruisi/miniconda3/envs/awq/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ruisi/miniconda3/envs/awq/lib/python3.10/site-packages/transformers/generation/utils.py", line 1538, in generate
    and torch.sum(inputs_tensor[:, -1] == generation_config.pad_token_id) > 0
  File "/home/ruisi/mx/mx_mapping.py", line 34, in wrapper
    res = func(*args, mx_specs=mx_specs, **kwargs)
  File "/home/ruisi/mx/simd_ops.py", line 514, in simd_reduce_sum
    return SIMDReduceSum.apply(in1, dim, keepdim, mx_specs)
  File "/home/ruisi/miniconda3/envs/awq/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/ruisi/mx/simd_ops.py", line 395, in forward
    in1 = vec_quantize(in1, mx_specs=mx_specs)
  File "/home/ruisi/mx/vector_ops.py", line 39, in vec_quantize
    return quantize_elemwise_op(input, mx_specs=mx_specs,
  File "/home/ruisi/mx/elemwise_ops.py", line 253, in quantize_elemwise_op
    A = _quantize_bfloat(A, bfloat=mx_specs['bfloat'], round=round,
  File "/home/ruisi/mx/elemwise_ops.py", line 206, in _quantize_bfloat
    return _quantize_elemwise_core(
  File "/home/ruisi/mx/elemwise_ops.py", line 120, in _quantize_elemwise_core
    A = custom_extensions.funcs.quantize_elemwise_func_cuda(
RuntimeError: expected scalar type Float but found Bool
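
For this second error, my guess is that the dtype rather than the dim is the real problem: the tensor generate() sums is a bool mask, and the custom CUDA quantize kernel apparently only accepts float inputs. A tiny illustration (plain PyTorch, no MX involved):

import torch

pad_token_id = 1
inputs_tensor = torch.tensor([[2, 5, 1]])
mask = inputs_tensor[:, -1] == pad_token_id
print(mask.dtype)          # torch.bool -> "expected scalar type Float but found Bool"
print(mask.float().dtype)  # torch.float32 -> what the quantizer presumably expects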

I'm not sure whether this is a package version issue; I'm using transformers 4.35.0, torch 2.0.1, and CUDA 11.4.

Thank you in advance! 🙏