mit-han-lab / smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
https://arxiv.org/abs/2211.10438
MIT License

how to use model.generate with smoothquant models #82

Hao-YunDeng opened this issue 6 months ago (status: Open)

Hao-YunDeng commented 6 months ago

I did

import torch
from transformers import GPT2Tokenizer
from smoothquant.opt import Int8OPTForCausalLM

tokenizer = GPT2Tokenizer.from_pretrained('facebook/opt-6.7b')
model_smoothquant = Int8OPTForCausalLM.from_pretrained('mit-han-lab/opt-6.7b-smoothquant', torch_dtype=torch.float16, device_map='auto').to('cuda')

text = "The quick brown fox"
input_ids = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512).input_ids.to('cuda')

generated_ids = model_smoothquant.generate(input_ids, max_length=32) 

but got

ValueError: The provided attention mask has length 21, but its length should be 32 (sum of the lengths of current and past inputs)

Does anyone know how to correctly call generate() on SmoothQuant models?
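
In the meantime, here is a minimal sketch of a possible workaround, assuming the error comes from generate()'s past-key-value / attention-mask bookkeeping: decode greedily token by token, recomputing the full sequence on each step so no cache is involved. The use_cache keyword is assumed to be accepted by Int8OPTForCausalLM's forward as in the Hugging Face OPT implementation; drop it if it is not. This is slower than cached generation and is not a confirmed fix for this repo.

import torch
from transformers import GPT2Tokenizer
from smoothquant.opt import Int8OPTForCausalLM

tokenizer = GPT2Tokenizer.from_pretrained('facebook/opt-6.7b')
model = Int8OPTForCausalLM.from_pretrained(
    'mit-han-lab/opt-6.7b-smoothquant', torch_dtype=torch.float16
).to('cuda')
model.eval()

input_ids = tokenizer("The quick brown fox", return_tensors="pt").input_ids.to('cuda')

max_new_tokens = 32
with torch.no_grad():
    for _ in range(max_new_tokens):
        # Full forward pass over the whole sequence each step (no KV cache);
        # use_cache=False is an assumption based on the HF OPT forward signature.
        logits = model(input_ids=input_ids, use_cache=False).logits
        # Greedy decoding: pick the most likely next token at the last position.
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=-1)
        if next_token.item() == tokenizer.eos_token_id:
            break

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))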