Problems with interactive mode

I want to use interactive mode, so I ran this command:

python generate.py --compile --interactive --draft_checkpoint_path checkpoints/$DRAFT_MODEL_REPO/model_int8.pth --checkpoint_path checkpoints/$MODEL_REPO/model_int8.pth --speculate_k 3

However, I got the following error. Could you please help look into it?

Loading model ...
Using int8 weight-only quantization!
Using int8 weight-only quantization!
Time to load model: 17.47 seconds
Compilation time: 121.34 seconds
What is your prompt? do you know meta?
Traceback (most recent call last):
  File "/share/edc/home/xuandongzhao/safety/gpt-fast/generate.py", line 404, in <module>
    main(
  File "/share/edc/home/xuandongzhao/safety/gpt-fast/generate.py", line 343, in main
    y, metrics = generate(
  File "/local/home/xuandongzhao/anaconda3/envs/safety/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/share/edc/home/xuandongzhao/safety/gpt-fast/generate.py", line 185, in generate
    callback(i)
  File "/share/edc/home/xuandongzhao/safety/gpt-fast/generate.py", line 326, in callback
    buffer.append(tokenizer.decode([period_id] + x.tolist())[1:])
TypeError: can only concatenate list (not "int") to list
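
For reference, the same TypeError can be reproduced in plain Python, without gpt-fast or PyTorch. One possible cause, which I have not confirmed, is that `x` holds a single token at that point: PyTorch's `Tensor.tolist()` returns a bare number rather than a list for a zero-dimensional tensor, and `[period_id] + <int>` then fails. The sketch below only illustrates that mechanism; `as_token_list` is a hypothetical helper and the token ids are placeholders, neither is taken from `generate.py`.

```python
def as_token_list(tokens):
    """Hypothetical helper: return `tokens` as a list, wrapping a bare int."""
    if isinstance(tokens, int):
        return [tokens]
    return list(tokens)


period_id = 29889  # placeholder token id, chosen only for illustration

# Reproduce the failure: a bare int on the right-hand side of list concatenation.
error_message = None
try:
    [period_id] + 42
except TypeError as exc:
    error_message = str(exc)

# With the wrapper, a single token and a list of tokens both concatenate cleanly.
single = [period_id] + as_token_list(42)
several = [period_id] + as_token_list([42, 43])
```

If this is indeed the cause, wrapping the result of `x.tolist()` in the same way before the concatenation on line 326 might avoid the error, though I have not tested that against the real tokenizer.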
