codinglover0111 closed this 9 months ago
The problem is caused by the beam search algorithm in the fairseq package. The error comes from here:
https://github.com/microsoft/torchscale/blob/main/examples/fairseq/models/retnet.py#L256
When reordering the intermediate beam search results, the model selects prev_scale incorrectly.
In our experiments, we don't use beam search. Instead, nucleus sampling is sufficient for LLMs.
So how can I fix this?
If you want to use fairseq_cli.interactive, you can modify the reorder_incremental_state_scripting function linked above: when reordering incremental_state, skip incremental_state["scale"].
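A minimal sketch of that idea, not the actual torchscale code. It assumes incremental_state is a (possibly nested) dict whose cached values expose index_select(dim, index), as PyTorch tensors do. The function name reorder_incremental_state and the SKIP_KEYS set are hypothetical names used only for illustration.

```python
from typing import Any, Dict

# Assumption: "scale" is the only cached entry that must not be reordered,
# as suggested in the reply above.
SKIP_KEYS = {"scale"}


def reorder_incremental_state(incremental_state: Dict[str, Any], new_order: Any) -> Dict[str, Any]:
    """Return a copy of the state reordered along dim 0, leaving SKIP_KEYS untouched."""
    reordered: Dict[str, Any] = {}
    for key, value in incremental_state.items():
        if key in SKIP_KEYS:
            # Keep the entry as-is instead of selecting rows by beam index.
            reordered[key] = value
        elif isinstance(value, dict):
            # Per-module sub-dicts are handled recursively.
            reordered[key] = reorder_incremental_state(value, new_order)
        elif hasattr(value, "index_select"):
            # Tensor-like value: pick the rows of the surviving beams.
            reordered[key] = value.index_select(0, new_order)
        else:
            # Anything else (None, ints, ...) is passed through unchanged.
            reordered[key] = value
    return reordered
```

With PyTorch, new_order would be the LongTensor of beam indices that fairseq passes to the reorder hook. The sketch returns a new dict rather than mutating in place, so adapting it to reorder_incremental_state_scripting would mean writing the results back into incremental_state.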