lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0

FastChat-T5 4K context #1711

Open tutankhamen-1 opened 1 year ago

tutankhamen-1 commented 1 year ago

lmsys.org states that FastChat-T5 supports a context size of 4K. How do I get it to work? I get an error as soon as I go above 2K.

DachengLi1 commented 1 year ago

@tutankhamen-1 It can encode 2K tokens and output 2K tokens, 4K tokens in total. But it cannot take 4K tokens as input alone. In contrast, Llama-like models share a single 2K budget for input + output combined.
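For anyone else hitting the same confusion, here is a minimal sketch of the two budgeting schemes described above. The limits and helper names are illustrative, not FastChat internals:

```python
# Minimal sketch of the token budgets described above (constants and helper
# names are illustrative assumptions, not FastChat code).

ENCODER_LIMIT = 2048   # max tokens FastChat-T5 can *read* (encoder input)
DECODER_LIMIT = 2048   # max tokens it can *generate* (decoder output)
LLAMA_CONTEXT = 2048   # decoder-only models share one window for both

def fits_t5(prompt_tokens: int, max_new_tokens: int) -> bool:
    # Encoder-decoder: input and output have separate budgets,
    # so up to ~4K tokens can flow through a single request in total.
    return prompt_tokens <= ENCODER_LIMIT and max_new_tokens <= DECODER_LIMIT

def fits_llama(prompt_tokens: int, max_new_tokens: int) -> bool:
    # Decoder-only: prompt and completion compete for the same 2K window.
    return prompt_tokens + max_new_tokens <= LLAMA_CONTEXT

print(fits_t5(2000, 512))     # True  -> T5 can serve this request
print(fits_llama(2000, 512))  # False -> a 2K Llama cannot
```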

tutankhamen-1 commented 1 year ago

It can encode 2K tokens and output 2K tokens, 4K tokens in total. But it cannot take 4K tokens as input alone. In contrast, Llama-like models share a single 2K budget for input + output combined.

That’s great, but the 2K total limit seems to be hardcoded in many places and I can’t get it to work. I’m trying to use it through the API.

DachengLi1 commented 1 year ago

The current behavior should be correct. It can only encode 2K tokens, which is what the hardcoded limits you see refer to, but it can then output another 2K tokens. If you use Llama (Vicuna), it can also encode 2K tokens, but once you give it a full 2K tokens it has no room left to output anything.

tutankhamen-1 commented 1 year ago

This is the error message I get:

This model's maximum context length is 2048 tokens. However, you requested 2302 tokens (1790 in the messages, 512 in the completion). Please reduce the length of the messages or completion.

Model: fastchat-t5-3b-v1.0
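That message looks like it comes from a single shared context-length check (prompt tokens plus requested completion tokens compared against one 2048-token limit), which matches decoder-only models but not T5's separate encoder/decoder budgets. A hedged sketch of that kind of check, with illustrative variable names rather than the actual FastChat API server code:

```python
# Sketch of the kind of check that produces the error above
# (names and structure are assumptions, not the real FastChat code path).

context_len = 2048           # single shared limit assumed for every model
prompt_tokens = 1790         # tokens in the messages
max_completion_tokens = 512  # tokens requested for the completion

requested = prompt_tokens + max_completion_tokens  # 2302
if requested > context_len:
    raise ValueError(
        f"This model's maximum context length is {context_len} tokens. "
        f"However, you requested {requested} tokens ({prompt_tokens} in the "
        f"messages, {max_completion_tokens} in the completion)."
    )

# For an encoder-decoder model like FastChat-T5, the check would instead need
# to compare the prompt against the encoder limit and the completion against
# the decoder limit separately, since the two budgets are independent.
```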

DachengLi1 commented 1 year ago

@tutankhamen-1 Thanks for letting us know! We will fix it. @merrymercy Let's change the error message for T5?

Taytay commented 1 year ago

Isn't this limit somewhat arbitrary for T5, given its attention mechanism? My understanding is that memory grows quadratically as the context length goes up, but as long as you have the RAM to support it, a longer context length isn't limited by the model itself.
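A rough back-of-envelope on why RAM tends to be the practical limit. This is only illustrative (fp16 score matrices assumed, head count picked for example purposes, and it ignores kernels that avoid materializing the full score matrix):

```python
# Rough estimate of how self-attention score memory grows with sequence
# length. Numbers are illustrative assumptions, not FastChat-T5 specifics.

def attn_score_bytes(seq_len: int, num_heads: int = 32, bytes_per_val: int = 2) -> int:
    # One (seq_len x seq_len) score matrix per attention head, per layer.
    return seq_len * seq_len * num_heads * bytes_per_val

for n in (2048, 4096, 8192):
    per_layer_gib = attn_score_bytes(n) / 1024**3
    print(f"{n:>5} tokens -> ~{per_layer_gib:.2f} GiB of attention scores per layer")

# Doubling the context roughly quadruples this term, so memory, rather than
# a hard architectural cutoff, is what usually caps the usable context.
```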

merrymercy commented 1 year ago

@tutankhamen-1 Could you help us fix the bug and contribute a pull request?