hyperonym / basaran

Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.

Question about COMPLETION_MAX_PROMPT #158

Closed: nicpopovic closed this issue 1 year ago

nicpopovic commented 1 year ago

Hi, I noticed that COMPLETION_MAX_PROMPT limits the prompt by its length in characters rather than in tokens, and was wondering whether this is intended. If it is intentional, it may be worth clarifying somewhere (the default value is currently identical to that of COMPLETION_MAX_TOKENS, which makes it easy to misread as a token count) and/or adding a warning when a prompt is truncated.
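For illustration, here is a quick sketch of how far apart a character count and a token count can be (the gpt2 tokenizer is used purely as an example; the character-to-token ratio depends on the model and the text):

```python
from transformers import AutoTokenizer

# gpt2 is used purely as an example; any Hugging Face tokenizer works here.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

prompt = "Some long prompt text. " * 500   # thousands of characters
capped = prompt[:4096]                     # what a 4096-character cap keeps

print(len(capped))                    # 4096 characters...
print(len(tokenizer.encode(capped)))  # ...but far fewer than 4096 tokens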

peakji commented 1 year ago

COMPLETION_MAX_PROMPT is just a safety measure for worst-case scenarios. COMPLETION_MAX_TOKENS only bounds the number of newly generated tokens, so another mechanism is needed to keep the input side from being maliciously exploited.

We recommend setting COMPLETION_MAX_PROMPT to a large value, even up to the maximum length the model can effectively handle. That is also why we do not think a warning is particularly necessary.
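For reference, the cap amounts to something like the following (an illustrative sketch, not Basaran's actual code); raising COMPLETION_MAX_PROMPT simply raises the character bound:

```python
import os

# Illustrative sketch of a character-level prompt cap, not Basaran's actual
# implementation. COMPLETION_MAX_TOKENS bounds the newly *generated* tokens;
# this guard bounds the input by character count before tokenization.
MAX_PROMPT_CHARS = int(os.environ.get("COMPLETION_MAX_PROMPT", "4096"))

def clamp_prompt(prompt: str) -> str:
    # Anything past the limit is silently dropped.
    return prompt[:MAX_PROMPT_CHARS]
```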

nicpopovic commented 1 year ago

Makes sense, though I would suggest reconsidering the warning: with the default value of 4096, an example prompt I was using was silently truncated to roughly 700 tokens, which you cannot notice unless you echo the prompt tokens. The only visible symptom is that the generated output is suddenly worse than expected.

One quick way to add some transparency would be a character count on the playground input field, although that would not solve the issue for the API.
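For API callers, a client-side check along these lines can at least surface silent truncation in the meantime (the limit has to be mirrored by hand from the deployment's COMPLETION_MAX_PROMPT setting, since the server does not report it):

```python
import warnings

# The server does not report its configured limit, so this value has to be
# mirrored by hand from the deployment's COMPLETION_MAX_PROMPT setting.
SERVER_MAX_PROMPT_CHARS = 4096

def warn_if_truncated(prompt: str) -> None:
    if len(prompt) > SERVER_MAX_PROMPT_CHARS:
        warnings.warn(
            f"Prompt is {len(prompt)} characters; the server will silently "
            f"keep only the first {SERVER_MAX_PROMPT_CHARS}."
        )
```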

peakji commented 1 year ago

Perhaps a workaround is to increase the default to a larger value (e.g. 16384?), which would avoid surprising users.

Warnings or errors might break compatibility with the OpenAI API, as their documentation does not spell out these corner cases. We have to try them out manually against the real API, which can be quite literally expensive.

Choosing a larger default value also aligns with the overall design of the other environment variables, such as allowing CORS by default and listening on 0.0.0.0: users who find this too permissive can set stricter options, while most people can use the tool out of the box.

peakji commented 1 year ago

Also, we will thoroughly revamp the playground interface soon. The current implementation is based on EventSource, which also limits the prompt length because the prompt has to be URL-encoded into a GET request. We will switch to POST requests, which is particularly important for adding the chat interface.
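Since the API itself follows the OpenAI text completion format, streaming over a POST request is already possible outside the playground. A rough sketch (the base URL, port, and request fields here are assumptions; adjust them to your deployment):

```python
import json
import requests

# Hypothetical base URL for a local Basaran instance; adjust to your setup.
# Field names follow the OpenAI text completion API.
API_BASE = "http://127.0.0.1:8000/v1"

with requests.post(
    f"{API_BASE}/completions",
    json={"prompt": "Once upon a time", "max_tokens": 64, "stream": True},
    stream=True,
) as resp:
    resp.raise_for_status()
    # The streamed body consists of server-sent-event lines: "data: {...}"
    # chunks followed by a final "data: [DONE]" marker.
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        print(chunk["choices"][0]["text"], end="", flush=True)
```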

nicpopovic commented 1 year ago

Sounds good, thanks for all your work on this project, I'm really liking it :)