jncraton / languagemodels

Explore large language models in 512MB of RAM
https://jncraton.github.io/languagemodels/
MIT License

Introduce guardrails around long prompts #32

Open jncraton opened 8 months ago

jncraton commented 8 months ago

It is currently easy to cause an out-of-memory condition by prompting a model with a very long prompt. This is an expected result of how certain tokenizers and transformer attention are implemented. Experienced users may intentionally want to use long prompts, but less experienced users may hit this by accident and run into confusing OOM errors (#31) or extremely slow runtime performance.

It may be helpful to explore a mechanism that limits prompt length by default to help users avoid these friction points.
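
One possible shape for such a guardrail is a length check that runs before inference and raises a clear error instead of letting the process exhaust memory. The sketch below is illustrative only: the `MAX_PROMPT_TOKENS` default, the `guard_prompt` helper, and the character-based token estimate are assumptions for discussion, not part of the library's API (a real check would use the model's own tokenizer and context window).

```python
class PromptTooLongError(ValueError):
    """Raised when a prompt exceeds the configured token budget."""


# Illustrative default; the actual limit would depend on the model's context window.
MAX_PROMPT_TOKENS = 512


def estimate_tokens(prompt: str) -> int:
    """Rough token estimate (~4 characters per token).

    A real implementation would count tokens with the model's tokenizer.
    """
    return max(1, len(prompt) // 4)


def guard_prompt(prompt: str, max_tokens: int = MAX_PROMPT_TOKENS) -> str:
    """Reject prompts likely to cause OOM or very slow generation.

    Experienced users could pass a larger max_tokens to opt out of the default limit.
    """
    tokens = estimate_tokens(prompt)
    if tokens > max_tokens:
        raise PromptTooLongError(
            f"Prompt is roughly {tokens} tokens, above the default limit of "
            f"{max_tokens}. Pass a larger max_tokens value to override."
        )
    return prompt


if __name__ == "__main__":
    guard_prompt("Translate this sentence to French.")  # passes
    try:
        guard_prompt("word " * 10_000)  # raises PromptTooLongError
    except PromptTooLongError as err:
        print(err)
```

Failing fast with an explicit, overridable error keeps the default experience safe for new users while still letting advanced users opt into long prompts deliberately.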