Suite of `outlines.processors` for Sampling Techniques and Debug Logging

lapp0 commented 1 month ago

Presentation of the new feature

Logits processors in `outlines.processors` support nearly every inference engine, offering a "write once, run anywhere" implementation of business logic.

Currently, vLLM, TGI, LoRAX, and xinference depend on Outlines for structured generation. Inference libraries could also use Outlines for various sampling techniques. To solidify Outlines' role in the LLM ecosystem, we should focus on developing robust logits processors that can be shared across multiple libraries.

A good case study is the new MLXLM library, which doesn't support structured generation and only supports a handful of sampling techniques.

We can help downstream libraries by sparing them the complexity and potential bugs of implementing structured generation themselves.

A good starting point would be implementing:

Logits processors for sampling augmentation:

FrequencyPenaltyLogitsProcessor,
MinPLogitsProcessor,
NoRepeatNGramLogitsProcessor,
PresencePenaltyLogitsProcessor,
QuadraticSmoothingLogitsProcessor,
RepetitionPenaltyLogitsProcessor,
TemperatureLogitsProcessor,
TFSLogitsProcessor,
TopKLogitsProcessor,
TopPLogitsProcessor
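
To make the proposal concrete, here is a minimal sketch of what two of these processors could look like. It assumes a hypothetical interface in which a processor is a callable taking `(input_ids, logits)` and returning new logits; plain Python lists stand in for tensors, and none of this is the actual `outlines.processors` API.

```python
import math
from typing import List, Sequence


class TemperatureLogitsProcessor:
    """Divide every logit by a fixed temperature (hypothetical interface)."""

    def __init__(self, temperature: float):
        if temperature <= 0:
            raise ValueError("temperature must be positive")
        self.temperature = temperature

    def __call__(self, input_ids: Sequence[int], logits: Sequence[float]) -> List[float]:
        # input_ids is unused here; it is kept so all processors share one signature.
        return [x / self.temperature for x in logits]


class TopKLogitsProcessor:
    """Keep the k largest logits and mask everything else to -inf."""

    def __init__(self, k: int):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k

    def __call__(self, input_ids: Sequence[int], logits: Sequence[float]) -> List[float]:
        if self.k >= len(logits):
            return list(logits)
        # The k-th largest value is the cutoff; ties at the cutoff are all kept.
        threshold = sorted(logits, reverse=True)[self.k - 1]
        return [x if x >= threshold else -math.inf for x in logits]
```

Neither class keeps any state between calls, so the same instance could in principle be reused across requests and engines.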

Logits processors for logging:

LogitsLoggingLogitsProcessor,
SequenceLoggingLogitsProcessor
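
A logging processor could be a pass-through that records what it sees. The sketch below is one possible shape, under the same assumed `(input_ids, logits)` callable interface; the logger name, the `top_n` parameter, and the `history` attribute are illustrative choices, not part of Outlines.

```python
import logging
from typing import List, Sequence, Tuple

logger = logging.getLogger("logits_debug")  # hypothetical logger name


class LogitsLoggingLogitsProcessor:
    """Record the top-n (token_id, logit) pairs at each step; logits pass through unchanged."""

    def __init__(self, top_n: int = 3):
        self.top_n = top_n
        self.history: List[List[Tuple[int, float]]] = []

    def __call__(self, input_ids: Sequence[int], logits: Sequence[float]) -> List[float]:
        # Rank token ids by logit, highest first, and keep the top n.
        ranked = sorted(enumerate(logits), key=lambda pair: pair[1], reverse=True)
        top = ranked[: self.top_n]
        self.history.append(top)
        logger.debug("step=%d top=%s", len(self.history) - 1, top)
        # Return a copy so downstream processors cannot mutate the caller's data.
        return list(logits)
```

Because it never modifies the logits, such a processor could be placed before or after any sampling processor to inspect its effect.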

Where does it fit in Outlines?

We can expand on our existing strong foundation of "write once, run anywhere" `outlines.processors`.

Are you willing to open a PR?

Yes, and I have already made some progress: https://github.com/lapp0/outlines/pull/35

rlouf commented 1 month ago

The idea is interesting. Do we really want to increase the surface area of the library? There's a risk of spreading ourselves too thin by adding extra code to maintain.

lapp0 commented 1 month ago

Yes, that's a risk, but IMO it wouldn't substantially increase the maintenance burden of the library:

Stateless sampling-augmentation processors are considerably simpler than stateful structured-generation processors. They are independent of the rest of the library and of one another.
If multiple libraries depend on these processors, there should in theory be more eyes on them and more new contributors maintaining and expanding them.
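
As a rough illustration of the independence claim, the sketch below writes two stateless processors as plain functions of `(input_ids, logits)` and composes them with a small `chain` helper. All names here are hypothetical, and the frequency penalty uses the common "subtract penalty times occurrence count" formulation as an assumption.

```python
from collections import Counter
from typing import Callable, List, Sequence

LogitsProcessor = Callable[[Sequence[int], Sequence[float]], List[float]]


def frequency_penalty(penalty: float) -> LogitsProcessor:
    """Subtract penalty * (occurrences of the token in input_ids) from each logit."""
    def process(input_ids: Sequence[int], logits: Sequence[float]) -> List[float]:
        counts = Counter(input_ids)  # recomputed per call, so nothing is remembered
        return [x - penalty * counts[i] for i, x in enumerate(logits)]
    return process


def temperature(t: float) -> LogitsProcessor:
    """Divide each logit by a fixed temperature t."""
    def process(input_ids: Sequence[int], logits: Sequence[float]) -> List[float]:
        return [x / t for x in logits]
    return process


def chain(*processors: LogitsProcessor) -> LogitsProcessor:
    """Apply processors left to right; each sees only input_ids and the current logits."""
    def process(input_ids: Sequence[int], logits: Sequence[float]) -> List[float]:
        out = list(logits)
        for proc in processors:
            out = proc(input_ids, out)
        return out
    return process


pipeline = chain(frequency_penalty(0.5), temperature(2.0))
```

Since each function depends only on its arguments, any one of them can be tested, replaced, or removed without touching the others.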

In support of your argument, though: there would be more PRs to review even if downstream maintainers contribute, and new processors may have edge cases that require priority fixes.
