Inference Explainability/Suppression (AtMan)

Feature request

Add inference flags that enrich the output with explainability information or suppress specific input tokens/embedding spans, as described here.
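To make the request a bit more concrete, here is a rough, standalone sketch of the kind of suppression I have in mind. It is only my reading of the idea, not the paper's reference implementation and not anything that exists in TGI: the function names, the `factor` parameter, and the toy scores are all made up for illustration. It scales the pre-softmax attention scores of one input token by `(1 - factor)` and re-normalizes.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_suppression(scores, token_index, factor):
    """Hypothetical helper: suppress one input token in an attention map.

    scores      -- pre-softmax attention scores, one row per query position,
                   one column per key (input token) position
    token_index -- column of the input token to suppress
    factor      -- suppression strength; 0.0 leaves the scores unchanged

    Note: scaling a score by (1 - factor) only lowers the attention weight
    when that score is positive, so this is a simplification.
    """
    result = []
    for row in scores:
        scaled = list(row)
        scaled[token_index] = row[token_index] * (1.0 - factor)
        result.append(softmax(scaled))
    return result


# Toy example: 2 query positions attending over 3 input tokens.
scores = [
    [2.0, 1.0, 0.5],
    [0.3, 2.5, 1.0],
]
baseline = attention_with_suppression(scores, token_index=1, factor=0.0)
suppressed = attention_with_suppression(scores, token_index=1, factor=0.9)
```

As far as I understand it, the explainability side would then compare the model's output (for example the loss on a target completion) with and without the suppression to estimate how much each input token mattered. How that would be exposed as a flag is exactly the part I'd need guidance on.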

Motivation

I'm not sure whether this is out of scope for TGI, but I figured I'd at least bring it up.

Explainability and truthfulness are very important topics for us and our users, and after watching GTC and reading this paper (github), I believe there's potential for significant work here.

The explainability and suppression methods proposed in that paper would be extremely useful for us, though I can understand if that is too niche.

Your contribution

I'd be willing to put in groundwork if that helps, though I also have to admit that I'm out of my depth at that level of manipulating attention, so I would definitely need some guidance if I were to contribute.
