[Text Generation] Avoid mutating the original logits in-place when mapping from logits to token.

Feature Description

TokenGenerator.generate(logits: numpy.ndarray) takes a logits array and, if do_sample=True, may mutate it to enforce the selected sampling strategy.

The tokens are generated correctly, but the logits array is modified in place and then returned to the user in its mutated form. This can be confusing for users who are interested in the logits values, especially when they are returned together with the prompt logits:

First column (before): the generated logits returned to the user are mutated (the distribution is spikier, arguably close to a Dirac delta distribution).

Second column (after): the generated logits are the original, non-mutated logits, consistent with the prompt logits.
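
To make the difference concrete, here is a minimal sketch of the two behaviors using plain NumPy. It is not the deepsparse implementation: the function names generate_mutating and generate_non_mutating, the softmax helper, and the choice of temperature scaling as the sampling strategy are all assumptions made for illustration.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    exp = np.exp(x - np.max(x))
    return exp / exp.sum()


def generate_mutating(logits: np.ndarray, temperature: float,
                      rng: np.random.Generator) -> int:
    """Hypothetical sampler that scales the caller's array in place."""
    logits /= temperature  # in-place: the caller's array changes too
    return int(rng.choice(len(logits), p=softmax(logits)))


def generate_non_mutating(logits: np.ndarray, temperature: float,
                          rng: np.random.Generator) -> int:
    """Same sampling logic, but applied to a private copy of the logits."""
    scaled = logits.copy()  # the caller's array is left untouched
    scaled /= temperature
    return int(rng.choice(len(scaled), p=softmax(scaled)))


original = np.array([2.0, 1.0, 0.5, -1.0])

# "Before": the array handed to the sampler comes back modified.
logits_a = original.copy()
token_a = generate_mutating(logits_a, 0.1, np.random.default_rng(0))

# "After": the array handed to the sampler is unchanged.
logits_b = original.copy()
token_b = generate_non_mutating(logits_b, 0.1, np.random.default_rng(0))

mutated = not np.array_equal(logits_a, original)
preserved = np.array_equal(logits_b, original)
```

With the same seed, both variants sample from the same distribution and pick the same token; only the non-mutating one leaves the caller's logits equal to the original values. The cost is one extra copy of the logits array per generated token.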

neuralmagic / deepsparse

[Text Generation] Avoid mutating the original logits in-place when mapping from logits to token. #1414
