neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

[Text Generation] Avoid mutating the original logits in-place when mapping from logits to token. #1414

Closed. dbogunowicz closed this issue 9 months ago

dbogunowicz commented 10 months ago

Feature Description

TokenGenerator.generate(logits: numpy.ndarray) takes a logits array and, when do_sample=True, may modify it in place to enforce the appropriate sampling strategy.
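To illustrate how an in-place update leaks back to the caller, here is a minimal sketch. It is not DeepSparse's TokenGenerator: `sample_token_inplace`, its temperature and top-k steps, and its parameter names are hypothetical, chosen only because NumPy's augmented assignment (`/=`) and boolean-mask assignment both write into the array that was passed in.

```python
import numpy as np


def sample_token_inplace(logits: np.ndarray, temperature: float, top_k: int,
                         rng: np.random.Generator) -> int:
    """Hypothetical sampler that (incorrectly) mutates the caller's array."""
    # In-place division: the scaled values overwrite the caller's `logits`.
    logits /= temperature
    # In-place masking: every logit below the k-th largest becomes -inf.
    kth_largest = np.sort(logits)[-top_k]
    logits[logits < kth_largest] = -np.inf
    # Numerically stable softmax over the surviving logits, then draw a token id.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))


original = np.array([2.0, 1.0, 0.5, -1.0])
snapshot = original.copy()
token = sample_token_inplace(original, temperature=0.5, top_k=2,
                             rng=np.random.default_rng(0))
print(np.array_equal(original, snapshot))  # False: the caller's logits changed
print(original.tolist())  # [4.0, 2.0, -inf, -inf]
```

The sampled token is valid, but the array the caller still holds now contains the rescaled, masked values rather than the model's raw logits.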

This still generates the correct tokens, but the logits array is modified in place and is then returned to the user in its mutated form. This can confuse users who are interested in the logit values, especially when the generated logits are returned together with the prompt logits:

[Image: returned logits before (first column) and after (second column) the change]

First column (before): the generated logits returned to the user are mutated (the distribution is spikier, arguably close to a Dirac delta distribution).

Second column (after): the generated logits are the original, unmutated logits, consistent with the prompt logits.
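One way to get the "after" behavior is to make every sampling step out-of-place. The sketch below assumes a hypothetical `sample_token` helper with illustrative temperature and top-k steps; it is not the actual change merged into DeepSparse, only a demonstration that expressions such as `logits / temperature` and `np.where(...)` allocate new arrays and leave the input untouched.

```python
import numpy as np


def sample_token(logits: np.ndarray, temperature: float, top_k: int,
                 rng: np.random.Generator) -> int:
    """Hypothetical sampler that leaves the caller's logits untouched."""
    # Out-of-place division allocates a new array; `logits` is not modified.
    scaled = logits / temperature
    # np.where also returns a new array instead of writing into `scaled`.
    kth_largest = np.sort(scaled)[-top_k]
    masked = np.where(scaled < kth_largest, -np.inf, scaled)
    # Numerically stable softmax over the surviving logits.
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()  # safe: `probs` is a fresh local array
    return int(rng.choice(len(masked), p=probs))


original = np.array([2.0, 1.0, 0.5, -1.0])
token = sample_token(original, temperature=0.5, top_k=2,
                     rng=np.random.default_rng(0))
print(original.tolist())  # [2.0, 1.0, 0.5, -1.0]: unchanged, like the prompt logits
```

An equivalent design is to keep in-place operations but start the function with `logits = logits.copy()`; that costs one extra allocation per step and makes the no-mutation guarantee explicit at the entry point.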