microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime
MIT License

Usage example for `get_output()` #591

Open natke opened 3 weeks ago

natke commented 3 weeks ago

Discussed in https://github.com/microsoft/onnxruntime-genai/discussions/522

Originally posted by **anutkk** May 26, 2024

According to the [documentation](https://onnxruntime.ai/docs/genai/api/python.html#get-output), `generator.get_output()` should return the generated logits. In practice, this is the error message I get:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[37], line 1
----> 1 generator.get_output()

TypeError: get_output(): incompatible function arguments. The following argument types are supported:
    1. (self: onnxruntime_genai.onnxruntime_genai.Generator, arg0: str) -> numpy.ndarray

Invoked with:
```

The function expects an input string. However, no matter what I put, the output is `array([], dtype=float64)`. What is the correct way to use this method?
natke commented 3 weeks ago

So after digging through the C++ source code, the answer is:

```python
logits = generator.get_output("logits")
```

However, for some reason, at the first step the maximum token is different from the output of `get_next_tokens()`. Not sure if this is a bug or a misunderstanding.

```python
import onnxruntime_genai as og
import numpy as np

prompt = '''<|user|>
Please tell me the time.<|end|>
<|assistant|>'''

model = og.Model("/home/ubuntu/models/Phi-3-mini-4k-instruct-onnx/cuda/cuda-fp16/")
tokenizer = og.Tokenizer(model)
tokens = tokenizer.encode(prompt)

params = og.GeneratorParams(model)
params.input_ids = tokens

generator = og.Generator(model, params)
i = 0
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    new_token = generator.get_next_tokens()[0]
    logits = generator.get_output("logits").squeeze()
    new_token2 = np.argmax(logits)
    print(new_token, " ", new_token2)
    i += 1
    if i > 10:
        break

print()
```

And the result:

```
306 18
29915 29915
29885 29885
9368 9368
304 304
3867 3867
1855 1855
29899 29899
2230 2230
848 848
29892 29892
```

yufenglee commented 3 weeks ago

It is expected. `get_output(output_name)` returns the output tensor named `output_name`. In the prompt case, it returns the logits for the whole prompt, with shape `(batch_size, prompt_length, vocab_size)`.
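
That shape also explains the mismatch above. A sketch (reusing the `generator` from the earlier snippet, and assuming greedy search so `get_next_tokens()` returns the argmax) of indexing the prompt-step logits so the two values line up:

```python
import numpy as np

# At the prompt step, the squeezed logits cover every prompt position:
logits = generator.get_output("logits").squeeze()  # (prompt_length, vocab_size)

# np.argmax with no axis flattens the matrix, so this is not a token id:
wrong = np.argmax(logits)

# The last prompt position holds the distribution the first token is
# sampled from; under greedy search this matches get_next_tokens()[0]:
right = np.argmax(logits[-1])
```

At later steps the logits squeeze to `(vocab_size,)`, so a plain `np.argmax` works, which is why only the first pair in the printout disagreed.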

natke commented 2 weeks ago

Hi @anutkk

We added some more documentation for this API here: https://onnxruntime.ai/docs/genai/api/python.html#get-output

Please let us know if that helps you resolve your issue.

anutkk commented 2 weeks ago

Thanks @natke and @yufenglee. If so, how can I capture the logits of the first generated token? The matrix returned for the "prompt" (i.e. before calling `compute_logits()`, I guess?) is all zeros. I'm not entirely sure https://github.com/microsoft/onnxruntime-genai/pull/611 deals with this issue.

yufenglee commented 1 week ago

> Thanks @natke and @yufenglee. If so, how can I capture the logits of the first generated token? The matrix returned for the "prompt" (i.e. before calling `compute_logits()`, I guess?) is all zeros. I'm not entirely sure #611 deals with this issue.

@anutkk, to get the logits you can do like this:

```python
generator.compute_logits()
logits = generator.get_output('logits')
assert np.allclose(logits[:, :, ::200], expected_sampled_logits_prompt, atol=1e-3)
generator.generate_next_token()
```

as shown in the test file: https://github.com/microsoft/onnxruntime-genai/blob/c622cc11622dfb88b8b43f1e72347119b8728a25/test/python/test_onnxruntime_genai_api.py#L208-L211
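
As a quick sanity check (a sketch reusing the `generator` from above; the exact sizes depend on the model), the output shapes differ between the prompt step and later steps:

```python
generator.compute_logits()
print(generator.get_output("logits").shape)  # prompt step: (batch_size, prompt_length, vocab_size)
generator.generate_next_token()

generator.compute_logits()
print(generator.get_output("logits").shape)  # later steps: (batch_size, 1, vocab_size)
generator.generate_next_token()
```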

natke commented 1 week ago

@anutkk Basically, the first call to `get_output("logits")` will give you the logits of the prompt, and the second call will give you the logits of the first generated token. Let us know if that solves your issue!
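
Putting it together, a minimal sketch of capturing the logits behind the first generated token (the model path is a placeholder, and greedy search is assumed for the final check):

```python
import numpy as np
import onnxruntime_genai as og

model = og.Model("path/to/model")  # placeholder path
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.input_ids = tokenizer.encode("<|user|>\nPlease tell me the time.<|end|>\n<|assistant|>")
generator = og.Generator(model, params)

# First compute_logits(): logits over the whole prompt. The last position
# is the distribution the first generated token is drawn from.
generator.compute_logits()
prompt_logits = generator.get_output("logits")       # (1, prompt_length, vocab_size)
first_token_logits = prompt_logits[0, -1, :].copy()  # copy in case the array aliases an internal buffer
generator.generate_next_token()

first_token = generator.get_next_tokens()[0]
assert first_token == np.argmax(first_token_logits)  # holds under greedy search
```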