I am following the instructions here to run simple inference with a Llama model.
The printed response from client.inference.chat_completion contains a lot of other information besides the output content, as follows:
ChatCompletionResponse(completion_message=CompletionMessage(content='Here is a 2 sentence poem about the moon:\n\nThe moon glows bright in the midnight sky,\nA silver beacon, passing us by.', role='assistant', stop_reason='end_of_turn', tool_calls=[]), logprobs=None)
How can I get just the generated content in the desired format? Basically, how do I extract the content and filter out everything else in the response output?
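Based on the printed repr, the text seems to live at completion_message.content on the response object. Here is a minimal sketch of what I think should work; the base URL, model ID, and prompt are placeholders, and the exact parameter names may vary by client version:

```python
from llama_stack_client import LlamaStackClient

# Placeholder server URL; adjust to where your Llama Stack server is running.
client = LlamaStackClient(base_url="http://localhost:5000")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",  # hypothetical model ID
    messages=[
        {"role": "user", "content": "Write a 2 sentence poem about the moon."}
    ],
)

# The repr shows a ChatCompletionResponse wrapping a CompletionMessage,
# so the generated text should be reachable via this attribute chain.
print(response.completion_message.content)
```

Is accessing the attribute directly like this the intended way, or is there a built-in option to return only the content?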