[WASI-NN] Add single token inference

Hello, I am a code review bot on flows.network. Here are my reviews of code commits in this PR.

Overall summary:

This pull request, titled "[WASI-NN] Add single token inference", adds support for single token inference in main.rs and updates README.md in the wasmedge-ggml-llama-interactive directory. The patch adds code that runs inference one token at a time and retrieves the output of each token, along with a new flag, try_single_token_inference, that controls this behavior.

The README.md update adds a new section on Token Usage. It explains how to use get_output() to retrieve the token usage of the input and output text, and describes the JSON format of the token usage data. Users are also advised to keep the context size and the number of tokens used in mind to avoid exceeding the limit.

There don't appear to be any potential problems or errors with this patch. It seems to be a straightforward implementation and documentation update.

Details

Commit 837af3a6155c8be1c06be6d7521c3035da201df0

Key changes:

Added support for single token inference in the main.rs file.
Added code to perform inference one token at a time and retrieve the output of each token.
Added a new flag try_single_token_inference to control whether to perform single token inference.
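
To illustrate the shape of the change described above, here is a minimal sketch of a token-by-token generation loop. It does not reproduce the code in the patch: the SingleTokenBackend trait, its method names, and MockBackend are hypothetical stand-ins introduced only so the example compiles without a WasmEdge runtime.

```rust
/// Hypothetical stand-in for a WASI-NN execution context (an assumption,
/// not the API used in the patch).
trait SingleTokenBackend {
    /// Run inference for exactly one more token.
    /// Returns false once the sequence has ended.
    fn compute_single(&mut self) -> bool;
    /// Text of the most recently generated token.
    /// Only valid after compute_single() returned true.
    fn get_output_single(&self) -> String;
    /// Release any per-generation state.
    fn fini_single(&mut self);
}

/// Mock backend that replays a fixed list of tokens.
struct MockBackend {
    tokens: Vec<&'static str>,
    next: usize,
}

impl SingleTokenBackend for MockBackend {
    fn compute_single(&mut self) -> bool {
        if self.next < self.tokens.len() {
            self.next += 1;
            true
        } else {
            false
        }
    }

    fn get_output_single(&self) -> String {
        self.tokens[self.next - 1].to_string()
    }

    fn fini_single(&mut self) {
        self.next = 0;
    }
}

/// Generate up to `max_tokens` tokens, handing each one to `on_token`
/// as soon as it is available, and return the concatenated text.
fn generate<B: SingleTokenBackend>(
    backend: &mut B,
    max_tokens: usize,
    mut on_token: impl FnMut(&str),
) -> String {
    let mut output = String::new();
    for _ in 0..max_tokens {
        if !backend.compute_single() {
            break; // end of sequence
        }
        let token = backend.get_output_single();
        on_token(&token);
        output.push_str(&token);
    }
    backend.fini_single();
    output
}
```

A caller would pass a closure such as `|t| print!("{}", t)` as `on_token` to stream each token to the terminal as it is produced, and a flag like try_single_token_inference would decide whether this loop or a single whole-sequence call is used.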

Commit 05e07867b2263af942fbe1b3187c03c32d51502d

Key changes in the patch:

Updated the README.md file in the wasmedge-ggml-llama-interactive directory.
Added a section on Token Usage.
Provided instructions on using get_output() to retrieve the token usage of input and output text.
Described the format of the token usage data in JSON.
Advised users to be aware of the context size and number of tokens used to avoid exceeding the limit.
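
As a rough illustration of the advice above, the following sketch parses a token usage JSON object and checks it against a context size. The field names input_tokens and output_tokens are assumptions, since this review does not quote the JSON format, and the hand-rolled field reader is used only to keep the example dependency-free; a real project would more likely use a JSON library.

```rust
/// Token counts reported after an inference call.
/// Field names are assumed for illustration.
#[derive(Debug, PartialEq)]
struct TokenUsage {
    input_tokens: u64,
    output_tokens: u64,
}

/// Extract an unsigned integer field from a flat JSON object.
/// This is a deliberately small helper, not a general JSON parser.
fn read_u64_field(json: &str, key: &str) -> Option<u64> {
    let needle = format!("\"{}\"", key);
    let start = json.find(needle.as_str())? + needle.len();
    let after_colon = json[start..].trim_start().strip_prefix(':')?.trim_start();
    let digits: String = after_colon
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}

/// Parse both counters; returns None if either field is missing.
fn parse_token_usage(json: &str) -> Option<TokenUsage> {
    Some(TokenUsage {
        input_tokens: read_u64_field(json, "input_tokens")?,
        output_tokens: read_u64_field(json, "output_tokens")?,
    })
}

/// Tokens still available in the context window, or None if the
/// usage already exceeds `ctx_size`.
fn remaining_context(usage: &TokenUsage, ctx_size: u64) -> Option<u64> {
    ctx_size.checked_sub(usage.input_tokens + usage.output_tokens)
}
```

An interactive application could call `remaining_context` after each turn and reset or truncate the conversation history when it returns None.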

Potential problems:

There don't appear to be any potential problems with this patch. It seems to be a straightforward documentation update.

second-state / WasmEdge-WASINN-examples