Closed dm4 closed 5 months ago
Hello, I am a code review bot on flows.network. Here are my reviews of code commits in this PR.
Overall summary:
This GitHub Pull Request titled "[WASI-NN] Add single token inference" introduces support for single token inference in the main.rs file and includes updates to the README.md file in the wasmedge-ggml-llama-interactive directory. The key changes in this patch include the addition of code to perform inference one token at a time and retrieve the output of each token, as well as the addition of a new flag try_single_token_inference to control this behavior.
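To illustrate the one-token-at-a-time flow described above, here is a minimal sketch in Rust. It does not use the real WASI-NN bindings: MockContext is a hypothetical stand-in, and the signatures given to compute_single and get_output_single here are assumptions made for illustration. Only the names and the try_single_token_inference flag come from this PR.

```rust
// Hypothetical stand-in for an inference context. This is NOT the real
// WASI-NN API; it only replays a fixed list of tokens.
struct MockContext {
    tokens: Vec<&'static str>,
    next: usize,
    current: Option<&'static str>,
}

impl MockContext {
    fn new(tokens: Vec<&'static str>) -> Self {
        MockContext { tokens, next: 0, current: None }
    }

    // Assumed behavior: produce exactly one token per call and
    // return false once there is nothing left to generate.
    fn compute_single(&mut self) -> bool {
        match self.tokens.get(self.next) {
            Some(tok) => {
                self.current = Some(*tok);
                self.next += 1;
                true
            }
            None => {
                self.current = None;
                false
            }
        }
    }

    // Assumed behavior: return the token produced by the last compute_single call.
    fn get_output_single(&self) -> &str {
        self.current.unwrap_or("")
    }

    // Whole-sequence inference: return all remaining tokens at once.
    fn compute(&mut self) -> String {
        let rest = self.tokens[self.next..].concat();
        self.next = self.tokens.len();
        rest
    }
}

// The flag selects between streaming token by token and a single call.
fn generate(ctx: &mut MockContext, try_single_token_inference: bool) -> String {
    if !try_single_token_inference {
        return ctx.compute();
    }
    let mut output = String::new();
    while ctx.compute_single() {
        // A caller could print each token here as soon as it is available.
        let token = ctx.get_output_single();
        output.push_str(token);
    }
    output
}

fn main() {
    let mut ctx = MockContext::new(vec!["Hello", ",", " world"]);
    println!("{}", generate(&mut ctx, true));
}
```

In this sketch both paths yield the same text; the single-token path only changes when each piece becomes available to the caller.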
In terms of the README.md update, a new section on Token Usage has been added. It provides instructions on using get_output() to retrieve the token usage of input and output text, and describes the format of the token usage data in JSON. Users are also advised to consider the context size and number of tokens used to avoid exceeding the limit.
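The context-size advice can be sketched as a small check in Rust. The field names input_tokens and output_tokens, the sample values, and the context size of 512 are assumptions for illustration; the README's actual JSON layout may differ. The hand-rolled field reader keeps the example dependency-free, and a real program would more likely use a JSON library.

```rust
// Read an unsigned integer field from a flat JSON object.
// Simplified on purpose: it does not handle nesting or escaped keys.
fn read_u64_field(json: &str, key: &str) -> Option<u64> {
    let pattern = format!("\"{}\"", key);
    let after_key = &json[json.find(&pattern)? + pattern.len()..];
    let after_colon = after_key.trim_start().strip_prefix(':')?.trim_start();
    let digits: String = after_colon
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}

// Return how many tokens are left in the context window.
// None means the usage data could not be read or the limit is exceeded.
fn remaining_context(usage_json: &str, ctx_size: u64) -> Option<u64> {
    let input = read_u64_field(usage_json, "input_tokens")?;
    let output = read_u64_field(usage_json, "output_tokens")?;
    ctx_size.checked_sub(input + output)
}

fn main() {
    // Assumed shape of the token usage data; the values are made up.
    let usage = r#"{"input_tokens": 78, "output_tokens": 31}"#;
    match remaining_context(usage, 512) {
        Some(left) => println!("{} tokens left in the context window", left),
        None => println!("context size exceeded or usage data malformed"),
    }
}
```

Checking the remaining budget after each turn lets an interactive program trim or reset the conversation before the limit is reached.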
There don't appear to be any potential problems or errors with this patch. It seems to be a straightforward implementation and documentation update.
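Key changes:
Single token inference is added to the main.rs file.
A new flag, try_single_token_inference, controls whether to perform single token inference.
Key changes in the patch:
The README.md file is updated in the wasmedge-ggml-llama-interactive directory.
Instructions are added on using get_output() to retrieve the token usage of input and output text.
Potential problems: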
This pull request adds single token inference. Implementing it requires changes to 3 different repositories:
compute_single and get_output_single
compute_single and get_output_single