niranjanakella closed this issue 10 months ago.
Try something like this?
https://github.com/second-state/llama-utils/tree/main/simple
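For reference, a minimal one-shot sketch in the spirit of that example, using the same wasi_nn calls that appear later in this thread; the model name "default" and the prompt are placeholder assumptions, and the model is expected to be preloaded (e.g. via wasmedge --nn-preload):

```rust
fn main() {
    // "default" is a placeholder; it must match the name the model was
    // preloaded under, e.g. wasmedge --nn-preload default:GGML:AUTO:model.gguf.
    let graph = wasi_nn::GraphBuilder::new(
        wasi_nn::GraphEncoding::Ggml,
        wasi_nn::ExecutionTarget::AUTO,
    )
    .build_from_cache("default")
    .unwrap();
    let mut context = graph.init_execution_context().unwrap();

    // Set the prompt on input tensor 0 and run a single inference.
    let prompt = "[INST] What is WasmEdge? [/INST]";
    let tensor_data = prompt.as_bytes().to_vec();
    context
        .set_input(0, wasi_nn::TensorType::U8, &[1], &tensor_data)
        .unwrap();
    context.compute().unwrap();

    // Read the generated text back from output tensor 0.
    let mut output_buffer = vec![0u8; 4096];
    let output_size = context.get_output(0, &mut output_buffer).unwrap();
    println!("{}", String::from_utf8_lossy(&output_buffer[..output_size]));
}
```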
@hydai FYI, I have tried to re-instantiate as follows, but the model is still able to retain information.
```rust
loop {
    let mut context = graph.init_execution_context().unwrap();
    context
        .set_input(
            1,
            wasi_nn::TensorType::U8,
            &[1],
            &options.to_string().as_bytes().to_vec(),
        )
        .unwrap();

    println!("Question:-");
    let input = read_input();
    let model_input: String = format!("[INST] {} [/INST]", input.trim());

    // Set prompt to the input tensor.
    let tensor_data = model_input.as_bytes().to_vec();
    context
        .set_input(0, wasi_nn::TensorType::U8, &[1], &tensor_data)
        .unwrap();

    // Execute the inference.
    println!("Answer:");
    context.compute().unwrap();

    // Retrieve the output.
    let max_output_size = 4096 * 6;
    let mut output_buffer = vec![0u8; max_output_size];
    let mut output_size = context.get_output(0, &mut output_buffer).unwrap();
    output_size = std::cmp::min(max_output_size, output_size);
    let output = String::from_utf8_lossy(&output_buffer[..output_size]).to_string();

    if !stream_stdout {
        print!("{}", output.trim());
    }
    println!();
}
```
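For context, the value written to input tensor 1 above is a JSON options object. A minimal sketch of building it with serde_json; the key names here (stream-stdout, ctx-size, n-predict) are assumptions based on the WASI-NN ggml plugin's documented metadata fields, so verify them against your installed plugin version:

```rust
use serde_json::json; // serde_json = "1"

fn main() {
    // A sketch of the options/metadata object passed to input index 1.
    // Key names are assumptions based on the WASI-NN ggml plugin's
    // documented metadata fields; verify against your plugin version.
    let options = json!({
        "stream-stdout": false,
        "ctx-size": 1024,
        "n-predict": 512
    });
    // Serialized exactly as in the loop above.
    let bytes = options.to_string().as_bytes().to_vec();
    println!("{} ({} bytes)", options, bytes.len());
}
```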
@juntao Thank you for the quick response. Yes, I have tried this, but that is just a one-time inference; I wish to perform it in a loop. I did try re-instantiation, but the model is still able to remember the previous conversation. @hydai
@hydai @juntao Is this possible in any other way?
@hydai I have tried looking through the source code, but there isn't a method for clearing/resetting the model state. Is there any other way?
Thank you for your dedicated support.
We can reproduce this. According to the context design of llama.cpp, it does store some information in the context. Unfortunately, we currently don't have any way to reset the context. We are going to add a new option to reset the context in our next release.
Okay, thanks a lot @hydai. I appreciate the active feedback. Also, is there a tentative timeline for the next release?
Hello @hydai, any update on this release?
Hi @niranjanakella, we've updated the plugin; the current and future versions will no longer keep any previous context state.
Please use the installer to re-install the WASI-NN ggml plugin. It should be updated to the latest one.
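For reference, a sketch of that re-install step, assuming the standard WasmEdge install script and the wasi_nn-ggml plugin name (check the WasmEdge installation docs for the exact flags on your platform):

```bash
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh \
    | bash -s -- --plugins wasi_nn-ggml
```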
Feel free to re-open this issue if you get any related problems. Thanks.
@hydai I have trained a model (for summarization) and run inference with it after updating WASI-NN, but I still see the context being saved. Can you please take a look at the following code and let me know what is wrong?
```rust
loop {
    let graph =
        wasi_nn::GraphBuilder::new(wasi_nn::GraphEncoding::Ggml, wasi_nn::ExecutionTarget::AUTO)
            .build_from_cache(model_name)
            .unwrap();
    let mut context = graph.init_execution_context().unwrap();
    context
        .set_input(
            1,
            wasi_nn::TensorType::U8,
            &[1],
            &options.to_string().as_bytes().to_vec(),
        )
        .unwrap();

    println!("\nDialogue:");
    let input = read_input();
    let saved_prompt = format!("### Dialogue:{} ### Summary:", input.trim()); // TinyLLaMA prompt format

    // Set prompt to the input tensor.
    let tensor_data = saved_prompt.as_bytes().to_vec();
    let start_time = Instant::now();
    context
        .set_input(0, wasi_nn::TensorType::U8, &[1], &tensor_data)
        .unwrap();

    // Execute the inference.
    context.compute().unwrap();

    // Retrieve the output.
    let max_output_size = 4096;
    let mut output_buffer = vec![0u8; max_output_size];
    let mut output_size = context.get_output(0, &mut output_buffer).unwrap();
    output_size = std::cmp::min(max_output_size, output_size);
    let output = String::from_utf8_lossy(&output_buffer[..output_size]).to_string();

    let out = format!(
        "\nSummary: {}\n\nResponse Time: {}",
        output.trim(),
        format_duration(start_time.elapsed())
    );
    println!("{}", out);
}
```
Which version of the plugin are you using?
@hydai Never mind. There was an internal permissions issue, which I fixed with chmod; the process wasn't able to write to the ._wasmedgec folder. It's working fine now.
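For anyone hitting the same thing, a sketch of that kind of fix, assuming the cache folder sits in the working directory (the exact path may differ on your system):

```bash
# Grant the current user write access to the plugin's cache folder.
chmod -R u+w ._wasmedgec
```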
Hello, I wish to clear the model's knowledge at every loop iteration, without tracking previous inputs or outputs. The context object is a way to pass information between different parts of the model, and it is used to store the history of the conversation; this allows the model to keep track of what has been said previously and generate responses that are consistent with it.
I don't want the model to keep track of the previous conversation, and hence I wish to clear it. @hydai, is there a way to do this?
Following is the loop that I am running:
At the end of each iteration, I wish to clear the model's knowledge of the conversation.
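(The loop listing itself did not survive in this thread.) A minimal sketch of one way to achieve this, assuming the updated plugin that no longer retains context state: drop and rebuild the graph and execution context each iteration. model_name, options, and read_input are placeholders taken from the snippets earlier in the thread:

```rust
loop {
    // Rebuild the graph and context each iteration; when both are dropped
    // at the end of the loop body, no conversation state is carried over.
    let graph = wasi_nn::GraphBuilder::new(
        wasi_nn::GraphEncoding::Ggml,
        wasi_nn::ExecutionTarget::AUTO,
    )
    .build_from_cache(model_name)
    .unwrap();
    let mut context = graph.init_execution_context().unwrap();
    context
        .set_input(
            1,
            wasi_nn::TensorType::U8,
            &[1],
            &options.to_string().as_bytes().to_vec(),
        )
        .unwrap();

    let input = read_input();
    let tensor_data = input.trim().as_bytes().to_vec();
    context
        .set_input(0, wasi_nn::TensorType::U8, &[1], &tensor_data)
        .unwrap();
    context.compute().unwrap();

    let mut output_buffer = vec![0u8; 4096];
    let output_size = context.get_output(0, &mut output_buffer).unwrap();
    println!("{}", String::from_utf8_lossy(&output_buffer[..output_size]));
    // `context` and `graph` go out of scope here, discarding any state.
}
```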