second-state / WasmEdge-WASINN-examples

Apache License 2.0

Unable to clear the context object knowledge #69

Closed niranjanakella closed 10 months ago

niranjanakella commented 1 year ago

Hello, I wish to clear the model's knowledge on every loop iteration, without tracking previous inputs or outputs. The context object is a way to pass information between different parts of the model, and it is used to store the history of the conversation. This allows the model to keep track of what has been said previously and generate responses that are consistent with the conversation.

I don't want the model to keep track of the previous conversation, and hence wish to clear it. @hydai, is there a way to do this?

Following is the loop that I am running:

loop {
    println!("Question:-");
    let input = read_input();
    let model_input: String = format!("[INST] {} [/INST]", input.trim());
    // Set the prompt as the input tensor.
    let tensor_data = model_input.as_bytes().to_vec();
    context
        .set_input(0, wasi_nn::TensorType::U8, &[1], &tensor_data)
        .unwrap();

    // Execute the inference.
    println!("Answer:");
    context.compute().unwrap();

    // Retrieve the output.
    let max_output_size = 4096 * 6;
    let mut output_buffer = vec![0u8; max_output_size];
    let mut output_size = context.get_output(0, &mut output_buffer).unwrap();
    output_size = std::cmp::min(max_output_size, output_size);
    let output = String::from_utf8_lossy(&output_buffer[..output_size]).to_string();
    if !stream_stdout {
        print!("{}", output.trim());
    }
    println!();
}

At the end of each loop I wish to clear the model knowledge of the conversation.

juntao commented 1 year ago

Try something like this?

https://github.com/second-state/llama-utils/tree/main/simple

niranjanakella commented 1 year ago

@hydai FYI, I have tried to re-instantiate the execution context as follows, but the model is still able to retain information.

loop {
    let mut context = graph.init_execution_context().unwrap();

    context
        .set_input(
            1,
            wasi_nn::TensorType::U8,
            &[1],
            &options.to_string().as_bytes().to_vec(),
        )
        .unwrap();

    println!("Question:-");
    let input = read_input();
    let model_input: String = format!("[INST] {} [/INST]", input.trim());
    // Set the prompt as the input tensor.
    let tensor_data = model_input.as_bytes().to_vec();
    context
        .set_input(0, wasi_nn::TensorType::U8, &[1], &tensor_data)
        .unwrap();

    // Execute the inference.
    println!("Answer:");
    context.compute().unwrap();

    // Retrieve the output.
    let max_output_size = 4096 * 6;
    let mut output_buffer = vec![0u8; max_output_size];
    let mut output_size = context.get_output(0, &mut output_buffer).unwrap();
    output_size = std::cmp::min(max_output_size, output_size);
    let output = String::from_utf8_lossy(&output_buffer[..output_size]).to_string();
    if !stream_stdout {
        print!("{}", output.trim());
    }
    println!();
}
[Screenshot attached: 2023-11-19, 6:14 PM]
niranjanakella commented 1 year ago

@juntao Thank you for the quick response. Yes, I have tried this, but it is just a one-time inference; I wish to perform this in a loop. I did try re-instantiation, but the model is still able to remember the previous conversation. @hydai

niranjanakella commented 1 year ago

@hydai @juntao Is this possible in any other way?

niranjanakella commented 1 year ago

@hydai I have tried looking through the source code, but there isn't a method for clearing/resetting the model state. Is there any other way?

Thank you for your dedicated support.

hydai commented 12 months ago

We can reproduce this. According to the context design of llama.cpp, it does store some information in the context. Unfortunately, we currently don't have any way to reset it. We are going to add a new option to reset the context in our next release.

niranjanakella commented 12 months ago

Okay, thanks a lot @hydai. I appreciate the active feedback. Also, is there any tentative timeline for the next release?

niranjanakella commented 11 months ago

Hello @hydai, any update on this release?

hydai commented 10 months ago

Hi @niranjanakella, we've updated the plugin; the current and future versions will no longer keep any previous context states.

Please use the installer to re-install the WASI-NN ggml plugin. It should be updated to the latest one.

Feel free to re-open this issue if you get any related problems. Thanks.

niranjanakella commented 9 months ago

@hydai I have trained a model (for summarization) and run inference with it after updating WASI-NN, but I still see the context being saved. Can you please take a look at the following code and let me know what is wrong?

loop {
    let graph = wasi_nn::GraphBuilder::new(
        wasi_nn::GraphEncoding::Ggml,
        wasi_nn::ExecutionTarget::AUTO,
    )
    .build_from_cache(model_name)
    .unwrap();

    let mut context = graph.init_execution_context().unwrap();

    context
        .set_input(
            1,
            wasi_nn::TensorType::U8,
            &[1],
            &options.to_string().as_bytes().to_vec(),
        )
        .unwrap();

    println!("\nDialogue:");
    let input = read_input();
    let saved_prompt = format!("### Dialogue:{} ### Summary:", input.trim()); // TinyLLaMA

    // Set the prompt as the input tensor.
    let tensor_data = saved_prompt.as_bytes().to_vec();

    let start_time = Instant::now();
    context
        .set_input(0, wasi_nn::TensorType::U8, &[1], &tensor_data)
        .unwrap();

    context.compute().unwrap();

    // Retrieve the output.
    let max_output_size = 4096;
    let mut output_buffer = vec![0u8; max_output_size];
    let mut output_size = context.get_output(0, &mut output_buffer).unwrap();
    output_size = std::cmp::min(max_output_size, output_size);
    let output = String::from_utf8_lossy(&output_buffer[..output_size]).to_string();

    let out = format!(
        "\nSummary: {}\n\nResponse Time: {}",
        output.trim(),
        format_duration(start_time.elapsed())
    );
    println!("{}", out);
}
hydai commented 9 months ago

Which version of the plugin do you use?

niranjanakella commented 9 months ago

@hydai Never mind. There was an internal permissions issue, which I fixed with chmod; it wasn't able to write to the ._wasmedgec folder. It's working fine now.