huggingface / candle

Minimalist ML framework for Rust
Apache License 2.0

Batch llama prompt #2111

Closed · tbogdala closed this 3 weeks ago

tbogdala commented 3 weeks ago

This PR is discussed in #2108 and updates mask creation for the Llama model so that a user-supplied prompt can be processed in token batches instead of all at once. The key change is to Cache::mask(), which now takes a second usize and uses it to build the appropriately sized vector that gets turned into a Tensor there.
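For intuition, here is a minimal sketch of what a batched causal mask along these lines could look like. It assumes the second usize is an offset (called seqlen_offset below) counting the tokens already in the KV cache; the parameter name and exact shape handling are assumptions for illustration, so see the PR diff for the real change.

```rust
use candle::{Device, Result, Tensor};

// Sketch only: `seqlen_offset` is a hypothetical name for the second
// usize; the actual implementation lives in candle-transformers.
fn mask(t: usize, seqlen_offset: usize, device: &Device) -> Result<Tensor> {
    // Row i is the i-th token of the current batch. It may attend to all
    // `seqlen_offset` cached positions plus the first i + 1 new positions,
    // so entries with j > i + seqlen_offset are masked out (set to 1).
    let mask: Vec<u8> = (0..t)
        .flat_map(|i| {
            (0..t + seqlen_offset).map(move |j| u8::from(j > i + seqlen_offset))
        })
        .collect();
    Tensor::from_slice(&mask, (t, t + seqlen_offset), device)
}
```

With seqlen_offset = 0 this reduces to the original square (t, t) mask, so processing the whole prompt in one pass behaves as before.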

The code in candle-examples/examples/llama/main.rs in this PR may need smoothing, but other than that, I've tested the example both with and without the new --prompt-batch-size CLI parameter, at a variety of batch sizes.
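Roughly, the example-side change amounts to feeding the prompt to the model in fixed-size chunks rather than in a single forward pass. The following is a hypothetical sketch of that loop, not the PR's exact code; the names and the forward signature are taken from candle-transformers at the time of writing and may differ between versions.

```rust
use candle::{Device, Error, Result, Tensor};
use candle_transformers::models::llama::{Cache, Llama};

// Hypothetical helper: run the prompt through the model in chunks of
// `batch_size` tokens, returning the logits from the final chunk.
fn process_prompt(
    llama: &Llama,
    cache: &mut Cache,
    tokens: &[u32],
    batch_size: usize,
    device: &Device,
) -> Result<Tensor> {
    let mut index_pos = 0;
    let mut logits = None;
    for chunk in tokens.chunks(batch_size) {
        let input = Tensor::new(chunk, device)?.unsqueeze(0)?;
        // `index_pos` offsets the new tokens against the KV cache so the
        // causal mask lines up with previously processed positions.
        logits = Some(llama.forward(&input, index_pos, cache)?);
        index_pos += chunk.len();
    }
    logits.ok_or_else(|| Error::Msg("empty prompt".to_string()))
}
```

In the example this would be driven by the new --prompt-batch-size flag, with an invocation along the lines of cargo run --example llama --release -- --prompt-batch-size 64 --prompt "...".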

LaurentMazare commented 3 weeks ago

Yeah, the change in the example part indeed seems a bit complex. Maybe we should have just the model change in this PR, so that users of the candle-transformers crate can benefit from it and we don't need to adapt the example for now.