nomic-ai / gpt4all-chat

gpt4all-j chat

Add compatibility with new sampling algorithms in llama.cpp #219

Closed · kuvaus closed this 1 year ago

kuvaus commented 1 year ago

This pull request addresses issue https://github.com/nomic-ai/gpt4all-chat/issues/200#issue-1689677866 by adding compatibility with the new sampling algorithms in llama.cpp.
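For context, these per-stage samplers replace the single llama_sample_top_p_top_k call that older llama.cpp revisions exposed. Its pre-refactor declaration looked roughly like this (reconstructed from the old llama.h; treat the exact signature as an assumption and verify against the pinned submodule):

    // Removed one-shot sampler (approximate pre-refactor declaration):
    LLAMA_API llama_token llama_sample_top_p_top_k(
           struct llama_context * ctx,
              const llama_token * last_n_tokens_data,
                              int last_n_tokens_size,
                              int top_k,
                            float top_p,
                            float temp,
                            float repeat_penalty);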

Changes:

Implemented temperature sampling with a repetition penalty, replacing the previous single llama_sample_top_p_top_k call with the new per-stage sampling functions:

        // Temperature sampling with repetition_penalty
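        // The calls below operate in order on candidates_data: penalize tokens
        // seen in the last repeat_last_n positions, keep the top_k most likely,
        // keep the top_p nucleus, scale logits by temp, then draw a token.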
        llama_sample_repetition_penalty(
            d_ptr->ctx, &candidates_data,
            promptCtx.tokens.data() + promptCtx.n_ctx - promptCtx.repeat_last_n, promptCtx.repeat_last_n,
            promptCtx.repeat_penalty);
        llama_sample_top_k(d_ptr->ctx, &candidates_data, promptCtx.top_k);
        llama_sample_top_p(d_ptr->ctx, &candidates_data, promptCtx.top_p);
        llama_sample_temperature(d_ptr->ctx, &candidates_data, promptCtx.temp);
        llama_token id = llama_sample_token(d_ptr->ctx, &candidates_data);
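The candidates_data array that these calls filter is not shown in the diff; in llama.cpp examples from the same period it is built from the raw logits roughly as follows (a sketch that mirrors the variable names above, not part of this PR):

    // Build the token candidate list from the model's logits
    // (sketch based on contemporary llama.cpp examples; needs <vector>).
    const float * logits  = llama_get_logits(d_ptr->ctx);
    const int     n_vocab = llama_n_vocab(d_ptr->ctx);

    std::vector<llama_token_data> candidates;
    candidates.reserve(n_vocab);
    for (llama_token token_id = 0; token_id < n_vocab; token_id++) {
        // id, logit, p (the probability field is filled in by the samplers)
        candidates.emplace_back(llama_token_data{token_id, logits[token_id], 0.0f});
    }
    llama_token_data_array candidates_data = { candidates.data(), candidates.size(), false };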
manyoso commented 1 year ago

I will look at this, but I will need to update the submodule at the same time; otherwise this will break. This helps a ton, though! Thanks @kuvaus!