nomic-ai / gpt4all-chat

gpt4all-j chat

Add compatibility with new sampling algorithms in llama.cpp #219

Closed · kuvaus closed this 1 year ago

kuvaus commented 1 year ago

This pull request addresses issue https://github.com/nomic-ai/gpt4all-chat/issues/200#issue-1689677866 by adding compatibility with the new sampling algorithms in llama.cpp.
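For context, these per-stage samplers replace the single llama_sample_top_p_top_k call that older llama.cpp revisions exposed. Its pre-refactor declaration looked roughly like this (reconstructed from the old llama.h; treat the exact signature as an assumption and verify against the pinned submodule):

    // Removed one-shot sampler (approximate pre-refactor declaration):
    LLAMA_API llama_token llama_sample_top_p_top_k(
           struct llama_context * ctx,
              const llama_token * last_n_tokens_data,
                              int last_n_tokens_size,
                              int top_k,
                            float top_p,
                            float temp,
                            float repeat_penalty);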

Changes:

Implemented temperature sampling with a repetition penalty, replacing the previous single llama_sample_top_p_top_k call with the new per-stage sampling functions:

        // Temperature sampling with repetition_penalty
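        // The calls below operate in order on candidates_data: penalize tokens
        // seen in the last repeat_last_n positions, keep the top_k most likely,
        // keep the top_p nucleus, scale logits by temp, then draw a token.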
        llama_sample_repetition_penalty(
            d_ptr->ctx, &candidates_data,
            promptCtx.tokens.data() + promptCtx.n_ctx - promptCtx.repeat_last_n, promptCtx.repeat_last_n,
            promptCtx.repeat_penalty);
        llama_sample_top_k(d_ptr->ctx, &candidates_data, promptCtx.top_k);
        llama_sample_top_p(d_ptr->ctx, &candidates_data, promptCtx.top_p);
        llama_sample_temperature(d_ptr->ctx, &candidates_data, promptCtx.temp);
        llama_token id = llama_sample_token(d_ptr->ctx, &candidates_data);
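The candidates_data array that these calls filter is not shown in the diff; in llama.cpp examples from the same period it is built from the raw logits roughly as follows (a sketch that mirrors the variable names above, not part of this PR):

    // Build the token candidate list from the model's logits
    // (sketch based on contemporary llama.cpp examples; needs <vector>).
    const float * logits  = llama_get_logits(d_ptr->ctx);
    const int     n_vocab = llama_n_vocab(d_ptr->ctx);

    std::vector<llama_token_data> candidates;
    candidates.reserve(n_vocab);
    for (llama_token token_id = 0; token_id < n_vocab; token_id++) {
        // id, logit, p (the probability field is filled in by the samplers)
        candidates.emplace_back(llama_token_data{token_id, logits[token_id], 0.0f});
    }
    llama_token_data_array candidates_data = { candidates.data(), candidates.size(), false };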
manyoso commented 1 year ago

I will look at this, but I will need to update the submodule at the same time; otherwise this will break. This helps a ton, though! Thanks @kuvaus!