s-kostyaev / ellama

Ellama is a tool for interacting with large language models from Emacs.
GNU General Public License v3.0

[Q] How do I use a base (completion only, no instruct/chat) model? #108

Open NightMachinery opened 5 months ago

NightMachinery commented 5 months ago

I have set:

(setopt ellama-provider
        (make-llm-ollama
         :chat-model "deepseek-coder:6.7b-base-q8_0"
         :embedding-model "deepseek-coder:6.7b-base-q8_0"))

But ellama seems to be sending some kind of templated message to ollama when I run ellama-code-complete. I don't want any "prompt engineering"; I just want to feed the context near point into this base model and get its next N lines of prediction.

NightMachinery commented 5 months ago

Indeed, looking at the logs, ellama is using a template:

[2024-04-15 00:13:02] [Emacs --> deepseek-coder:6.7b-base-q8_0]:
Interactions:
User: Continue the following code, only write new code in format ```language
...
```:

#!/usr/bin/env python3

# prints hello world

print('

NightMachinery commented 5 months ago

ellama seems to be parsing the response as well and doing magic on it. (E.g., the response suddenly gets deleted when it finishes streaming, presumably because it wasn't in backticks.) I want to disable all of this magic.

s-kostyaev commented 5 months ago

You can use the ellama-complete command for that purpose. I don't think it will be useful, though.

NightMachinery commented 5 months ago

@s-kostyaev Thanks. Is there a way to limit the completion so that it stops on a newline?

s-kostyaev commented 5 months ago

@NightMachinery Sure. You need to create a custom model with Ollama. Add this parameter to the Modelfile:

PARAMETER stop "\n"

Then create a custom model from that Modelfile and use the newly created model. For example, I use https://ollama.com/sskostyaev/openchat:1l to create chat names.
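A minimal sketch of such a Modelfile, assuming the base model from above (the output model name deepseek-coder-1l is arbitrary):

FROM deepseek-coder:6.7b-base-q8_0
PARAMETER stop "\n"

Then build it and point ellama's :chat-model at the result:

ollama create deepseek-coder-1l -f Modelfile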

NightMachinery commented 5 months ago

@s-kostyaev Looking at the logs, ellama-complete still uses the chat API; we can see User: at the start of its request. I want to use the completion API directly:

from openai import OpenAI  # OpenAI-compatible client pointed at OpenRouter

openrouter_client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="...")

res = openrouter_client.completions.create(
    model="mistralai/mixtral-8x22b",
    prompt="""...""",
    stream=True,
    echo=False,  #: echo back the prompt in addition to the completion
    max_tokens=100,
)

This completion API works pretty well for completing text in my tests. With a reasonable max_tokens, this API could be a viable alternative to Copilot, IMO.
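For the Ollama backend specifically, a raw completion with no chat templating can be requested from its generate endpoint. A minimal sketch, assuming the base model from above and an Ollama server on the default localhost:11434 (the prompt is just an illustration):

import requests

# "raw": True bypasses Ollama's prompt templating entirely
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-coder:6.7b-base-q8_0",
        "prompt": "#!/usr/bin/env python3\n\n# prints hello world\n\nprint('",
        "raw": True,
        "stream": False,
        "options": {"num_predict": 100},
    },
)
print(resp.json()["response"])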