NightMachinery opened 5 months ago
Indeed, looking at the logs, ellama is using a template:
[2024-04-15 00:13:02] [Emacs --> deepseek-coder:6.7b-base-q8_0]:
Interactions:
User: Continue the following code, only write new code in format ```language
...
```:
print('
ellama seems to be parsing the response as well and doing magic on it. (E.g., the response gets suddenly deleted when it finishes streaming, presumably because it wasn't in backticks.) I want to disable all magic.
You can use the ellama-complete command for that purpose. I don't think it will be useful, though.
@s-kostyaev Thanks. Is there a way to limit the completion so that it stops on a newline?
@NightMachinery Sure. You need to create a custom model with ollama. Add this parameter:
PARAMETER stop "\n"
Then create a custom model from this Modelfile and use the newly created model. For example, I use https://ollama.com/sskostyaev/openchat:1l to create chat names.
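For reference, a stop parameter simply cuts generation at the first occurrence of the stop sequence. The truncation it performs server-side can be sketched client-side like this (the function name is hypothetical, not part of ellama or ollama):

```python
def truncate_at_stop(text, stops):
    """Cut `text` at the earliest occurrence of any stop sequence,
    mimicking what PARAMETER stop "\\n" does on the server side."""
    cut = len(text)
    for s in stops:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut]

print(truncate_at_stop("print('hello')\nprint('bye')", ["\n"]))  # → print('hello')
```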
@s-kostyaev Looking at the logs, ellama-complete still uses the chat API and we can see User: at the start of its request. I want to directly use the completion API:
# openrouter_client is an OpenAI-compatible client pointed at OpenRouter,
# e.g. OpenAI(base_url="https://openrouter.ai/api/v1", api_key=...)
res = openrouter_client.completions.create(
    model="mistralai/mixtral-8x22b",
    prompt="""...""",
    stream=True,
    echo=False,  # echo back the prompt in addition to the completion
    max_tokens=100,
)
This completion API works pretty well for completing text in my tests. With a reasonable max_tokens, this API can be a viable alternative to Copilot IMO.
I have set:
But ellama seems to be sending some kind of template-y message to ollama when I do ellama-code-complete. I don't want any "prompt engineering"; I just want to feed the context near point into this base model and get its next N lines of prediction.
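For what it's worth, ollama's own /api/generate endpoint accepts a raw flag that bypasses the model's prompt template entirely, which sounds like the behavior wanted here. A minimal sketch (host/port are ollama's defaults; the model name and prompt are placeholders):

```python
import json
import urllib.request

# With "raw": true, ollama skips the model's prompt template and feeds
# the prompt to the base model as-is; options.num_predict bounds the
# completion length and options.stop cuts it at a newline.
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "deepseek-coder:6.7b-base-q8_0",
        "prompt": "print('",  # raw context near point
        "raw": True,
        "stream": False,
        "options": {"num_predict": 100, "stop": ["\n"]},
    }).encode(),
    headers={"Content-Type": "application/json"},
)
# resp = urllib.request.urlopen(req)  # uncomment with a running ollama
```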