enn-nafnlaus opened this issue 1 month ago
Hey @enn-nafnlaus
Do you use your custom model as autocomplete or in the chat? For the latter, configure your Ollama dev model as follows:
"cody.dev.models": [
{
"provider": "ollama",
"model": "your_model",
"inputTokens": 65000,
<additional_configs like apiEndpoint etc.>
}
]
Here it is important to set inputTokens correctly.
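For reference, a rough sketch of what the full entry might look like in VS Code's settings.json (the endpoint value below is an assumption — point it at wherever your Ollama instance is listening, and use whatever model name ollama list reports):

"cody.dev.models": [
  {
    "provider": "ollama",
    "model": "deepseek-coder-v2:lite", // whatever name `ollama list` shows for your model
    "inputTokens": 65000,
    "apiEndpoint": "http://localhost:11434" // assumption: Ollama's default local endpoint
  }
]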
Thanks for the response, and sorry for the delay (had to deal with taxes). :) Your solution fixed the context length (after a restart), but the output is still pure hallucination. E.g.:
Just to demonstrate that ollama and the model work locally:
(I'd also add that it's frustrating that ollama is the only available local server, given that it's not like the APIs between different local servers are radically different. Ollama gives you so little control over how the model is run vs., say, llama.cpp. Like, I can't control cross-card memory allocation (it's not even spanning multiple GPUs), batching, speculative decoding, and on and on.)
I will add that I experience the hallucination issue with this family of models as well. However, using other local models, I have no such issue.
deepseek-coder-v2:lite:
deepseek-coder:6.7b (I had to kill the hallucination - it would run on indefinitely):
qwen2.5-coder:7b works just fine:
Seems like some models only work well for code completion, and others are better for analyzing code. I haven't found a model I can run locally that can serve as both.
I am a layperson, but maybe this will also be useful info...
If I run ollama serve in the foreground (as opposed to running it in the background), I notice this output appears for the model in question:
check_double_bos_eos: Added a BOS token to the prompt as specified by the model but the prompt also starts with a BOS token. So now the final prompt starts with 2 BOS tokens. Are you sure this is what you want?
[GIN] 2024/10/12 - 05:51:01 | 200 | 547.416542ms | 127.0.0.1 | POST "/api/generate"
check_double_bos_eos: Added a BOS token to the prompt as specified by the model but the prompt also starts with a BOS token. So now the final prompt starts with 2 BOS tokens. Are you sure this is what you want?
[GIN] 2024/10/12 - 05:51:01 | 200 | 568.794375ms | 127.0.0.1 | POST "/api/generate"
check_double_bos_eos: Added a BOS token to the prompt as specified by the model but the prompt also starts with a BOS token. So now the final prompt starts with 2 BOS tokens. Are you sure this is what you want?
[GIN] 2024/10/12 - 05:51:01 | 200 | 448.858958ms | 127.0.0.1 | POST "/api/generate"
I see no such thing for other models that behave well.
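In case it helps with debugging (this is my guess at a way to dig into that warning, not a confirmed diagnosis): the double BOS suggests the prompt being sent already starts with the model's BOS token while Ollama's template adds another one. Dumping the model's template and parameters via Ollama's /api/show endpoint should show whether the template itself includes the BOS token — a minimal sketch, assuming Ollama is on its default port:

import json
import urllib.request

# Sketch: dump the template/parameters that Ollama applies to this model.
# Assumes Ollama is running locally on its default port (11434).
req = urllib.request.Request(
    "http://localhost:11434/api/show",
    data=json.dumps({"model": "deepseek-coder-v2:lite"}).encode(),  # older Ollama versions expect "name" instead of "model"
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    info = json.loads(resp.read())

# If the template already starts with the BOS token, any client that prepends
# it again would trigger the double-BOS warning.
print(info.get("template", ""))
print(info.get("parameters", ""))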
Version
v1.34.3
Describe the bug
Cody seems to be working fine normally. However, I recently started using ollama to run local models, in particular deepseek-coder-v2:lite (q4), and it seems like the context window is ridiculously short and the output is pure hallucination.
1) It only lets me include the first 172 lines of one file. A word counter says it's 597 words long. Even accounting for some long "words" (like a couple of regex statements), this is clearly no more than a couple thousand tokens. DeepSeek Coder V2 Lite supports a 128k-token context. Are you not setting the context size in the API, and just using the default? (See the sketch after the example output below for one way to test this directly.)
2) Unlike with e.g. Claude, what I get back has absolutely nothing to do with the provided context. Presented with the first 100 lines of a Python mapping program and an instruction "Add debug", it writes:
deepseek coder v2:latest <|begin▁of▁sentence|># 1. 题目 Given a binary tree, find its minimum depth. The minimum depth is the number of nodes along the shortest path from the root node down to the nearest leaf node. Note: A leaf is a node with no children.
Example:
Given binary tree [3,9,20,null,null,15,7], 3 / \ 9 20 / \ 15 7 return its minimum depth = 2.
2.2 代码实现
# Definition for a binary tree node.
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def minDepth(root: TreeNode) -> int:
    if not root:
        return 0
It should go without saying that this has absolutely nothing to do with my code (which is not about binary trees and is not written in Chinese), nor anything to do with the instruction to add debug.
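For what it's worth, one way to separate Cody's prompt construction from the model itself is to call Ollama's /api/generate endpoint directly with an explicit num_ctx — a minimal sketch, assuming Ollama is on its default port and that the model name matches what ollama list reports (the file name here is just a placeholder):

import json
import urllib.request

# Sketch: send a request straight to Ollama with the context size set explicitly.
# Assumes Ollama is listening on its default port (11434); "my_script.py" is a placeholder.
payload = {
    "model": "deepseek-coder-v2:lite",
    "prompt": "Add debug output to the following Python code:\n\n" + open("my_script.py").read(),
    "stream": False,
    "options": {"num_ctx": 32768},  # per-request context window; Ollama's default is much smaller
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])

If the model responds sensibly when called like this but not through Cody, the problem is presumably in how the prompt and context size are being set up rather than in the model itself.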
Expected behavior
It should allow use of the full 128k context window and the outputs should at least remotely pertain to the task.
Additional context
No response