longy2k / obsidian-bmo-chatbot

Generate and brainstorm ideas while creating your notes using Large Language Models (LLMs) from Ollama, LM Studio, Anthropic, Google Gemini, Mistral AI, OpenAI, and more for Obsidian.

Ollama support #15

Closed: twalderman closed this issue 7 months ago

twalderman commented 10 months ago

Ollama serve is working and I can see the POST request, but nothing appears in the BMO response after the prompt. The System Prompt doesn't make a difference.

```
llama_print_timings: load time = 5719.89 ms
llama_print_timings: sample time = 0.00 ms / 1 runs (0.00 ms per token, inf tokens per second)
llama_print_timings: prompt eval time = 576.64 ms / 111 tokens (5.19 ms per token, 192.49 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs (0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 577.65 ms
{"timestamp":1700657519,"level":"INFO","function":"log_server_request","line":1240,"message":"request","remote_addr":"127.0.0.1","remote_port":61340,"status":200,"method":"POST","path":"/tokenize","params":{}}
[GIN] 2023/11/22 - 07:51:59 | 200 | 6.506123584s | 127.0.0.1 | POST "/api/generate"
```

longy2k commented 10 months ago

I am not sure if your issue is specific to the BMO chatbot plugin or to your Ollama configuration.

Can you check if the server is running with another interface, for example https://ollama.twanluttik.com/?

If you did not receive an error and it works fine on these websites but not with BMO, I will start troubleshooting and see if I can mimic the same behavior and resolve it on my end.

EDIT: There may also be a whitespace error when you initially use it (I will resolve that in the next release). Try chatting with the model a few times to see if it runs.

twalderman commented 10 months ago

No, nothing is working right, but if I point my browser to 127.0.0.1:11435 it says Ollama is running. I can see BMO connecting to it, but the responses are blank.


Madd0g commented 10 months ago

Hmm... I had a bunch of trouble with Ollama too, so I did some digging. I was getting JSON.parse errors; not sure if you're seeing similar errors or not, @twalderman.

In my case, a single chunk would sometimes contain more than one JSON object. I figured it must be newline-delimited JSON (JSON Lines), so I changed the code to this (sorry if this isn't exactly like the original; I patched the shipped bundled version locally):

```javascript
// around line 4384 of main.js
// `decoder`, `value`, and `message` come from the surrounding stream-reading loop.
const chunk = decoder.decode(value, { stream: true }) || "";
// A single chunk may contain several newline-delimited JSON objects,
// so split on newlines and parse each one separately.
const parts = chunk.split('\n');
for (const part of parts.filter(Boolean)) {
  let parsedChunk;
  try {
    parsedChunk = JSON.parse(part);
  } catch (err) {
    // Log the failure and insert a visible placeholder instead of dropping the token.
    console.error(err);
    console.log('part', part);
    parsedChunk = { response: '{_e_}' };
  }
  const content = parsedChunk.response;
  message += content;
}
```

for me, it fixed the issue of blank responses.

EDIT: actually, longer messages (the last streamed message, which carries a long context array) sometimes get split across multiple chunks, so my solution does not handle that case.
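
(For illustration, a minimal sketch of the buffered variant hinted at in the EDIT above; `buffer` and `handleChunk` are made-up names and this is not code from the plugin, it just shows the idea of carrying an incomplete trailing part over to the next read:)

```javascript
// Hypothetical sketch: keep leftover text between reads so a JSON object
// that is split across two chunks is only parsed once it is complete.
let buffer = "";
let message = "";
const decoder = new TextDecoder();

function handleChunk(value) {
  buffer += decoder.decode(value, { stream: true });
  const parts = buffer.split('\n');
  // The last element may be an incomplete JSON object; keep it for the next read.
  buffer = parts.pop() || "";
  for (const part of parts.filter(Boolean)) {
    try {
      const parsedChunk = JSON.parse(part);
      message += parsedChunk.response || "";
    } catch (err) {
      // Should be rare with buffering, but keep the try/catch as a safety net.
      console.error(err, part);
    }
  }
}
```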

longy2k commented 9 months ago

@Madd0g

I know you have resolved the JSON.parse errors on your end.

Can you check if v1.7.2 is giving you any errors (especially in a new vault)?

Thanks!

longy2k commented 9 months ago

@twalderman

I have updated the Ollama integration to fetch from /api/chat instead of /api/generate. Can you please let me know if Ollama is working for you now?

Thanks
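
(For reference, the two Ollama endpoints take different payloads; the snippet below is illustrative only, with a placeholder model name and prompt, not code from the plugin:)

```javascript
// Illustrative only: /api/generate takes a single prompt string,
// while /api/chat takes a messages array. "llama2" is a placeholder model.
await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  body: JSON.stringify({ model: "llama2", prompt: "Hello", stream: true }),
});

await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  body: JSON.stringify({
    model: "llama2",
    messages: [{ role: "user", content: "Hello" }],
    stream: true,
  }),
});
```

As far as I know, the streamed objects also differ: /api/generate emits a `response` string per chunk, while /api/chat emits a `message` object with a `content` field, so the parsing code has to be adjusted accordingly.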

twalderman commented 9 months ago

getting 404 errors

[GIN] 2023/12/11 - 15:35:06 | 404 | 6.709µs | 127.0.0.1 | POST "/api/chat"

longy2k commented 9 months ago

@twalderman

Is it version 0.1.14 (Type ollama -v in terminal)?

Again, if you are not able to run the server from an alternative interface such as https://ollama.twanluttik.com/, the problem may be on your end, which makes it harder to troubleshoot. I will poke around to see if I can find a solution for you.

twalderman commented 9 months ago

Perhaps it is my setup. I tried another plugin, ChatCBT, and it works as expected on port 11434. https://ollama.twanluttik.com/ is not working either, so I will try on a different machine. My M3 is on order, so I will take a look at this when it arrives. Thanks, man.

twalderman commented 9 months ago

This one seems to work fine: https://github.com/hinterdupfinger/obsidian-ollama. Are you on Mac or Linux?

longy2k commented 9 months ago

@twalderman

I set up the Ollama integration to stream the response by default.

I will create an option to turn off streaming so that you do not need to set up an external server.

After that, it should work for you similarly to ChatCBT and obsidian-ollama.

Thanks
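
(A rough sketch of what the non-streaming path could look like, assuming the same /api/chat endpoint; the URL and model name are placeholders and this is not the plugin's actual code:)

```javascript
// Rough sketch: with stream set to false, the server returns one complete
// JSON body instead of newline-delimited chunks, so no incremental parsing is needed.
const res = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  body: JSON.stringify({
    model: "llama2", // placeholder model name
    messages: [{ role: "user", content: "Hello" }],
    stream: false,
  }),
});
const data = await res.json();
const reply = data.message.content;
```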

twalderman commented 9 months ago

That will work fine for me. I don't know if it is related, but I came across this: https://github.com/langchain-ai/langchain/discussions/11544

Madd0g commented 9 months ago

@longy2k thanks, I'll upgrade ollama and try this new mode.

From just looking at the code: if streaming still has the same symptoms, this JSON fix (split by newline and take the first part) will miss some tokens (see how my fix loops over all split parts). But I haven't tested yet; maybe this isn't a problem anymore.

will update after I try with new ollama.

longy2k commented 9 months ago

@Madd0g

Oh I see! I just tested it again in a new vault and some tokens are missing.

They're only missing from the first response, though, which is why I may have overlooked it.

I will make sure it loops over all split parts.

Thanks :)

EDIT: Missing tokens do occur every once in a while (not just in the initial response).

Madd0g commented 9 months ago

@longy2k glad you saw it :)

Yes, the very last streaming message from Ollama comes with a big JSON payload and is usually split into multiple chunks, so this naive approach doesn't handle it (I think there are streaming JSON libraries that simplify this, though). Note that my code prints {_e_} on every error, so I see those a lot; maybe keep a try/catch in the code like I did.

On an unrelated note, another thing I hacked into my local version (which is why I'm slow at testing the new release) is templating support. I know it's a bit of an overkill feature for a tool that doesn't directly target open-source LLMs, but the tool naively prepends "user:" or "assistant:" to messages, while the LLMs actually work best with very specific templates (<|im_start|>user and such).

I think a config screen for this is overkill, but if there were a simple way to configure it (maybe in a YAML/JSON file), that would be great.

I don't think this is an absolute necessity (it mostly works without it), but for me it expands the usefulness of the tool into more "professional" use cases. And I feel it's more "correct" to use the right template.
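
(A hypothetical sketch of the kind of templating described above, using ChatML-style markers; `renderChatML` is an invented helper, not part of BMO:)

```javascript
// Hypothetical helper: render a message list with ChatML-style markers
// instead of naively prepending "user:" / "assistant:" to each message.
function renderChatML(messages) {
  const body = messages
    .map((m) => `<|im_start|>${m.role}\n${m.content}<|im_end|>`)
    .join("\n");
  // Leave the template open for the assistant's reply.
  return `${body}\n<|im_start|>assistant\n`;
}

// Example:
// renderChatML([
//   { role: "system", content: "You are a helpful assistant." },
//   { role: "user", content: "Hello" },
// ]);
```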

Madd0g commented 9 months ago

I tested the latest version (the one that uses /api/chat with Ollama); it works great so far.