justyns / silverbullet-ai

Plug for SilverBullet to integrate LLM functionality
https://ai.silverbullet.md/
GNU Affero General Public License v3.0

AI: Search doesn't work in Online mode #47

Open zefhemel opened 1 month ago

zefhemel commented 1 month ago

This is a bit weird... When I run the "AI: Search" command in Online mode I get:

editor_ui.tsx:146 Error running command Function searchCommand is not available in client

Then, when I switch to Sync mode, it runs but I get no results. But when, while on the result page, I switch back to Online mode, I do get the results...

Although they look a bit weirdly indented:

[screenshot: search results page with oddly indented entries]

zefhemel commented 1 month ago

Ah wait, I see what you're doing there on the result page. You sub-divide pages (by paragraph?) and group the results, right? Now that I'm using this on my actual space it makes more sense.

Ok have you considered creating a heading per page, or does that get too large?

Something like

# [[index]]
[[index@222]] Text fragment

[[index@404]] Text fragment 2

I don't think the similarity score is actually very useful to show; I assume you already order the results by it?

justyns commented 1 month ago

editor_ui.tsx:146 Error running command Function searchCommand is not available in client

At least this part is because I had env: server on searchCommand. I removed it and it seems to work as expected now. I'm not sure why switching between online/sync modes caused it to ignore that env setting, though.

Then, when I switch to Sync mode, it runs but I get no results. But when, while on the result page, I switch back to Online mode, I do get the results...

This one I'm not sure about. I think I understand the problem though: when in sync mode, the client calls the embeddings API, gets the embedding to search for, and actually does the search itself. The problem is that the client (in sync mode) has no indexed embeddings, because I limited indexing to the server only 😅
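To make that flow concrete, here's a rough sketch of what the search does (hypothetical names throughout; embedQuery and loadIndexedEmbeddings are stand-ins, not the plug's actual functions):

```typescript
// Hypothetical types/helpers, not the plug's actual code.
interface EmbeddingEntry {
  page: string;     // page the fragment came from
  pos: number;      // offset of the fragment within the page
  text: string;     // the fragment itself
  vector: number[]; // embedding returned by the embeddings API
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

async function searchEmbeddings(
  query: string,
  embedQuery: (text: string) => Promise<number[]>,        // calls the embeddings API
  loadIndexedEmbeddings: () => Promise<EmbeddingEntry[]>,  // reads the embeddings index
): Promise<(EmbeddingEntry & { score: number })[]> {
  const queryVector = await embedQuery(query);
  // In sync mode this comes back empty on the client, because only the
  // server has an embeddings index -- which is the behavior described above.
  const indexed = await loadIndexedEmbeddings();
  return indexed
    .map((entry) => ({ ...entry, score: cosineSimilarity(queryVector, entry.vector) }))
    .sort((a, b) => b.score - a.score);
}
```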

I'll change that to show "No results" or something more obvious, but I'm not sure what the fix is. I thought adding env: server to this would cause it to run on the server, but it doesn't:

  readPageSearchEmbeddings:
    path: src/embeddings.ts:readFileEmbeddings
    env: server
    pageNamespace:
      pattern: "🤖 .+"
      operation: readFile
zefhemel commented 1 month ago

No, that's ignored in sync mode, because env is then undefined: the client needs to run everything itself. I'm trying to think of a technical way to force this to run on the server even in sync mode, but I don't think there is one... 🤔

The best you can do is indeed detect the environment and give an error, but that's not really a proper solution. Maybe there should be a version of invokeFunction that forces the environment, whatever it takes.
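A minimal sketch of that stop-gap, assuming the plug API's system.getEnv() syscall (which returns undefined in sync mode) and the usual editor.flashNotification; the import path, function name, and message are just placeholders:

```typescript
import { editor, system } from "@silverbulletmd/silverbullet/syscalls";

export async function searchCommand() {
  // system.getEnv() returns "server" or "client", and undefined in sync
  // mode, where there is no separate server-side execution to fall back on.
  const env = await system.getEnv();
  if (env === undefined) {
    await editor.flashNotification(
      "AI: Search needs the server-side embeddings index and is not available in sync mode.",
      "error",
    );
    return;
  }
  // ... continue with the normal search flow ...
}
```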

Another thing I ran into is performance. With https://github.com/silverbulletmd/silverbullet/pull/965 we now put a 5s limit on e.g. page fetches. On my setup with 1000+ pages indexed, this just times out. Even if we somehow disable this time limit, getting no feedback for 5s+ isn't great. Not sure how, but is there a way to stream or batch results so you avoid timeouts, and get some sort of feedback?

justyns commented 1 month ago

I added a system.invokeFunctionOnSystem syscall that just always uses proxySyscall. I'll do some more testing with it and create a PR to the main silverbullet repo if you think that'd work. It does seem like it works so far.
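Roughly how calling it would look from the plug, assuming the new syscall keeps system.invokeFunction's (functionName, ...args) shape but always proxies to the server; the function name and argument here are just examples:

```typescript
// Global provided by the plug runtime; declared here only so the snippet is self-contained.
declare function syscall(name: string, ...args: unknown[]): Promise<any>;

// Illustrative only: always runs the named function on the server,
// even when the client is in sync mode.
async function searchOnServer(queryText: string) {
  return await syscall(
    "system.invokeFunctionOnSystem",
    "silverbullet-ai.searchEmbeddings", // hypothetical fully qualified plug function
    queryText,
  );
}
```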

Another thing I ran into is performance. With https://github.com/silverbulletmd/silverbullet/pull/965 we now put a 5s limit on e.g. page fetches. On my setup with 1000+ pages indexed, this just times out. Even if we somehow disable this time limit, getting no feedback for 5s+ isn't great. Not sure how, but is there a way to stream or batch results so you avoid timeouts, and get some sort of feedback?

Just to make sure, you are hitting that limit on the search page? I currently do fetch all embeddings from the index before calculating the similarity against them. I can probably split that up and only process 100 or so at a time. I'm not sure yet how I'd show that as progress on the page, but it would make this nicer.
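Roughly what I mean by splitting it up (building on the similarity sketch above; the batch size and the countIndexedEmbeddings / loadEmbeddingBatch / reportProgress helpers are placeholders for however the index actually gets paged):

```typescript
const BATCH_SIZE = 100;

async function searchInBatches(
  queryVector: number[],
  countIndexedEmbeddings: () => Promise<number>,
  loadEmbeddingBatch: (offset: number, limit: number) => Promise<EmbeddingEntry[]>,
  reportProgress: (done: number, total: number) => Promise<void>,
): Promise<(EmbeddingEntry & { score: number })[]> {
  const total = await countIndexedEmbeddings();
  const scored: (EmbeddingEntry & { score: number })[] = [];
  for (let offset = 0; offset < total; offset += BATCH_SIZE) {
    // Score one chunk at a time instead of holding every embedding in a single call.
    const batch = await loadEmbeddingBatch(offset, BATCH_SIZE);
    for (const entry of batch) {
      scored.push({ ...entry, score: cosineSimilarity(queryVector, entry.vector) });
    }
    // Update the results page (or a notification) between chunks for feedback.
    await reportProgress(Math.min(offset + BATCH_SIZE, total), total);
  }
  return scored.sort((a, b) => b.score - a.score);
}
```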

justyns commented 1 month ago

I changed the results page to look like this:

[screenshot: updated search results page]

I'm also moving all of the search logic out of the pageNamespace readFile function and instead triggering it on an editor:pageLoad event. There might be a better way to do it, but this lets me at least put a "loading" message on screen while waiting.

Next step is to figure out how to handle paginating the indexed embeddings or at least processing them in chunks.
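For reference, the pageLoad-triggered flow looks roughly like this (illustrative only: the binding to the editor:pageLoad event lives in the plug manifest, the import path follows the current plug API, and runEmbeddingSearch / renderResults are placeholders for the real search and rendering code):

```typescript
import { editor } from "@silverbulletmd/silverbullet/syscalls";

// Placeholders for the plug's real search and rendering logic.
declare function runEmbeddingSearch(query: string): Promise<unknown[]>;
declare function renderResults(query: string, results: unknown[]): string;

// Bound to the editor:pageLoad event in the plug manifest. When the loaded
// page is one of the "🤖 ..." result pages, show a loading message right
// away and replace it with the results once the search finishes.
export async function updateSearchResultsPage() {
  const pageName = await editor.getCurrentPage();
  if (!pageName.startsWith("🤖 ")) {
    return;
  }
  const query = pageName.slice("🤖 ".length);
  await editor.setText(
    `# Search results for "${query}"\n\nSearching for "${query}"...\nGenerating query vector embeddings..`,
  );
  const results = await runEmbeddingSearch(query);
  await editor.setText(renderResults(query, results));
}
```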

zefhemel commented 1 month ago

Just to make sure, you are hitting that limit on the search page?

Yes, on the search page

justyns commented 1 month ago

@zefhemel can you give it another try when you get a chance? I had to temporarily add a sleep in the loop to test with, but I think the experience should be better now.

zefhemel commented 1 month ago

Now I just get nothing (this is online mode):

# Search results for "Zef"

Searching for "Zef"...
Generating query vector embeddings..

And then nothing. No error on the server either.

Just to be sure, I added a new quick note with my name in it. On the server it suggests that it generated embeddings for it, but this search page doesn't find it.

justyns commented 1 month ago

Hmm, immediately after Generating query vector embeddings, it calls the embeddings api, and then you should see Searching for similar embeddings. If there was an error calling the embeddings api, there should be an error in the console though.

Assuming you're using ollama, is it accessible from both your server and client? I think the embeddings api call is going to come from the client.

To test and make sure I didn't have something cached, I just tried these steps:

My SETTINGS looks like this currently in my test space:

indexPage: index
libraries:
# The "Core" library is recommended for all users
- import: "[[!silverbullet.md/Library/Core/*]]"
- import: "[[!ai.silverbullet.md/Library/AICore/*]]"

ai:
  indexEmbeddings: true
  indexEmbeddingsExcludePages:
  - passwords
  indexEmbeddingsExcludeStrings:
  - "**user**:"
  - BLOOP BLARP
  indexSummaryModelName: ollama-gemma2
  indexSummary: false
  chat:
    # bakeMessages: false
    # customEnrichFunctions:
    # - enrichWithURL
  textModels:
    - name: gpt-4o
      provider: openai
      modelName: gpt-4o
  embeddingModels:
    - name: ollama-all-minilm
      modelName: all-minilm
      provider: ollama
      baseUrl: https://ollama.lan.mydomain
      requireAuth: false
zefhemel commented 1 month ago

It's not accessible from the client. So let me look into enabling that first in my setup.

justyns commented 1 month ago

I just pushed an update to display an error on the results page if it can't generate the embeddings, so it should at least give feedback now.

justyns commented 1 month ago

Okay I changed it so embeddings are always generated on the server.

zefhemel commented 1 month ago

So right now I have my main SB instance on a public VPS, with authentication enabled of course. I also run Ollama as a separate Docker container, not exposed to the internet but only to other containers, because it doesn't offer authentication out of the box. This means I won't be able to expose it to the internet unless I deploy some auth layer on top of Ollama that is compatible with your AI plug in some way.

I'm wondering if this is something to support. It could indeed be done by proxying LLM calls through the server. That is probably doable, except for all the streaming stuff in chat, I suppose 🤔

I didn't run into this before, because so far I had only used OpenAI, which is of course perfectly reachable from the client.

justyns commented 1 month ago

I'm wondering if this is something to support. It could indeed be done by proxying LLM calls through the server. That is probably doable, except for all the streaming stuff in chat, I suppose 🤔

I think it's worth supporting if possible. Ollama's a pretty simple solution to stand up, and I bet more people will have SB on the same server as ollama.

You can put Ollama behind nginx or caddy and require an API key that way, but it definitely involves more setup.

I haven't actually verified this, but I'm pretty sure the current way of running calls on the server would wait for the whole response (so no streaming). Maybe the AI plug could have a new HTTP endpoint that the client always calls, and it would just stream the response back to the client without buffering it. Do you think it'd be hard to add support for streaming responses with the new HTTP event listener stuff?
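For what it's worth, here's a generic sketch of the proxy idea, not tied to SilverBullet's HTTP event listener API (which I haven't looked at yet). The key point is that the endpoint can hand the upstream body back as-is, so streamed chunks pass through to the client without being buffered; the Ollama URL and request shape are illustrative:

```typescript
// Generic streaming-proxy sketch (not SilverBullet-specific): forward the
// chat request to Ollama and return the upstream body directly, so the
// streamed response is relayed chunk by chunk instead of buffered.
async function proxyChatStream(clientRequest: Request): Promise<Response> {
  const upstream = await fetch("http://ollama:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: await clientRequest.text(), // forward the client's JSON payload
  });
  return new Response(upstream.body, {
    status: upstream.status,
    headers: {
      "Content-Type": upstream.headers.get("Content-Type") ?? "application/json",
    },
  });
}
```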