khoj-ai / khoj

Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (e.g. gpt, claude, gemini, llama, qwen, mistral).
https://khoj.dev
GNU Affero General Public License v3.0
14.16k stars, 705 forks

[FIX] Gracefully handle error case when user-generated data is indexed with two different search models #651

Closed. sabaimran closed 8 months ago

sabaimran commented 8 months ago

Describe the bug

The error in this thread occurred because the user had data indexed with two different search models: some with the multilingual model and some with the default model.

To Reproduce

Steps to reproduce the behavior:

  1. Index some data with the default model.
  2. Change model in the settings page.
  3. Index some new data.

Any chat query then results in an internal server error.

Stack trace

[2024-02-21 14:13:31 +0000] [126371] [ERROR] Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.DataException: different vector dimensions 384 and 768
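
The exception above is raised because a distance between a 384-dimensional and a 768-dimensional vector is undefined. A minimal pure-Python sketch of the same failure, for illustration only: `l2_distance` is a hypothetical helper, not Khoj or pgvector code, and the error message simply mirrors the one in the trace.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two embeddings of equal length."""
    # Hypothetical guard mirroring the database error in the trace:
    # embeddings from different search models have different lengths.
    if len(a) != len(b):
        raise ValueError(f"different vector dimensions {len(a)} and {len(b)}")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


default_model_vec = [0.0] * 384   # stand-in for an embedding from one model
other_model_vec = [0.0] * 768     # stand-in for an embedding from another model

try:
    l2_distance(default_model_vec, other_model_vec)
except ValueError as err:
    print(err)
```

Any query that compares the query embedding against entries from both models hits this case, which is why every chat query fails once mixed data exists.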

Potential Fixes

  1. Delete all indexed data when the search model is changed.
  2. Track which search model was used to generate each embedding. Stitching together results across different models would still be tricky.
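
Both options can be sketched with an in-memory index. This is a rough illustration under stated assumptions, not Khoj's actual schema or search code: `Entry`, `search`, and `drop_stale_entries` are hypothetical names, and the `model` field is an assumed per-entry label recording which search model produced the embedding.

```python
import math
from dataclasses import dataclass


@dataclass
class Entry:
    text: str
    model: str              # assumed label: search model that produced the embedding
    embedding: list


def cosine_similarity(a, b):
    # Fail loudly instead of silently truncating mismatched vectors.
    if len(a) != len(b):
        raise ValueError(f"different vector dimensions {len(a)} and {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(entries, query_embedding, active_model, top_k=5):
    """Option 2: only compare against entries embedded by the active model."""
    candidates = [e for e in entries if e.model == active_model]
    ranked = sorted(
        candidates,
        key=lambda e: cosine_similarity(e.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:top_k]


def drop_stale_entries(entries, active_model):
    """Option 1: discard entries from any other model when the model changes."""
    return [e for e in entries if e.model == active_model]
```

Filtering by model avoids the dimension error but hides entries indexed with the previous model until they are re-indexed, so it does not address the harder problem of merging results across models.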

Platform

If self-hosted
