janhq / jan

Jan is an open source alternative to ChatGPT that runs 100% offline on your computer. Multiple engine support (llama.cpp, TensorRT-LLM)
https://jan.ai/
GNU Affero General Public License v3.0

planning: Migrate Threads, Messages to Cortex, deprecate Conversation Extension #3904

Open dan-homebrew opened 4 weeks ago

dan-homebrew commented 4 weeks ago

Goal

Tasklist

louis-jan commented 4 weeks ago

According to this:

https://github.com/janhq/cortex.cpp/issues/1567#issuecomment-2444740659

## Problems

`/messages` is quite straightforward for now, but Jan's `/threads` are a combination of model preset, assistant parameters, assistant tools, and threads. Also, `/assistants` is not well designed: it defaults to a hard-coded template.

See a Jan `thread.json` example:

```json
{
  "id": "jan_1729768043",
  "object": "thread",
  "title": "0.5.8 llama 3.2 1b",
  "assistants": [
    {
      "assistant_id": "jan",
      "assistant_name": "Jan",
      "tools": [
        {
          "type": "retrieval",
          "enabled": true,
          "settings": {
            "top_k": 2,
            "chunk_size": 1024,
            "chunk_overlap": 64,
            "retrieval_template": "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\nCONTEXT: {CONTEXT}\n----------------\nQUESTION: {QUESTION}\n----------------\nHelpful Answer:"
          }
        }
      ],
      "model": {
        "id": "llama3.2-1b-instruct",
        "settings": {
          "engine": "llama-cpp",
          "ctx_len": 3072,
          "ngl": 100,
          "prompt_template": "<|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
          "text_model": false
        },
        "parameters": {
          "engine": "llama-cpp",
          "frequency_penalty": 0,
          "max_tokens": 3072,
          "presence_penalty": 0,
          "stop": ["<|eot_id|>"],
          "stream": true,
          "temperature": 0.699999988079071,
          "top_p": 0.949999988079071
        },
        "engine": "llama-cpp"
      },
      "instructions": ""
    }
  ],
  "created": 1729768043312,
  "updated": 1730195853233,
  "metadata": {
    "lastMessage": "Hello!"
  }
}
```

See OpenAI Assistant and Thread:

```json
{
  "id": "asst_abc123",
  "object": "assistant",
  "created_at": 1698984975,
  "name": "Math Tutor",
  "description": null,
  "model": "gpt-4o",
  "instructions": "You are a personal math tutor. When asked a question, write and run Python code to answer the question.",
  "tools": [
    { "type": "code_interpreter" }
  ],
  "metadata": {},
  "top_p": 1.0,
  "temperature": 1.0,
  "response_format": "auto"
}
```

```json
{
  "id": "thread_abc123",
  "object": "thread",
  "created_at": 1699012949,
  "metadata": {},
  "tool_resources": {}
}
```

## So should we:

1. Introduce a new structure, similar to the existing one, scoped to `/threads` and `/messages`, or
2. Follow a popular schema such as OpenAI's, which could scale to `/assistants`?

I think 2 is preferred, since we could take advantage of existing test suites and client SDKs. Otherwise, we would eventually do another migration to scale to `/assistants` and double the workload, such as writing tests.

## Decouple `/threads` & `/models`

Currently they are coupled, and fairly similar to a preset, which is not really well defined. E.g., `thread.json` defines model settings, which creates a side effect where switching between threads also reloads the model. It's an antipattern, and we should find a way to decouple it:

1. Inference parameters & tools go to `/assistants`. This scales `/assistants` better: users can have more than one assistant persona (instructions + parameters) instead of a hard-coded one.
2. Model parameters go to `/models`, where `PUT` takes effect (right now it's used nowhere).
3. The thread becomes fairly thin, which also scales better to `/run`: a thread is likely just a container that glues components together (assistant, run, file_stores). A sketch of the resulting thin thread follows this list.
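For illustration, here is a minimal sketch of what the same thread could look like under option 2, assuming we adopt the OpenAI thread schema as-is. Values are carried over from the `thread.json` example above; the `id` prefix and parking Jan-specific fields under `metadata` are assumptions:

```jsonc
// Hypothetical thin thread.json, OpenAI-style: everything model- and
// assistant-related has moved out to /models and /assistants.
{
  "id": "thread_1729768043",        // assumed prefix; was "jan_1729768043"
  "object": "thread",
  "created_at": 1729768043,
  "metadata": {
    "title": "0.5.8 llama 3.2 1b",  // Jan-specific field, kept as metadata (assumption)
    "lastMessage": "Hello!"
  },
  "tool_resources": {}
}
```

Switching threads would then never touch model state, since the thread no longer carries model settings.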

There would be many conclusions that affect Jan's UX, such as:

Threads are currently coupled with model settings, which introduces bad UX: users get their model restarted every time they switch to a new thread, even when it uses the same model.

1. Moving model configurations to per-model settings would be beneficial. Those settings have a global effect.
2. Assistants become clearly defined, and users can have more than one assistant persona (instructions + parameters).

As a new user to this space, it's quite hard to grasp a thread's parameters and settings. Clearly separated Assistant Personas (instructions and parameters) and Model Capability Settings (with more hardware-oriented explanation) would help onboard users better.
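For example, the hard-coded "Jan" template could become just one persona file following OpenAI's assistant schema. A rough sketch, with tool settings trimmed for brevity; note that `frequency_penalty`/`presence_penalty` are not fields of OpenAI's assistant object, so keeping them on the assistant would be a Jan extension:

```jsonc
// Hypothetical assistant persona: instructions, tools, and inference
// parameters live here once, instead of being copied into every thread.
{
  "id": "asst_jan",                 // hypothetical id
  "object": "assistant",
  "name": "Jan",
  "model": "llama3.2-1b-instruct",
  "instructions": "",               // per-persona instructions go here
  "tools": [
    { "type": "retrieval" }         // retrieval settings omitted for brevity
  ],
  "temperature": 0.7,
  "top_p": 0.95,
  "frequency_penalty": 0,           // Jan extension, not in OpenAI's schema (assumption)
  "presence_penalty": 0,            // Jan extension, not in OpenAI's schema (assumption)
  "metadata": {}
}
```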

dan-homebrew commented 4 weeks ago

> As a new user to this space, it's quite hard to grasp a thread's parameters and settings. The Writing Assistant Persona (instructions and parameters) and Model Capability Settings (with more hardware-oriented explanation) would help onboard users better.

Can you elaborate a bit more about:

louis-jan commented 4 weeks ago

ah @dan-homebrew, I just mean:

1. A thread's inference parameters, such as temperature, frequency penalty, and presence penalty, are quite incomprehensible. Moving those to the Assistant would make building an assistant persona easier to grasp.
2. Modifying a thread's settings parameters, such as context window and ngl, causes bad UX. Moving them to per-model settings might help. From there we can add more hardware-detection information, such as the recommended GPU layers to load and context length based on the device's specs -> global effect per model, not per thread (see the sketch below).
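For instance, a rough sketch of a per-model `model.json` after the move; the `ctx_len`/`ngl` values come from the thread example above, while the `recommended` block and its numbers are hypothetical, only to illustrate the hardware-detection idea:

```jsonc
// Hypothetical per-model settings under /models, where PUT takes effect.
// Loader settings apply globally per model, not per thread.
{
  "id": "llama3.2-1b-instruct",
  "object": "model",
  "engine": "llama-cpp",
  "settings": {
    "ctx_len": 3072,
    "ngl": 100
  },
  "recommended": {                  // hypothetical block: derived from detected
    "ctx_len": 4096,                // hardware (VRAM/RAM), not set by the user
    "ngl": 24
  }
}
```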
dan-homebrew commented 4 weeks ago

> ah @dan-homebrew, I just mean:
>
> 1. A thread's inference parameters, such as temperature, frequency penalty, and presence penalty, are quite incomprehensible. Moving those to the Assistant would make building an assistant persona easier to grasp.
> 2. Modifying a thread's settings parameters, such as context window and ngl, causes bad UX. Moving them to per-model settings might help. From there we can add more hardware-detection information, such as the recommended GPU layers to load and context length based on the device's specs -> global effect per model, not per thread.

Got it. Can you proceed with recommendations for how we can break down the Assistants, Threads/Messages, and Models endpoints (and the related data structures)?

I think it's better we bite the bullet and move to the correct data structures.