janhq / jan

Jan is an open-source alternative to ChatGPT that runs 100% offline on your computer, with support for multiple engines (llama.cpp, TensorRT-LLM).
https://jan.ai/
GNU Affero General Public License v3.0

epic: Jan has Conversation-based RAG #1076

Closed: dan-homebrew closed this issue 10 months ago

dan-homebrew commented 11 months ago

Objectives

Leads

User Stories

In Scope

  1. As a User, I want to upload text files to the chat:

    • Scenario: When I have a text-based file (PDF, doc, etc.), I can upload it directly within the chat interface.
    • Acceptance Criteria: I should be able to select a file from my device and upload it to the chat window.
  2. As a User, I want to view the uploaded file's content:

    • Scenario: Upon uploading a text file, I want to view its content displayed within the chat.
    • Acceptance Criteria: The chat interface should visually represent the uploaded file's content, making it accessible alongside the conversation.
  3. As a User, I want to ask questions related to the uploaded file:

    • Scenario: After uploading a file, I want to input prompts or questions about its content within the chat.
    • Acceptance Criteria: The chat interface should allow me to type prompts or questions, linking them contextually to the specific file content for relevant responses.
  4. As a User, I want to receive responses based on file-specific queries:

    • Scenario: When I input queries related to the uploaded file, I expect relevant and contextual responses within the chat.
    • Acceptance Criteria: The system should process my queries about the uploaded file, providing accurate and appropriate responses in the conversation thread.
  5. As a User, I want to understand the limitation on multiple file uploads:

    • Scenario: When attempting to upload multiple files simultaneously, I should receive information about this limitation.
    • Acceptance Criteria: The system should display notifications or error messages, informing me that only one file can be uploaded at a time.

Out-of-Scope

Design Wireframes

Figma link: https://www.figma.com/file/ytn1nRZ17FUmJHTlhmZB9f/Jan-App?type=design&node-id=783-43738&mode=design&t=7KYGjHy7F1RvqEip-4

Engineering & Architecture

In Scope

Out-of-Scope

Tasklist

Resources

https://www.chatpdf.com/c/vzHhtas3uQVZDK9ZGglaw

Out-of-Scope

imtuyethan commented 11 months ago

Archiving @dan-jan's original comment because I need to put my product specs on top for the subtasks to work on GitHub:

Spec

Appendix

Why not /files approach?

imtuyethan commented 11 months ago

Storage:

    /threads
      /thread-1
        /files
          my.pdf
      /thread-2

Future iteration: we symlink so files are not duped. Plus: add a design that shows when models do not support RAG.

Right panel:

    • In the Assistant section, under a category called "Tools", have a checked [x] checkbox for File Retrieval.
    • In the future this section will have web search and other tools.

Similar to OpenAI's createGPT flow.
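
A minimal sketch of how this thread-level Tools setting could be persisted is below. The field names and defaults are assumptions for illustration, not Jan's confirmed schema.

```typescript
// Hypothetical shape for the per-thread "Tools" section described above.
// All field names and defaults are assumptions, not Jan's actual schema.
interface AssistantTool {
  type: "retrieval";        // future: "web_search" and other tools
  enabled: boolean;         // the [x] File Retrieval checkbox
  settings?: {
    chunkSize: number;      // thread-level chunking defaults (see Q&A below)
    chunkOverlap: number;
    topK: number;           // chunks retrieved per query
  };
}

// Example default persisted with the thread's metadata:
const tools: AssistantTool[] = [
  { type: "retrieval", enabled: true, settings: { chunkSize: 1024, chunkOverlap: 64, topK: 2 } },
];
```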

hiro-v commented 10 months ago

Eng spec


There are 3 scenarios:

Communication layers:

Tools of choice:

What to do next even after this

hiro-v commented 10 months ago

Questions and Answers:

  1. Where are we being opinionated, and WHY, i.e. our choice of hnswlib, langchain, no llamaindex => Answer:

    • Using langchain and llama_index at the moment is an opinionated choice that Hiro made because:
    • We are not dependent on any existing libs, but we need an abstraction layer for the vector DB and pre-processing steps (e.g. text splitting), and we do not want to re-invent the wheel.
    • langchain.js is more actively developed than the llama_index TS port at the time we are developing.
    • Choosing hnswlib is an opinionated option too, as it is the most lightweight and most broadly compatible option and can be embedded on any OS/CPU of choice (see the first sketch after this list).
  2. What abstractions need to happen in the future to allow for a bring-your-own vector DB situation => Answer:

    • Hiro thinks yes, absolutely; that's why I chose to use langchain.js to abstract the interface.
  3. Any "hacky" solutions employed to get things to work for now => Answer

  4. Impact on user disk / Jan Folder / resource hogging => Answer:

    • The files are saved in `jan/threads//.extension`
    • The memory is saved as files in `jan/threads//memory/**` (this one is packageable)
    • Once the memory is there, newly ingested files will be appended.
  5. Where are eng specs? https://github.com/janhq/jan/issues/1076#issuecomment-1899553830

  6. Will it be available via the local API server? => Answer: Yes, but we have not thought this through yet. However, it will be designed similarly to how OpenAI GPTs' runs work.

  7. How are we chunking?

    • There are 2 parameters in TextSplitter for plain text: chunk size and chunk overlap. We set them to fixed defaults but will let the user configure them in settings (thread level); see the first sketch after this list.
  8. How is the LLM map-reducing across similar vectors? Is that configurable by the user? => Answer:

    • text -> embedding -> similaritySearch (top-k) -> rerank (see the first sketch after this list).
    • The user can configure these settings.
    • To be updated.
  9. If the user uses a different embedding layer (model A) for doc ingestion vs. user queries (model B), our current approach seems hyper-opinionated. => Answer:

    • There are 2 models: an embedding model for retrieval, and an LLM for text generation.
    • The LLM can be used for the retrieval task as well.
    • In the current implementation the Nitro-served LLM plays both roles, so if the user changes the model, they need to ingest again. One way to avoid this is to disable changing the model mid-thread.
    • The likely way is to split these into 2 models, in which the embedding model does not always change. One option is adding https://github.com/FFengIll/embedding.cpp for serving sentence transformers (bge, etc.), or even writing it as a node-gyp addon to use inside Jan alone (see the second sketch after this list).
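
To make the answers to Q1, Q7 and Q8 concrete, here is a minimal sketch of the ingest/query pipeline using langchain.js with the hnswlib vector store. It assumes langchain.js 0.0.x import paths, that the Nitro server exposes an OpenAI-compatible endpoint at http://localhost:3928/v1, and placeholder paths and defaults; none of these are confirmed implementation details.

```typescript
import { PDFLoader } from "langchain/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { HNSWLib } from "langchain/vectorstores/hnswlib";

// Hypothetical memory location, following the `jan/threads//memory/**` layout above.
const MEMORY_DIR = "jan/threads/<thread-id>/memory";

// Embeddings served by the local Nitro endpoint (assumed URL; the dummy key
// is ignored by a local server). Newer langchain.js versions take
// `configuration: { baseURL }` instead of the second `basePath` argument.
const embeddings = new OpenAIEmbeddings(
  { openAIApiKey: "nitro" },
  { basePath: "http://localhost:3928/v1" }
);

// Ingestion: file -> chunks -> embeddings -> hnsw index persisted on disk.
async function ingest(filePath: string): Promise<void> {
  const docs = await new PDFLoader(filePath).load();
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1024,  // fixed default, user-configurable per thread (Q7)
    chunkOverlap: 64, // fixed default, user-configurable per thread (Q7)
  });
  const chunks = await splitter.splitDocuments(docs);
  const store = await HNSWLib.fromDocuments(chunks, embeddings);
  await store.save(MEMORY_DIR); // persisted so later files can be appended
}

// Query: text -> embedding -> similaritySearch (top-k); the rerank step
// from Q8 is omitted for brevity.
async function retrieve(question: string, topK = 2): Promise<string> {
  const store = await HNSWLib.load(MEMORY_DIR, embeddings);
  const hits = await store.similaritySearch(question, topK);
  return hits.map((d) => d.pageContent).join("\n---\n");
}
```

The retrieved chunks would then be injected into the prompt before the Nitro-served LLM generates its answer.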
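
And a sketch of the Q9 split into two models: a dedicated local embedding server (for example one built on embedding.cpp) wrapped in langchain.js's Embeddings abstraction, so the chat LLM can change mid-thread without forcing re-ingestion. The endpoint URL and response shape are assumptions.

```typescript
import { Embeddings, EmbeddingsParams } from "langchain/embeddings/base";

// Hypothetical client for a standalone local embedding server; it could be
// swapped in for OpenAIEmbeddings in the sketch above without touching the rest.
class LocalEmbeddings extends Embeddings {
  constructor(private url = "http://localhost:3929/embed", params: EmbeddingsParams = {}) {
    super(params);
  }

  async embedDocuments(texts: string[]): Promise<number[][]> {
    const res = await fetch(this.url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ input: texts }),
    });
    const { embeddings } = await res.json(); // assumed response shape
    return embeddings;
  }

  async embedQuery(text: string): Promise<number[]> {
    const [vector] = await this.embedDocuments([text]);
    return vector;
  }
}
```
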
hiro-v commented 10 months ago

@louis-jan's point on the framework layer:

hiro-v commented 10 months ago

TODO:

@alan

• Something that just works at the moment
• Improved version