janhq / jan

Jan is an open-source alternative to ChatGPT that runs 100% offline on your computer, with support for multiple engines (llama.cpp, TensorRT-LLM).
https://jan.ai/
GNU Affero General Public License v3.0

epic: Jan has Conversation-based RAG #1076

Closed: dan-homebrew closed this issue 10 months ago

dan-homebrew commented 11 months ago

Objectives

Leads

User Stories

In Scope

  1. As a User, I want to upload text files to the chat:

    • Scenario: When I have a text-based file (PDF, doc, etc.), I can upload it directly within the chat interface.
    • Acceptance Criteria: I should be able to select a file from my device and upload it to the chat window.
  2. As a User, I want to view the uploaded file's content:

    • Scenario: Upon uploading a text file, I want to view its content displayed within the chat.
    • Acceptance Criteria: The chat interface should visually represent the uploaded file's content, making it accessible alongside the conversation.
  3. As a User, I want to ask questions related to the uploaded file:

    • Scenario: After uploading a file, I want to input prompts or questions about its content within the chat.
    • Acceptance Criteria: The chat interface should allow me to type prompts or questions, linking them contextually to the specific file content for relevant responses.
  4. As a User, I want to receive responses based on file-specific queries:

    • Scenario: When I input queries related to the uploaded file, I expect relevant and contextual responses within the chat.
    • Acceptance Criteria: The system should process my queries about the uploaded file, providing accurate and appropriate responses in the conversation thread.
  5. As a User, I want to understand the limitation on multiple file uploads:

    • Scenario: When attempting to upload multiple files simultaneously, I should receive information about this limitation.
    • Acceptance Criteria: The system should display notifications or error messages, informing me that only one file can be uploaded at a time.

Out-of-Scope

Design Wireframes

Figma link: https://www.figma.com/file/ytn1nRZ17FUmJHTlhmZB9f/Jan-App?type=design&node-id=783-43738&mode=design&t=7KYGjHy7F1RvqEip-4

Engineering & Architecture

In Scope

Out-of-Scope

Tasklist

Resources

https://www.chatpdf.com/c/vzHhtas3uQVZDK9ZGglaw

Out-of-Scope

imtuyethan commented 11 months ago

Archiving @dan-jan's original comment because I need to put my product specs on top for the subtasks to work on GitHub:

Spec

Appendix

Why not /files approach?

imtuyethan commented 11 months ago

Storage:

    /threads
      /thread-1
        /files
          my.pdf
      /thread-2

Future iteration: we symlink so files are not duped. Plus: add a design that shows when models do not support RAG.

Right panel:

    • In the Assistant section, under a category called "Tools", have a checked [x] checkbox for File Retrieval.
    • In the future this section will have web search and other tools.

Similar to OpenAI's createGPT flow.
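
A minimal sketch of how this thread-level Tools setting could be persisted is below. The field names and defaults are assumptions for illustration, not Jan's confirmed schema.

```typescript
// Hypothetical shape for the per-thread "Tools" section described above.
// All field names and defaults are assumptions, not Jan's actual schema.
interface AssistantTool {
  type: "retrieval";        // future: "web_search" and other tools
  enabled: boolean;         // the [x] File Retrieval checkbox
  settings?: {
    chunkSize: number;      // thread-level chunking defaults (see Q&A below)
    chunkOverlap: number;
    topK: number;           // chunks retrieved per query
  };
}

// Example default persisted with the thread's metadata:
const tools: AssistantTool[] = [
  { type: "retrieval", enabled: true, settings: { chunkSize: 1024, chunkOverlap: 64, topK: 2 } },
];
```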

hiro-v commented 10 months ago

Eng spec


There are 3 scenarios:

Communication layers:

Tools of choice:

What to do next even after this

hiro-v commented 10 months ago

Questions and Answers:

  1. Where are we being opinionated, and WHY, i.e. our choice of hnswlib, langchain, no llamaindex => Answer:

    • Using langchain and llama_index at the moment is an opinionated choice that Hiro made because:
    • We are not dependent on any existing libs, but we need an abstraction layer for the vector DB and pre-processing steps (e.g. text splitting), and we do not want to re-invent the wheel.
    • langchain.js is more actively developed than the llama_index TS port at the time we are developing.
    • Choosing hnswlib is an opinionated option too, as it is the most lightweight and most broadly compatible option and can be embedded on any OS/CPU of choice (see the first sketch after this list).
  2. What abstractions need to happen in the future to allow for a bring-your-own vector DB situation => Answer:

    • Hiro thinks yes, absolutely; that's why I chose to use langchain.js to abstract the interface.
  3. Any "hacky" solutions employed to get things to work for now => Answer

  4. Impact on user disk / Jan Folder / resource hogging => Answer:

    • The files are saved in `jan/threads//.extension`
    • The memory is saved as files in `jan/threads//memory/**` (this one is packageable)
    • Once the memory is there, newly ingested files will be appended.
  5. Where are eng specs? https://github.com/janhq/jan/issues/1076#issuecomment-1899553830

  6. Will it be available via the local API server? => Answer: Yes, but we have not thought this through yet. However, it will be designed similarly to how OpenAI GPTs' runs work.

  7. How are we chunking?

    • There are 2 parameters in TextSplitter for plain text: chunk size and chunk overlap. We set them to fixed defaults but will let the user configure them in settings (thread level); see the first sketch after this list.
  8. How is the LLM map-reducing across similar vectors? Is that configurable by the user? => Answer:

    • text -> embedding -> similaritySearch (top-k) -> rerank (see the first sketch after this list).
    • The user can configure these settings.
    • To be updated.
  9. If the user uses a different embedding layer (model A) for doc ingestion vs. user queries (model B), our current approach seems hyper-opinionated. => Answer:

    • There are 2 models: an embedding model for retrieval, and an LLM for text generation.
    • The LLM can be used for the retrieval task as well.
    • In the current implementation the Nitro-served LLM plays both roles, so if the user changes the model, they need to ingest again. One way to avoid this is to disable changing the model mid-thread.
    • The likely way is to split these into 2 models, in which the embedding model does not always change. One option is adding https://github.com/FFengIll/embedding.cpp for serving sentence transformers (bge, etc.), or even writing it as a node-gyp addon to use inside Jan alone (see the second sketch after this list).
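
To make the answers to Q1, Q7 and Q8 concrete, here is a minimal sketch of the ingest/query pipeline using langchain.js with the hnswlib vector store. It assumes langchain.js 0.0.x import paths, that the Nitro server exposes an OpenAI-compatible endpoint at http://localhost:3928/v1, and placeholder paths and defaults; none of these are confirmed implementation details.

```typescript
import { PDFLoader } from "langchain/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { HNSWLib } from "langchain/vectorstores/hnswlib";

// Hypothetical memory location, following the `jan/threads//memory/**` layout above.
const MEMORY_DIR = "jan/threads/<thread-id>/memory";

// Embeddings served by the local Nitro endpoint (assumed URL; the dummy key
// is ignored by a local server). Newer langchain.js versions take
// `configuration: { baseURL }` instead of the second `basePath` argument.
const embeddings = new OpenAIEmbeddings(
  { openAIApiKey: "nitro" },
  { basePath: "http://localhost:3928/v1" }
);

// Ingestion: file -> chunks -> embeddings -> hnsw index persisted on disk.
async function ingest(filePath: string): Promise<void> {
  const docs = await new PDFLoader(filePath).load();
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1024,  // fixed default, user-configurable per thread (Q7)
    chunkOverlap: 64, // fixed default, user-configurable per thread (Q7)
  });
  const chunks = await splitter.splitDocuments(docs);
  const store = await HNSWLib.fromDocuments(chunks, embeddings);
  await store.save(MEMORY_DIR); // persisted so later files can be appended
}

// Query: text -> embedding -> similaritySearch (top-k); the rerank step
// from Q8 is omitted for brevity.
async function retrieve(question: string, topK = 2): Promise<string> {
  const store = await HNSWLib.load(MEMORY_DIR, embeddings);
  const hits = await store.similaritySearch(question, topK);
  return hits.map((d) => d.pageContent).join("\n---\n");
}
```

The retrieved chunks would then be injected into the prompt before the Nitro-served LLM generates its answer.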
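
And a sketch of the Q9 split into two models: a dedicated local embedding server (for example one built on embedding.cpp) wrapped in langchain.js's Embeddings abstraction, so the chat LLM can change mid-thread without forcing re-ingestion. The endpoint URL and response shape are assumptions.

```typescript
import { Embeddings, EmbeddingsParams } from "langchain/embeddings/base";

// Hypothetical client for a standalone local embedding server; it could be
// swapped in for OpenAIEmbeddings in the sketch above without touching the rest.
class LocalEmbeddings extends Embeddings {
  constructor(private url = "http://localhost:3929/embed", params: EmbeddingsParams = {}) {
    super(params);
  }

  async embedDocuments(texts: string[]): Promise<number[][]> {
    const res = await fetch(this.url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ input: texts }),
    });
    const { embeddings } = await res.json(); // assumed response shape
    return embeddings;
  }

  async embedQuery(text: string): Promise<number[]> {
    const [vector] = await this.embedDocuments([text]);
    return vector;
  }
}
```
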
hiro-v commented 10 months ago

@louis-jan's point on the framework layer:

hiro-v commented 10 months ago

TODO:

@alan

• Something that just works at the moment
• Improved version