janhq / jan

Jan is an open source alternative to ChatGPT that runs 100% offline on your computer. Multiple engine support (llama.cpp, TensorRT-LLM)
https://jan.ai/
GNU Affero General Public License v3.0

planning: Provider Abstraction #3786

Open dan-homebrew opened 4 days ago

dan-homebrew commented 4 days ago

Goal

Tasklist

dan-homebrew commented 4 days ago

Goal: Clear Eng Spec for Providers

Scope

Related

louis-jan commented 3 days ago

Jan Providers

Local Provider

Currently, the local extension still has to manage processes itself, which involves using third-party frameworks such as Node.js (child_process) to build these functions.

If we ever build Jan for mobile, we will have to cover extensions there as well. It would be better to move these parts into the Core module so the frontend only needs to use its API.

A Local Provider needs to execute a command to run its program, so it only defines the command and arguments, while everything else is delegated to the superclass.

Lifecycle:

Examples

class CortexProvider extends LocalProvider {
  async onLoad() {
    // `run` is implemented by the core module;
    // the spawned process is then supervised by the watchdog
    this.run("cortex", ["start", "--port", "39291"], { cwd: "./", env: {} })
  }

  async loadModel() {
    // Can be an HTTP request, socket, or gRPC call
    this.post("/v1/model/start", { model: "llama3.2" })
  }
}
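
For illustration, a minimal sketch of what the LocalProvider superclass could look like. The run/post helpers, the exit handling, and the base URL below are assumptions, not the final core-module API:

import { spawn, type ChildProcess } from "node:child_process"

// Rough sketch of the LocalProvider superclass (all names are assumptions).
// Subclasses only declare the command and arguments; process management
// would live in the core module (on desktop this wraps Node's child_process).
abstract class LocalProvider {
  protected baseUrl = "http://127.0.0.1:39291"
  private child?: ChildProcess

  // Spawn the provider's program; the core watchdog would supervise it.
  run(command: string, args: string[], options: { cwd: string; env: NodeJS.ProcessEnv }) {
    this.child = spawn(command, args, { cwd: options.cwd, env: options.env })
    this.child.on("exit", (code) => console.warn(`${command} exited with code ${code}`))
  }

  // Minimal HTTP helper so subclasses can talk to the spawned server.
  protected async post(path: string, body: unknown): Promise<Response> {
    return fetch(`${this.baseUrl}${path}`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    })
  }

  // Kill the spawned process when the extension is unloaded.
  async onUnload() {
    this.child?.kill()
  }

  abstract onLoad(): Promise<void>
  abstract loadModel(): Promise<void>
}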

Diagram: https://drive.google.com/file/d/1lITgfqviqA5b0-etSGtU5wI8BS7_TXza/view?usp=sharing

Remove Provider

Remote Provider Extension
Diagram (Draw.io): https://drive.google.com/file/d/1pl9WjCzKl519keva85aHqUhx2u0onVf4/view?usp=sharing
  1. Supported parameters?
    • Each provider works with different parameters, but they all share the same basic set as the ones currently defined.
    • We already support transformPayload and transformResponse to adapt to these cases.
    • Users still see consistent parameters from model to model; the transformations happen behind the scenes, under the hood (see the transformPayload example below and the transformResponse sketch after it).
      
      /**
       * transformPayload Example
       *
       * Transform the payload before sending it to the inference endpoint.
       * The new preview models such as o1-mini and o1-preview replaced the
       * max_tokens parameter with max_completion_tokens. Others do not.
       */
      transformPayload = (payload: OpenAIPayloadType): OpenAIPayloadType => {
        // Transform the payload for preview models
        if (this.previewModels.includes(payload.model)) {
          const { max_tokens, ...params } = payload
          return { ...params, max_completion_tokens: max_tokens }
        }
        // Pass through for official models
        return payload
      }
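    • A transformResponse sketch, for symmetry (the response shape and the OpenAIResponseType name are assumptions for illustration):

      transformResponse = (response: any): OpenAIResponseType => {
        // Some providers return a bare `output` string instead of `choices`;
        // map it back into the OpenAI-compatible shape the app expects.
        if (typeof response.output === "string") {
          return {
            choices: [
              { index: 0, message: { role: "assistant", content: response.output } },
            ],
          } as OpenAIResponseType
        }
        // Already OpenAI-compatible: pass through
        return response
      }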
  2. Decoration?
    • We currently hard-code a lot of provider metadata in Jan, which could cause issues with extensions installed in the future.
    • The decoration should come from the Extension Manifest (package.json).
    • https://code.visualstudio.com/api/references/extension-manifest
      {
        "name": "openai-extension",
        "displayName": "OpenAI Extension Provider",
        "icon": "https://openai.com/logo.png"
      }
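    • A rough sketch of how the UI could pick up these fields at runtime (the helper below is hypothetical, not an existing Jan API):

      import { readFile } from "node:fs/promises"
      import { join } from "node:path"

      // Hypothetical helper: derive provider display metadata from the
      // installed extension's package.json instead of hard-coding it in Jan.
      async function readProviderDecoration(extensionDir: string) {
        const manifest = JSON.parse(await readFile(join(extensionDir, "package.json"), "utf-8"))
        return {
          name: manifest.name,
          displayName: manifest.displayName ?? manifest.name,
          icon: manifest.icon, // URL or path to the provider's logo
        }
      }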
  3. Just remove the hacky parts
    • Model Dropdown: it currently checks whether the engine is nitro or something else to split models into local versus cloud sections, so new local engines (e.g. cortex.cpp) get treated as remote engines. -> Filter by extension type instead (class name or type, e.g. LocalOAIEngine vs RemoteOAIEngine); see the sketch below.
    • All models from a cloud provider are disabled by default if no API key is set. But what about a self-hosted endpoint without API key restrictions? Whether models are available should be determined by the extension itself: when no credentials meet the requirements, the result is an empty section, indicating no available models. When users enter the API key on the extension settings page, the model list is fetched automatically and cached. Users can also refresh the model list from there (we should not fetch too often; we are building a local-first application).
    • Application settings can be a bit confusing, with Model Providers and Core Extensions listed separately. Where do other extensions fit in? Extension settings do not have a community or "others" section.
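
A hedged sketch of the dropdown filter mentioned above, assuming each registered model keeps a reference to the extension that registered it; the import path is also an assumption:

// Sketch: group models for the dropdown by the kind of extension that
// registered them, instead of checking engine === "nitro".
// The "@janhq/core" import path and the `extension` field are assumptions.
import { LocalOAIEngine, RemoteOAIEngine } from "@janhq/core"

type ModelEntry = { id: string; extension: unknown }

function splitModelsForDropdown(models: ModelEntry[]) {
  return {
    local: models.filter((m) => m.extension instanceof LocalOAIEngine),
    cloud: models.filter((m) => m.extension instanceof RemoteOAIEngine),
  }
}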

Provider Interface and abstraction

Registered models will be stored in an in-memory store, accessible from other extensions (ModelManager.instance().models), the same as settings. The app and extensions can then perform chat/completions requests with just a model name, which means registered model ids should be unique across extensions.
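
A minimal sketch of such an in-memory registry; the exact ModelManager API below is an assumption, the point is the shared singleton with unique model ids:

// Sketch of an in-memory model registry shared across extensions.
// Method names are assumptions; only the singleton + unique-id idea matters.
class ModelManager {
  private static _instance?: ModelManager
  readonly models = new Map<string, { id: string; provider: string }>()

  static instance(): ModelManager {
    return (this._instance ??= new ModelManager())
  }

  register(model: { id: string; provider: string }) {
    if (this.models.has(model.id)) {
      throw new Error(`Model id "${model.id}" is already registered by another extension`)
    }
    this.models.set(model.id, model)
  }

  get(id: string) {
    return this.models.get(id)
  }
}

// Any extension (or the app) can then resolve a model by name alone:
// ModelManager.instance().get("llama3.2")?.provider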

The core module also exposes broader APIs, such as systemStatus, that other extensions can access. There should be exactly one implementation of such logic supplied by extensions; otherwise it would only be usable within the supplying extension, on a first come, first served basis.
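
A rough sketch of the "one implementation, first come, first served" idea for extension-supplied logic such as systemStatus; the registry below is hypothetical:

// Hypothetical registry for logic supplied by extensions (e.g. systemStatus).
// Only the first registration is kept, so exactly one implementation serves everyone.
type SystemStatusFn = () => Promise<{ gpus?: string[]; vramFree?: number }>

const coreApis = new Map<string, SystemStatusFn>()

function registerCoreApi(name: string, impl: SystemStatusFn) {
  if (coreApis.has(name)) {
    console.warn(`Core API "${name}" is already registered; ignoring duplicate`)
    return
  }
  coreApis.set(name, impl)
}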

The model UI should be aligned with the model object, minimizing decorations (e.g. model icons) and avoiding the introduction of multiple model DTO types.
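
A minimal sketch of what a single shared model shape could look like; field names are assumptions:

// Sketch: one Model shape used by both the UI and extensions,
// so no per-screen DTOs are needed.
interface Model {
  id: string            // unique across extensions, used for chat/completions
  name: string          // display name shown in the dropdown
  provider: string      // extension that registered the model
  capabilities?: string[]
  // no UI-only fields such as icons; decoration comes from the extension manifest
}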

Each Provider Extension should be a separate repo?

Extension installation is a straightforward process that requires minimal effort.

dan-homebrew commented 2 days ago

@louis-jan We can start working on this refactor, and make adjustments on the edges. Thank you for the clear spec!