bug: `cortex run` or `cortex pull` re-downloads an existing model multiple times

gabrielle-ong commented 1 month ago

Cortex version

cortex run redownloads existing model multiple times

Describe the Bug

Two issues (see screenshot):

`tinyllama:gguf` is already downloaded.
`cortex run tinyllama:gguf` succeeds.
Issue 1: `cortex run tinyllama` -> selecting `gguf` starts downloading `model.gguf`.
Issue 2: answering `N` to `Re-download model?` still re-downloads the model (twice).

Steps to Reproduce

No response

Screenshots / Logs

What is your OS?

[X] MacOS
[ ] Windows
[ ] Linux

What engine are you running?

[X] cortex.llamacpp (default)
[ ] cortex.tensorrt-llm (Nvidia GPUs)
[ ] cortex.onnx (NPUs, DirectML)

gabrielle-ong commented 1 month ago

Also seen with `cortex pull`.

gabrielle-ong commented 1 month ago

Still happening on v123

gabrielle-ong commented 1 month ago

Probably also linked to:

`continue download? [Y/n]` -> `n`: I expect it not to download the model, but it re-triggers the download.

nguyenhoangthuan99 commented 1 month ago

Related PR #1361

When the user chooses not to re-download or continue the download, disable the "downloaded successfully" log.

gabrielle-ong commented 1 month ago

@nguyenhoangthuan99 v-129: answering No to Re-download still triggers the download; I don't get the `Cancelled re-download!` message.

nguyenhoangthuan99 commented 1 month ago

I confirmed this logic with @namchuai. With the continue-download feature, there are 3 options:

Y/y: continue (resume) the download
N: do not resume; restart downloading the whole binary from the beginning
Ctrl + C: stop the download process
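As a minimal sketch of the three answers described above, the hypothetical Python helper below maps a reply to the `continue download? [Y/n]` prompt to an action. It is not cortex.cpp code: `DownloadAction`, `parse_continue_answer` and `prompt_continue` are illustrative names, and treating an empty reply as the default `Y` is an assumption based on the usual capitalized-default convention.

```python
from enum import Enum


class DownloadAction(Enum):
    RESUME = "resume"    # Y/y: continue the partial download
    RESTART = "restart"  # N/n: discard progress and download from the beginning
    ABORT = "abort"      # Ctrl + C: stop the download process


def parse_continue_answer(answer: str) -> DownloadAction:
    """Map a reply to `continue download? [Y/n]` to an action.

    Hypothetical sketch of the semantics described in this thread.
    Assumption: an empty reply selects the capitalized default, `Y`.
    """
    normalized = answer.strip().lower()
    if normalized in ("", "y"):
        return DownloadAction.RESUME
    if normalized == "n":
        return DownloadAction.RESTART
    raise ValueError(f"unexpected answer: {answer!r}")


def prompt_continue(read=input) -> DownloadAction:
    """Ask the question; `read` is injectable so the prompt can be tested."""
    try:
        return parse_continue_answer(read("continue download? [Y/n] "))
    except KeyboardInterrupt:
        # Ctrl + C while waiting for an answer stops the whole process
        return DownloadAction.ABORT
```

Under these semantics, `n` leads to a full re-download rather than "don't download", which is the behaviour reporters in this thread found surprising.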

gabrielle-ong commented 1 month ago

@dan-homebrew, this is the unexpected behaviour you encountered as well.

My input for consideration: as a user, I would have expected `n` to stop the download process (e.g. I don't want to use up my limited data).

Possibly 3 options? (these are just semantics): `[Y/n/restart]`

namchuai commented 1 month ago

Usually, when I'm using a CLI and want to stop a foreground process, my go-to is Ctrl + C. However, I can't speak for all users. Please confirm which behaviour you find natural, and we will update it accordingly.

dan-homebrew commented 1 month ago

@namchuai @vansangpfiev I am re-opening this issue, as I think this is a Day 0 UX issue that we should resolve:

Current Problem

I have downloaded `tinyllama:gguf`
When I run `cortex run tinyllama`, it prompts me to choose a version (from the Hub)
When I choose `1. gguf`, it asks me whether I want to re-download it

From the user's perspective, this is annoying:

I have already downloaded a model
Cortex should default to the already-downloaded local model

What I was expecting was something like this:

> cortex-nightly run tinyllama
Searching local models... found `tinyllama:gguf`
Running `tinyllama:gguf`...
tinyllama:gguf started successfully

Proposed Solution

My goal is to simplify `cortex run` so that it needs minimal user input on the happy path:

Current

This is the current `cortex run` logic:

cortex start
models pull (if the model does not exist) <- this logic
engines install (if the engine does not exist)
models start
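The current flow can be sketched in Python as below. `FakeCortex` is a hypothetical in-memory stand-in that only records step names, the engine name default is taken from the engine list above, and the exact-match check `model not in cortex.local_models` is an assumption about how "the model does not exist" is decided. Under that assumption, `cortex run tinyllama` does not match a local `tinyllama:gguf` and falls through to the pull step.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCortex:
    """Hypothetical stand-in for the server; it only records the steps it performs."""
    local_models: set = field(default_factory=set)
    installed_engines: set = field(default_factory=set)
    steps: list = field(default_factory=list)

    def start_server(self):
        self.steps.append("cortex start")

    def pull_model(self, model):
        self.steps.append(f"models pull {model}")
        self.local_models.add(model)

    def install_engine(self, engine):
        self.steps.append(f"engines install {engine}")
        self.installed_engines.add(engine)

    def start_model(self, model):
        self.steps.append(f"models start {model}")


def run(cortex, model, engine="cortex.llamacpp"):
    """Sketch of the four steps listed above, in order."""
    cortex.start_server()
    # Assumed exact-match lookup: "tinyllama" is not "tinyllama:gguf"
    if model not in cortex.local_models:
        cortex.pull_model(model)
    if engine not in cortex.installed_engines:
        cortex.install_engine(engine)
    cortex.start_model(model)
```

For example, with `local_models={"tinyllama:gguf"}`, `run(c, "tinyllama:gguf")` skips the pull, while `run(c, "tinyllama")` records `models pull tinyllama`.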

Improvement

I would like to expand on (1), i.e. model pull:

Check whether the model is present in local models:
Case 1: one local model matched: run local model immediately (no need for user input)
Case 2: >1 local model matched: present "menu"
Case 3: no local model matched: present "menu"
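The three proposed cases could be resolved by a function like the one below. This is a hypothetical sketch: the prefix rule in `matches` (a query `tinyllama` matching `tinyllama` or `tinyllama:<variant>`) is an assumed way of deciding which local models "match", and `Resolution` is an illustrative type, not part of cortex.cpp.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resolution:
    run_immediately: Optional[str]  # set only in Case 1
    local_matches: List[str]        # listed under "Local Models" when a menu is shown
    show_menu: bool


def matches(query: str, model_id: str) -> bool:
    """Assumed rule: an exact id, or the query followed by ':<variant>'."""
    return model_id == query or model_id.startswith(query + ":")


def resolve(query: str, local_models: List[str]) -> Resolution:
    found = [m for m in local_models if matches(query, m)]
    if len(found) == 1:
        # Case 1: exactly one local match -> run it without asking the user
        return Resolution(run_immediately=found[0], local_matches=found, show_menu=False)
    # Case 2 (>1 local match) and Case 3 (no local match) both show the menu
    return Resolution(run_immediately=None, local_matches=found, show_menu=True)
```

Cases 2 and 3 share one branch here because both end in the menu; they differ only in whether the "Local Models" section is empty.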

The "menu" should differentiate "Local" and "Available for Download":

> cortex-nightly run tinyllama
Local Models:
    1. gguf
    2. 1b-gguf

Available to Download:
    3. 7b-gguf
    4. 7b-tensorrt-llm

Select model to download: 1
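The two-section menu in the mock-up could be rendered as follows, with local variants numbered first and downloadable ones continuing the numbering. `build_menu` is a hypothetical helper, and hiding remote variants that already exist locally from "Available to Download" is an assumption, not something stated in the thread.

```python
from typing import List, Tuple


def build_menu(local: List[str], remote: List[str]) -> Tuple[str, List[Tuple[str, str]]]:
    """Render the menu text and a choice table (choice N -> choices[N - 1]).

    Hypothetical sketch: `local` and `remote` are variant names such as "gguf".
    Assumption: remote variants already present locally are not listed twice.
    """
    available = [v for v in remote if v not in local]
    choices = [("local", v) for v in local] + [("download", v) for v in available]
    lines = ["Local Models:"]
    lines += [f"    {i}. {v}" for i, v in enumerate(local, start=1)]
    lines += ["", "Available to Download:"]
    lines += [f"    {i}. {v}" for i, v in enumerate(available, start=len(local) + 1)]
    return "\n".join(lines), choices
```

With `local=["gguf", "1b-gguf"]` and the four remote variants from the mock-up, the text matches the listing shown, and the `choices` table tells the caller whether a selection should start a local model or trigger a download.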

namchuai commented 1 month ago

I think this makes sense. We should apply it to cortexso models first.

gabrielle-ong commented 1 month ago

Solved in #1418, marking as complete

janhq / cortex.cpp