imtuyethan commented 1 month ago

Problem

I have encountered many issues with the wrong model default settings (incorrect prompt template, the stop words missing, etc.). e.g., comments in Jan 0.5.7 Release Sign Off janhq/jan#3818

Model Testing Results

I have tested 45 models from Jan Hub, here are the results.

Next step

[ ] Update correct default settings for failed models
[ ] Better description for all models
[ ] Consider removing legacy models
[ ] Update Hub with new trending models?

cc @hahuyhoang411

No.	Model Name	Issue Identified
1	Llama 3.2 1B Instruct Q8
2	Llama 3.2 3B Instruct Q8
3	Qwen2.5 7B Instruct Q4
4	Qwen2.5 Coder 7B Instruct Q4
5	Llama 3.1 8B Instruct Q4
6	Qwen2.5 14B Instruct Q4
7	Codestral 22B Q4	Error in response format, wrong prompt template?
8	TinyLlama Chat 1.1B Q4	Garbled response, error in response format
9	LlamaCorn 1.1B Q8
10	Deepseek Coder 1.3B Instruct Q8
11	Gemma 1.1 2B Q4	Error in response format, wrong prompt template?
12	Gemma 2 2B Q4
13	Phi-3 Mini Instruct Q4
14	Stable Zephyr 3B Q8
15	Llama 2 Chat 7B Q4	Error in response format, wrong stop word insertion?
16	CodeNinja 7B Q4	Error in response format, wrong prompt template?
17	LaVa 7B	Garbled response, sometimes cannot run
18	Mistral 7B Instruct Q4	Error in response format, wrong stop word insertion?
19	Noromaid 7B Q4
20	Openchat-3.5 7B Q4
21	Stealth 7B Q4
22	Trinity-v1.2 7B Q4
23	Vistral 7B Q4	Error in response format, wrong stop word insertion?
24	Qwen 2 7B Instruct Q4	Error in response format, wrong prompt template?
25	Qwen Chat 7B Q4
26	Llama 3 8B Instruct Q4
27	Hermes Pro Llama 3 8B Q4
28	Aya 23 8B Q4
29	Gemma 1.1 7B Q4	Error in response format, wrong stop word insertion?
30	BakLlava 1	Garbled response, sometimes cannot run, wrong stop word insertion?
31	Gemma 2 9B Q4
32	LaVa 13B Q4	Garbled response; prompt template issue?
33	Wizard Coder Python 13B Q4	Garbled response; prompt template issue?
34	Phi-3 Medium Instruct Q4
35	Gemma 2 27B Q4
36	Qwen2.5 32B Instruct Q4
37	Deepseek Coder 35B Instruct Q4
38	Phind 34B Q4	Error in response format, wrong stop word insertion?
39	Yi 34B Q4
40	Command-R v01 34B Q4	Garbled response; prompt template issue?
41	Aya 23 35B Q4
42	Mixtral 8x7B Instruct Q4	Error in response format, wrong stop word insertion?
43	Llama 3.1 70B Instruct Q4
44	Llama 2 Chat 70B Q4	Error in response format, wrong stop word insertion?
45	Qwen2.5 72B Instruct Q4

On one note

We will need to develop model.yaml to easily define model capabilities (e.g. function calling, vision, etc). Users are facing an issue with imported LlaVa: https://github.com/janhq/jan/issues/3855

model.yaml should have some sort of capabilities field, e.g. tools: true
Jan allows users to "edit" Models, e.g. view a model's functionalities + edit it
Cortex: users will just edit model.yaml directly

imtuyethan commented 1 month ago

Off topic:

Grammar issue (for all self-imported models by users):

Please change to "Self-imported model by user"
The way we define tags is weird.

Cloud models description could be better

These descriptions are not helpful:

imtuyethan commented 1 month ago

114 (windows-dev-tensorRT-llm) OS: Windows 11 Pro (Version 23H2, build 22631.4037) CPU: AMD Ryzen Threadripper PRO 5955WX (16 cores) RAM: 32 GB GPU: NVIDIA GeForce RTX 3090 Storage: 599 GB local disk (C:)

Codestral 22B Q4:

The response is weird:

https://github.com/user-attachments/assets/5380f2b7-d137-423d-beaa-21d41e33d67f

https://github.com/user-attachments/assets/3d2785d9-5abe-42e2-8dae-0582c883d1c1

imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Model: Tinyllama Chat 1.1B Q4

Seems like wrong prompt template?

With the same prompt, Llama 3.2 1B Instruct Q8 gave me a correct/thorough answer.

imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Gemma 1.1 2B Q4

Wrong prompt template?

imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Llama 2 Chat 7B Q4

Wrong prompt template?

imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

CodeNinja 7B Q4

Wrong prompt template?

imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

LlaVa 7B

Weird responses:

Reported by user: https://zoom.us/clips/share/riUumJZ0uuzb5vQvZ2eZbMkmOq1nvU7O8VTD5FuBNtxRaO89rp9xA7CibJFCLlGju3nfyLsB_19iPegc0nSM4qxV.POPOcY7WXml_Ab8P

https://github.com/user-attachments/assets/e44274e7-725d-4927-aeb1-8cb03e6831b9

https://github.com/user-attachments/assets/33f2afa1-8f87-4356-bd6f-855a33124eb8

imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Mistral 7B Instruct Q4

Missing stop word?

imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Vistral 7B Q4

Missing stop word?

imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Qwen 2 7B Instruct Q4

Weird format:

imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

BakLlava 1

Issue similar as LlaVa 7B

imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Gemma 1.1 7B Q4

Wrong prompt template?

imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

LlaVa 13B Q4

Wrong prompt template?

imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Wizard Coder Python 13B Q4

Wrong prompt template?

imtuyethan commented 1 month ago

114 (windows-dev-tensorRT-llm) OS: Windows 11 Pro (Version 23H2, build 22631.4037) CPU: AMD Ryzen Threadripper PRO 5955WX (16 cores) RAM: 32 GB GPU: NVIDIA GeForce RTX 3090 Storage: 599 GB local disk (C:)

Command-R v01 34B Q4

Pretty sure wrong prompt template:

Screenshot 2024-10-22 at 7 42 22 PM

dan-homebrew commented 1 month ago

@imtuyethan I recommending converting the Checklist you have above, into a table so we can track the status/fixing status.

Please work with @hahuyhoang411 - it may be that certain models are unsavable, and we should just remove them from the library.

imtuyethan commented 1 month ago

Device: windows-dev-tensorrt-llm Status: Running Node: 3x-3090s CPU: 1.26% of 16 RAM: 6.06/96 GiB Disk: 600 GiB

Mixtral 8x7B Instruct Q4

imtuyethan commented 1 month ago

Device: windows-dev-tensorrt-llm Status: Running Node: 3x-3090s CPU: 1.26% of 16 RAM: 6.06/96 GiB Disk: 600 GiB

Phind 34B Q4

imtuyethan commented 1 month ago

Device: windows-dev-tensorrt-llm Status: Running Node: 3x-3090s CPU: 1.26% of 16 RAM: 6.06/96 GiB Disk: 600 GiB

Llama 2 Chat 70B Q4

Screenshot 2024-10-22 at 11 53 20 PM

imtuyethan commented 1 month ago

Tasklist

I have QA-ed all models, please check ticket description for the latest update:

[x] Llama 3.2 1B Instruct Q8
[x] Llama 3.2 3B Instruct Q8
[x] Qwen2.5 7B Instruct Q4
[x] Qwen2.5 Coder 7B Instruct Q4
[x] Llama 3.1 8B Instruct Q4
[x] Qwen2.5 14B Instruct Q4
[x] Codestral 22B Q4
[x] TinyLlama Chat 1.1B Q4
[x] LlamaCorn 1.1B Q8
[x] Deepseek Coder 1.3B Instruct Q8
[x] Gemma 1.1 2B Q4
[x] Gemma 2 2B Q4
[x] Phi-3 Mini Instruct Q4
[x] Stable Zephyr 3B Q8
[x] Llama 2 Chat 7B Q4
[x] CodeNinja 7B Q4
[x] LlaVa 7B
[x] Mistral 7B Instruct Q4
[x] Noromaid 7B Q4
[x] Openchat-3.5 7B Q4
[x] Stealth 7B Q4
[x] Trinity-v1.2 7B Q4
[x] Vistral 7B Q4
[x] Qwen 2 7B Instruct Q4
[x] Qwen Chat 7B Q4
[x] Llama 3 8B Instruct Q4
[x] Hermes Pro Llama 3 8B Q4
[x] Aya 23 8B Q4
[x] Gemma 1.1 7B Q4
[x] BakLlava 1
[x] Gemma 2 9B Q4
[x] LlaVa 13B Q4
[x] Wizard Coder Python 13B Q4
[x] Phi-3 Medium Instruct Q4
[x] Gemma 2 27B Q4
[x] Qwen2.5 32B Instruct Q4
[x] Deepseek Coder 33B Instruct Q4
[x] Phind 34B Q4
[x] Yi 34B Q4
[x] Command-R v01 34B Q4
[x] Aya 23 35B Q4
[x] Mixtral 8x7B Instruct Q4
[x] Llama 3.1 70B Instruct Q4
[x] Llama 2 Chat 70B Q4
[x] Qwen2.5 72B Instruct Q4

hahuyhoang411 commented 1 month ago

Current hub contains a lot of outdated models, and some new models have a prompt template bug. Here is my suggestion based on @imtuyethan QA-ed list:

The rationale for this delete list is model has been released >6months will be removed. Delete list:

[ ] TinyLlama Chat 1.1B Q4
[ ] LlamaCorn 1.1B Q8
[ ] Deepseek Coder 1.3B Instruct Q8
[ ] Gemma 1.1 2B Q4 (Only keep Gemma 2 2B Q4)
[ ] Phi-3 Mini Instruct Q4 -> microsoft/Phi-3.5-mini-instruct
[ ] Stable Zephyr 3B Q8
[ ] Llama 2 Chat 7B Q4
[ ] CodeNinja 7B Q4
[ ] Mistral 7B Instruct Q4 -> mistralai/Ministral-8B-Instruct-2410
[ ] Noromaid 7B Q4
[ ] Openchat-3.5 7B Q4
[ ] Stealth 7B Q4 (bye our merge)
[ ] Trinity-v1.2 7B Q4 (bye another merge)
[ ] Vistral 7B Q4
[ ] Qwen 2 7B Instruct Q4
[ ] Qwen Chat 7B Q4
[ ] Llama 3 8B Instruct Q4
[ ] Hermes Pro Llama 3 8B Q4
[ ] Gemma 1.1 7B Q4
[ ] BakLlava 1
[ ] LlaVa 7B -> Llava 1.6
[ ] LlaVa 13B Q4
[ ] Wizard Coder Python 13B Q4
[ ] Phi-3 Medium Instruct Q4
[ ] Deepseek Coder 33B Instruct Q4 -> deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
[ ] Phind 34B Q4
[ ] Yi 34B Q4
[ ] Mixtral 8x7B Instruct Q4 -> mistralai/Mistral-Small-Instruct-2409
[ ] Llama 2 Chat 70B Q4

Keep list:

LLM:
- Meta:
- [x] Llama 3.2 1B Instruct Q8
- [x] Llama 3.2 3B Instruct Q8
- [x] Llama 3.1 8B Instruct Q4
- [x] Llama 3.1 70B Instruct Q4
- Qwen:
- [x] Qwen2.5 7B Instruct Q4
- [x] Qwen2.5 0.5B Instruct Q4
- [x] Qwen2.5 1.5B Instruct Q4
- [x] Qwen2.5 1.5B Math Q4
- [x] Qwen2.5 1.5B Coder Q4
- [x] Qwen2.5 3B Instruct Q4
- [x] Qwen2.5 Coder 7B Instruct Q4
- [x] Qwen2.5 Math 7B Instruct Q4
- [x] Qwen2.5 14B Instruct Q4
- [x] Qwen2.5 32B Instruct Q4
- [x] Qwen2.5 72B Instruct Q4
- Google:
- [x] Gemma 2 2B Q4
- [x] Gemma 2 9B Q4
- [x] Gemma 2 27B Q4
- Cohere:
- [x] Command-R v01 34B Q4
- [x] Aya 23 8B Q4
- [x] Aya 23 35B Q4
- [x] Aya Expanse 8B Q4
- [x] Aya Expanse 32B Q4
- Mistral:
- [x] Codestral 22B Q4
- [x] Ministral-8B-Instruct-2410 (new)
- [x] Mistral-Small-Instruct-2409 (new)
- [ ] Mistral-Large-Instruct-2407 (new)
- Deepseek:
- [ ] DeepSeek-Coder-V2-Lite-Instruct (new)
- Microsoft:
- [x] Phi-3.5-mini-instruct (new)
- NVIDIA:
- [ ] Llama-3.1-Nemotron-70B-Instruct-HF (new)
- IBM:
- [x] Granite-3.0 3B (new)
- [x] Granite-3.0 8B (new)
VLM: VLMs are a bit more tricky LLava 1.6 (new) Qwen2-VL-7B-Instruct (new) Pixtral-12B-2409 (new) Llama-3.2-11B-Vision-Instruct (new) GOT-OCR2_0 (new) Molmo-7B-D-0924 (new) MiniCPM-V-2_6 (new)

imtuyethan commented 1 month ago

@hahuyhoang411 Should we add more new/trending models? The list seems short for a whole model hub.

Some edge cases we need to handle:

We can delete them from Hub, but they still show up on the users' side if they have downloaded these legacy models. How do we inform them when these models don't work?

janhq / models

bug: Fix, update & improve models in Jan Hub #46

Problem

Model Testing Results

Next step

On one note

Grammar issue (for all self-imported models by users):

Cloud models description could be better

114 (windows-dev-tensorRT-llm) OS: Windows 11 Pro (Version 23H2, build 22631.4037) CPU: AMD Ryzen Threadripper PRO 5955WX (16 cores) RAM: 32 GB GPU: NVIDIA GeForce RTX 3090 Storage: 599 GB local disk (C:)

Codestral 22B Q4:

Model: Tinyllama Chat 1.1B Q4

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Gemma 1.1 2B Q4

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Llama 2 Chat 7B Q4

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

CodeNinja 7B Q4

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

LlaVa 7B

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Mistral 7B Instruct Q4

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Vistral 7B Q4

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Qwen 2 7B Instruct Q4

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

BakLlava 1

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Gemma 1.1 7B Q4

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

LlaVa 13B Q4

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Wizard Coder Python 13B Q4

114 (windows-dev-tensorRT-llm) OS: Windows 11 Pro (Version 23H2, build 22631.4037) CPU: AMD Ryzen Threadripper PRO 5955WX (16 cores) RAM: 32 GB GPU: NVIDIA GeForce RTX 3090 Storage: 599 GB local disk (C:)

Command-R v01 34B Q4

Device: windows-dev-tensorrt-llm Status: Running Node: 3x-3090s CPU: 1.26% of 16 RAM: 6.06/96 GiB Disk: 600 GiB

Mixtral 8x7B Instruct Q4

Device: windows-dev-tensorrt-llm Status: Running Node: 3x-3090s CPU: 1.26% of 16 RAM: 6.06/96 GiB Disk: 600 GiB

Phind 34B Q4

Device: windows-dev-tensorrt-llm Status: Running Node: 3x-3090s CPU: 1.26% of 16 RAM: 6.06/96 GiB Disk: 600 GiB

Llama 2 Chat 70B Q4

Tasklist

Some edge cases we need to handle: