janhq / models

Models support in Jan and Cortex
MIT License
5 stars 2 forks source link

bug: Fix, update & improve models in Jan Hub #46

Open imtuyethan opened 1 month ago

imtuyethan commented 1 month ago

Problem

I have encountered many issues with the wrong model default settings (incorrect prompt template, the stop words missing, etc.). e.g., comments in Jan 0.5.7 Release Sign Off janhq/jan#3818


Model Testing Results

I have tested 45 models from Jan Hub, here are the results.

Next step

cc @hahuyhoang411

No. Model Name Issue Identified Status
1 Llama 3.2 1B Instruct Q8
2 Llama 3.2 3B Instruct Q8
3 Qwen2.5 7B Instruct Q4
4 Qwen2.5 Coder 7B Instruct Q4
5 Llama 3.1 8B Instruct Q4
6 Qwen2.5 14B Instruct Q4
7 Codestral 22B Q4 Error in response format, wrong prompt template?
8 TinyLlama Chat 1.1B Q4 Garbled response, error in response format
9 LlamaCorn 1.1B Q8
10 Deepseek Coder 1.3B Instruct Q8
11 Gemma 1.1 2B Q4 Error in response format, wrong prompt template?
12 Gemma 2 2B Q4
13 Phi-3 Mini Instruct Q4
14 Stable Zephyr 3B Q8
15 Llama 2 Chat 7B Q4 Error in response format, wrong stop word insertion?
16 CodeNinja 7B Q4 Error in response format, wrong prompt template?
17 LaVa 7B Garbled response, sometimes cannot run
18 Mistral 7B Instruct Q4 Error in response format, wrong stop word insertion?
19 Noromaid 7B Q4
20 Openchat-3.5 7B Q4
21 Stealth 7B Q4
22 Trinity-v1.2 7B Q4
23 Vistral 7B Q4 Error in response format, wrong stop word insertion?
24 Qwen 2 7B Instruct Q4 Error in response format, wrong prompt template?
25 Qwen Chat 7B Q4
26 Llama 3 8B Instruct Q4
27 Hermes Pro Llama 3 8B Q4
28 Aya 23 8B Q4
29 Gemma 1.1 7B Q4 Error in response format, wrong stop word insertion?
30 BakLlava 1 Garbled response, sometimes cannot run, wrong stop word insertion?
31 Gemma 2 9B Q4
32 LaVa 13B Q4 Garbled response; prompt template issue?
33 Wizard Coder Python 13B Q4 Garbled response; prompt template issue?
34 Phi-3 Medium Instruct Q4
35 Gemma 2 27B Q4
36 Qwen2.5 32B Instruct Q4
37 Deepseek Coder 35B Instruct Q4
38 Phind 34B Q4 Error in response format, wrong stop word insertion?
39 Yi 34B Q4
40 Command-R v01 34B Q4 Garbled response; prompt template issue?
41 Aya 23 35B Q4
42 Mixtral 8x7B Instruct Q4 Error in response format, wrong stop word insertion?
43 Llama 3.1 70B Instruct Q4
44 Llama 2 Chat 70B Q4 Error in response format, wrong stop word insertion?
45 Qwen2.5 72B Instruct Q4

On one note

We will need to develop model.yaml to easily define model capabilities (e.g. function calling, vision, etc). Users are facing an issue with imported LlaVa: https://github.com/janhq/jan/issues/3855

imtuyethan commented 1 month ago

Off topic:

Grammar issue (for all self-imported models by users):

Screenshot 2024-10-16 at 11 58 36 PM

Cloud models description could be better

These descriptions are not helpful:

Screenshot 2024-10-17 at 12 04 57 AM Screenshot 2024-10-17 at 12 04 32 AM
imtuyethan commented 1 month ago

114 (windows-dev-tensorRT-llm) OS: Windows 11 Pro (Version 23H2, build 22631.4037) CPU: AMD Ryzen Threadripper PRO 5955WX (16 cores) RAM: 32 GB GPU: NVIDIA GeForce RTX 3090 Storage: 599 GB local disk (C:)

Codestral 22B Q4:

The response is weird:

https://github.com/user-attachments/assets/5380f2b7-d137-423d-beaa-21d41e33d67f

https://github.com/user-attachments/assets/3d2785d9-5abe-42e2-8dae-0582c883d1c1

imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB


Model: Tinyllama Chat 1.1B Q4

Seems like wrong prompt template?

Screenshot 2024-10-16 at 9 43 43 PM Screenshot 2024-10-16 at 9 43 53 PM

With the same prompt, Llama 3.2 1B Instruct Q8 gave me a correct/thorough answer.

imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Gemma 1.1 2B Q4

Wrong prompt template?

Screenshot 2024-10-22 at 1 52 33 AM
imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Llama 2 Chat 7B Q4

Wrong prompt template?

Screenshot 2024-10-22 at 2 32 32 PM
imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

CodeNinja 7B Q4

Wrong prompt template?

Screenshot 2024-10-22 at 2 34 32 PM
imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

LlaVa 7B

Weird responses:

Reported by user: https://zoom.us/clips/share/riUumJZ0uuzb5vQvZ2eZbMkmOq1nvU7O8VTD5FuBNtxRaO89rp9xA7CibJFCLlGju3nfyLsB_19iPegc0nSM4qxV.POPOcY7WXml_Ab8P

https://github.com/user-attachments/assets/e44274e7-725d-4927-aeb1-8cb03e6831b9

https://github.com/user-attachments/assets/33f2afa1-8f87-4356-bd6f-855a33124eb8

imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Mistral 7B Instruct Q4

Missing stop word?

Screenshot 2024-10-22 at 6 03 59 PM
imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Vistral 7B Q4

Missing stop word?

Screenshot 2024-10-22 at 6 07 18 PM
imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Qwen 2 7B Instruct Q4

Weird format:

Screenshot 2024-10-22 at 6 08 58 PM
imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

BakLlava 1

Issue similar as LlaVa 7B

Screenshot 2024-10-22 at 7 11 57 PM
imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Gemma 1.1 7B Q4

Wrong prompt template?

Screenshot 2024-10-22 at 7 13 58 PM
imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

LlaVa 13B Q4

Wrong prompt template?

Screenshot 2024-10-22 at 7 33 55 PM
imtuyethan commented 1 month ago

Operating System: MacOS Sonoma 14.2 Processor: Apple M2 RAM: 16GB

Wizard Coder Python 13B Q4

Wrong prompt template?

Screenshot 2024-10-22 at 7 35 40 PM
imtuyethan commented 1 month ago

114 (windows-dev-tensorRT-llm) OS: Windows 11 Pro (Version 23H2, build 22631.4037) CPU: AMD Ryzen Threadripper PRO 5955WX (16 cores) RAM: 32 GB GPU: NVIDIA GeForce RTX 3090 Storage: 599 GB local disk (C:)

Command-R v01 34B Q4

Pretty sure wrong prompt template:

Screenshot 2024-10-22 at 7 42 22 PM

dan-homebrew commented 1 month ago

@imtuyethan I recommending converting the Checklist you have above, into a table so we can track the status/fixing status.

Please work with @hahuyhoang411 - it may be that certain models are unsavable, and we should just remove them from the library.

imtuyethan commented 1 month ago

Device: windows-dev-tensorrt-llm Status: Running Node: 3x-3090s CPU: 1.26% of 16 RAM: 6.06/96 GiB Disk: 600 GiB

Mixtral 8x7B Instruct Q4

Screenshot 2024-10-22 at 11 08 56 PM

imtuyethan commented 1 month ago

Device: windows-dev-tensorrt-llm Status: Running Node: 3x-3090s CPU: 1.26% of 16 RAM: 6.06/96 GiB Disk: 600 GiB

Phind 34B Q4

Screenshot 2024-10-22 at 11 45 28 PM

imtuyethan commented 1 month ago

Device: windows-dev-tensorrt-llm Status: Running Node: 3x-3090s CPU: 1.26% of 16 RAM: 6.06/96 GiB Disk: 600 GiB

Llama 2 Chat 70B Q4

Screenshot 2024-10-22 at 11 53 20 PM

imtuyethan commented 1 month ago

Tasklist

I have QA-ed all models, please check ticket description for the latest update:

hahuyhoang411 commented 1 month ago

Current hub contains a lot of outdated models, and some new models have a prompt template bug. Here is my suggestion based on @imtuyethan QA-ed list:

The rationale for this delete list is model has been released >6months will be removed. Delete list:

Keep list:

imtuyethan commented 1 month ago

@hahuyhoang411 Should we add more new/trending models? The list seems short for a whole model hub.

Some edge cases we need to handle:

We can delete them from Hub, but they still show up on the users' side if they have downloaded these legacy models. How do we inform them when these models don't work?