**Closed** — Thireus closed this issue 1 month ago
You should at least provide more information about the model and show some examples, along with the parameters you used.
To help diagnose your issue, it would be helpful to know:

- What hardware are you using?
- What model/quant are you using?
- What settings are you loading the model with?
- What prompt/sampler settings/frontend are you generating with?
- What exactly is wrong with the model outputs compared to 0.1.8?
I never noticed degradation, but CR+ is broken now; maybe the problem isn't specific to it. Largestral, Qwen, and L3.1 appeared to work fine.
Sorry, I was not able to provide prompt examples, as they involve a complex and large set of instructions which I cannot disclose. The model had trouble following all of the instructions and appeared to focus only on the last portion of the prompt, almost ignoring the first and middle portions.
My observations were based on turboderp_Llama-3.1-70B-Instruct-exl2_6.0bpw, loaded with `--loader exllamav2_hf --max_seq_len 32768 --cache_4bit`.
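For reference, here is a minimal sketch of the full launch command I used. The `server.py` entry point and the local model directory name are assumptions based on the usual text-generation-webui layout, not something confirmed in this thread; only the three flags are taken from above.

```shell
# Hypothetical text-generation-webui launch command (sketch).
# server.py and the model directory name are assumed; the loader,
# context length, and cache flags are the ones reported above.
python server.py \
  --model turboderp_Llama-3.1-70B-Instruct-exl2_6.0bpw \
  --loader exllamav2_hf \
  --max_seq_len 32768 \
  --cache_4bit
```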
It appears that v0.2.1 resolves the issue.
Using the same model and loading parameters, I'm observing a severe degradation in the model's ability to understand my requests. Could it be related to tensor parallelism? I'll have to roll back to 0.1.8 in the meantime.