turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Tuning the ethical guidelines of ExLlamaV2 #335

Closed redshiva closed 3 months ago

redshiva commented 4 months ago

I have used the raw FB Llama2 models in developing my application. When interacting with the model, I did not encounter any ethical constraints. As far as I experienced, I could ask any questions and get an answer, which can be problematic for an application that is user facing. However, after converting the Meta model to ExLlamaV2, I ask questions and I am hitting: "As a responsible AI language model, I cannot fulfill that request..."

I want ethical constraints, but I want to tune them. How can I do this? In looking through the code, I do not see where this is being set.

Thank you!

DocShotgun commented 4 months ago

If I'm understanding correctly, you're suggesting that by quantizing the model to exl2... it somehow became censored? And you want that?

CrossPr0duct commented 4 months ago

You keep asking the model to be ethical and you beg; I think that works because it's AGI

Kimiko-AI commented 4 months ago

FEEL THE AGI

KaraKaraWitch commented 4 months ago

> I have used the raw FB Llama2 models in developing my application. When interacting with the model, I did not encounter any ethical constraints. As far as I experienced, I could ask any questions and get an answer, which can be problematic for an application that is user facing. However, after converting the Meta model to ExLlamaV2, I ask questions and I am hitting: "As a responsible AI language model, I cannot fulfill that request..."
>
> I want ethical constraints, but I want to tune them. How can I do this? In looking through the code, I do not see where this is being set.
>
> Thank you!

You should probably look into LoRAs or actually finetuning the model. Quantization is not the way to do model tuning. Consider reading up on how to finetune or apply a LoRA to an LLM.

Also... in what context do you want to add ethical constraints? Corporate?

turboderp commented 4 months ago

ExLlamaV2 doesn't do anything to make inference more or less ethical, it just runs the model. Quantization introduces some level of inaccuracy which means the response from a quantized model is never going to be exactly the same as the original model, for a given prompt. Likewise sampling options will affect the output in various unpredictable ways.
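To see why quantization alone changes outputs, here is a toy round-trip sketch (symmetric round-to-nearest quantization, purely illustrative and not the actual EXL2 scheme):

```python
import numpy as np

def quantize_dequantize(x, bits=4):
    # Map floats onto 2**bits signed integer levels, then back to float.
    # The round trip loses precision, so a quantized model's logits are
    # never bit-identical to the original model's.
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    q = np.round(x / scale).clip(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)  # stand-in for a weight tensor
err = np.abs(w - quantize_dequantize(w)).max()
# err is small but nonzero; tiny per-weight errors compound across layers
# and can tip a sampled token one way or the other.
```

Small per-layer errors like this, combined with sampling randomness, are enough to make a quantized model occasionally answer differently from the full-precision one; they are not an intentional alignment change.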

If you're running the chat example in llama mode, you can try adjusting the system prompt. The default prompt is the one originally provided by Meta and it's extremely "aligned", to the point of being ridiculous. Try it with something like `-sp "Just answer the questions."` instead, or with a blank string.
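For context on why the system prompt matters so much here, this is roughly how it slots into the Llama-2 chat template (the helper name is hypothetical; the `[INST]`/`<<SYS>>` format is the documented Llama-2 chat convention):

```python
def build_llama2_prompt(user_msg: str, system_prompt: str = "") -> str:
    # Llama-2 chat format: the system prompt sits inside <<SYS>> tags at
    # the start of the first [INST] block. Replacing Meta's heavily
    # "aligned" default with a terse instruction (or nothing) changes the
    # model's refusal behavior without touching the weights.
    if system_prompt:
        sys_block = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
    else:
        sys_block = ""
    return f"[INST] {sys_block}{user_msg} [/INST]"

print(build_llama2_prompt("What is 2 + 2?", "Just answer the questions."))
```

The `-sp` flag on the chat example is doing essentially this substitution for you.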

As for tuning alignment in general, that's a whole science. LoRAs are an option, or you can pick from any of the thousands of compatible models on HF finetuned for various purposes.

redshiva commented 4 months ago

Thank you for the helpful reply!
