nlpxucan / WizardLM

LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath

Remove "As an AI language model..." from dataset. #4

Open generalsvr opened 1 year ago

generalsvr commented 1 year ago

The model very often writes "As an AI language model, I do not have opinions", even when the question was factual and did not ask for the model's personal opinion. I think there is too much of that stuff in the dataset. Consider removing it when you start training other models.

teknium1 commented 1 year ago

Yep, this model is IMO SOTA, even compared to the 30B LLaMA fine-tune on the OA dataset, so long as it's retrained without the OpenAI boilerplate disclaimers. Very impressive model though!

teknium1 commented 1 year ago

Also @nlpxucan, do you have a Twitter? Can we get in touch?

victorsungo commented 1 year ago

@artyemk Thanks for your kind suggestions. We are also focusing on improving the data quality now, and will release the next version of WizardLM after significant improvement. Your discoveries and feedback are precious to us. Thank you again.

teknium1 commented 1 year ago

> @artyemk Thanks for your kind suggestions. We are also focusing on improving the data quality now, and will release the next version of WizardLM after significant improvement. Your discoveries and feedback are precious to us. Thank you again.

Is it possible to release your prompt-creation algorithm? Those of us with GPT-4 access may be able to help build your dataset out further.

victorsungo commented 1 year ago

> @artyemk Thanks for your kind suggestions. We are also focusing on improving the data quality now, and will release the next version of WizardLM after significant improvement. Your discoveries and feedback are precious to us. Thank you again.
>
> Is it possible to release your prompt-creation algorithm? Those of us with GPT-4 access may be able to help build your dataset out further.

Thanks for reaching out. We currently use ChatGPT to generate the answers to new instructions; GPT-4 is a better choice and would supply more high-quality knowledge. We are focusing on improving Evol-Instruct now and hope to address some existing weaknesses and issues in the next version of WizardLM. After that, we will open-source the code and pipeline of the up-to-date Evol-Instruct algorithm and work with you to improve it. Thank you very much again.

ehartford commented 1 year ago

here you go https://huggingface.co/datasets/ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered I am working on training a model with this dataset. It should be more cooperative.

teknium1 commented 1 year ago

> here you go https://huggingface.co/datasets/ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered I am working on training a model with this dataset. It should be more cooperative.

Was this made by simply removing the aligned entries, or by rewriting them? I'm working on getting gpt-3.5-turbo to rewrite the responses without the "AI assistant" or "AI language model" portions. I found roughly 5,340 entries with either of those phrases in the response fields.
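For reference, a minimal sketch of this kind of phrase search over an Alpaca-style instruction dataset (the phrase list and the `output` field name here are assumptions, not the exact criteria teknium1 used; a real run would `json.load` the dataset file instead of using inline records):

```python
# Phrases that typically mark OpenAI-style disclaimer responses.
DISCLAIMER_PHRASES = ("as an ai language model", "as an ai assistant")

def count_disclaimer_entries(records):
    """Count records whose response field contains a disclaimer phrase."""
    hits = 0
    for record in records:
        output = record.get("output", "").lower()
        if any(phrase in output for phrase in DISCLAIMER_PHRASES):
            hits += 1
    return hits

# Inline sample standing in for the real dataset file:
sample = [
    {"output": "As an AI language model, I do not have opinions."},
    {"output": "The capital of France is Paris."},
]
print(count_disclaimer_entries(sample))  # → 1
```

A case-insensitive substring match like this only catches exact phrasings, which is consistent with finding fewer entries than a broader filter would.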

teknium1 commented 1 year ago

After inspecting, it seems you removed around 20,000 entries. In my search for "aligned" responses, as I said, I only found about 5,300. Could I ask about your filtering method and how you found these entries, so I can find the ones I missed that need correcting?

ehartford commented 1 year ago

> After inspecting, it seems you removed around 20,000 entries. In my search for "aligned" responses, as I said, I only found about 5,300. Could I ask about your filtering method and how you found these entries, so I can find the ones I missed that need correcting?

https://huggingface.co/datasets/ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered/blob/main/wizardlm_clean.py

teknium1 commented 1 year ago

> here you go https://huggingface.co/datasets/ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered I am working on training a model with this dataset. It should be more cooperative.
>
> Was this made by simply removing the aligned entries, or by rewriting them? I'm working on getting gpt-3.5-turbo to rewrite the responses without the "AI assistant" or "AI language model" portions. I found roughly 5,340 entries with either of those phrases in the response fields.

Oh wow, thank you for this info. I guess it is better to simply remove those entries, then, since my approach only rewrote entries where the disclaimer appeared in the first sentence of the response.

ehartford commented 1 year ago

When generating dataset, one could perhaps fall back to asking llama (65b preferably) when moralizing is detected in ChatGPT's response.
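A hedged sketch of that fallback logic (the marker phrases, the `ask_chatgpt`/`ask_llama` callables, and the heuristic itself are hypothetical placeholders, not a real API):

```python
# Hypothetical fallback: re-ask a local llama model when the
# ChatGPT response contains refusal/disclaimer boilerplate.
MORALIZING_MARKERS = (
    "as an ai language model",
    "i cannot provide",
    "it is important to note that",
)

def looks_moralizing(text):
    """Heuristic check for moralizing boilerplate in a response."""
    lowered = text.lower()
    return any(marker in lowered for marker in MORALIZING_MARKERS)

def answer_with_fallback(prompt, ask_chatgpt, ask_llama):
    """Try ChatGPT first; fall back to llama if the reply moralizes."""
    reply = ask_chatgpt(prompt)
    if looks_moralizing(reply):
        reply = ask_llama(prompt)
    return reply
```

The callables are injected so the routing logic can be tested without any model backend; in practice they would wrap the respective inference APIs.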


haiatn commented 10 months ago

Should we remove all "aligned" responses? I believe models do have limitations, and sometimes it is best to let the user know about the limitations of the model instead of returning an answer the model shouldn't be giving.

Alumniminium commented 10 months ago

Alignment should live in adapters like LoRAs. Alignment should always be up to the end user, not baked into the base model. I work with documents that contain a lot of "bad" words like gun, radiation, nuclear, explosive, etc. that have nothing to do with any malicious intent, and all of the "aligned" models either refuse to solve tasks from context like that or add their ramblings about morals and ethics to the output.

ehartford commented 10 months ago

> Alignment should live in adapters like LoRAs. Alignment should always be up to the end user, not baked into the base model. I work with documents that contain a lot of "bad" words like gun, radiation, nuclear, explosive, etc. that have nothing to do with any malicious intent, and all of the "aligned" models either refuse to solve tasks from context like that or add their ramblings about morals and ethics to the output.

Preach it