huggingface / alignment-handbook
Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0 · 4.18k stars · 354 forks
Issues
#127 · Minor question about PAD token and EOS token. (HaniItani, opened 4 months ago, 2 comments)
#126 · Estimated Time for SFT Fine-Tuning of Mistral-7B Model (AronRynkiewicz, closed 4 months ago, 1 comment)
#125 · Major bug: Chat template is not actually applied in run_sft.py and run_dpo.py (AlexiaJM, opened 4 months ago, 7 comments)
#124 · cannot replicate DPO results of zephyr (AlexiaJM, opened 4 months ago, 5 comments)
#123 · Add `auto_insert_empty_system_msg` config flag (BramVanroy, closed 4 months ago, 3 comments)
#122 · Zephyr-dpo-full Checkpoints perform poorly on TruthfulQA. (xijiu9, opened 4 months ago, 1 comment)
#121 · DPO recipe saves a float32 model (tcapelle, opened 4 months ago, 0 comments)
#120 · (QLoRA) DPO without previous SFT (DavidFarago, opened 4 months ago, 1 comment)
#119 · ImportError: Flash Attention 2 is not available (BakingBrains, closed 4 months ago, 0 comments)
#118 · Cost of Generating a Dataset for Constitutional AI (Ashish-Soni08, opened 5 months ago, 0 comments)
#117 · system message being included in chosen & rejected when chat_template inserts system message (dctanner, closed 4 months ago, 2 comments)
#116 · About DPO formatting before fine-tuning (alvarobartt, closed 3 months ago, 4 comments)
#115 · Apply quantization during DPO QLoRA (lewtun, closed 5 months ago, 1 comment)
#114 · Using MT-Bench to evaluate zephyr (abgoswam, opened 5 months ago, 2 comments)
#113 · Update README.md (eltociear, closed 5 months ago, 0 comments)
#112 · Blog post url: "constitutional-ai" -> "constitutional_ai" (kgourgou, closed 5 months ago, 1 comment)
#111 · Update README.md (lewtun, closed 5 months ago, 1 comment)
#110 · DPO loss on different datasets (wj210, opened 5 months ago, 0 comments)
#109 · Reward Modeling Support (agi-piggy, opened 5 months ago, 0 comments)
#108 · Constitutional AI recipe (vwxyzjn, closed 5 months ago, 1 comment)
#107 · Make `bnb_4bit_compute_dtype` consistent with `torch_dtype` (nathan-az, closed 5 months ago, 1 comment)
#106 · Add check before inserting system message (nathan-az, closed 5 months ago, 3 comments)
#105 · Cannot apply "run_dpo.py" on a trained Axolotl model (MatanVetzler, opened 5 months ago, 0 comments)
#104 · DPO/IPO/KTO ablations (edbeeching, closed 5 months ago, 1 comment)
#103 · Does QLora DPO Training support reference model? (Harry-mic, opened 5 months ago, 0 comments)
#102 · Feat: Add loftq option (hahuyhoang411, closed 4 months ago, 0 comments)
#101 · `apply_chat_template` not compatible with tokenizers that do not support a system prompt (nathan-az, closed 5 months ago, 3 comments)
#100 · Make docs work (lewtun, opened 5 months ago, 1 comment)
#99 · Output from zephyr-7b-dpo-qlora is weird (ChenDRAG, opened 5 months ago, 0 comments)
#98 · Is QLoRA better than finetuning? (normster, opened 5 months ago, 0 comments)
#97 · Fixes #96 by handling RepositoryNotFoundError (tleyden, closed 5 months ago, 8 comments)
#96 · RepositoryNotFoundError when running scripts/run_dpo.py with local repo (tleyden, closed 5 months ago, 0 comments)
#95 · Bump lower version of huggingface_hub (lewtun, closed 5 months ago, 0 comments)
#94 · huggingface_hub version (Harry-mic, closed 5 months ago, 1 comment)
#93 · jinja2.exceptions.TemplateError: Conversation roles must alternate user/assistant/user/assistant/... (Feynman27, opened 5 months ago, 2 comments)
#92 · Is there anyway that I can use learning rate warm-up during the training ? (shamanez, closed 5 months ago, 1 comment)
#91 · how to use dpo without flash-attention (Fu-Dayuan, opened 6 months ago, 1 comment)
#90 · Question about AI Feedback (AIF) (HaoruSung, opened 6 months ago, 0 comments)
#89 · Clean deprecated max samples arguments (kirill-fedyanin, closed 6 months ago, 0 comments)
#88 · Update Zephyr configs to account for UltraFeedback & TRL fixes (lewtun, closed 5 months ago, 0 comments)
#87 · How can I config `loss_type`? (hahuyhoang411, closed 5 months ago, 2 comments)
#86 · Make SFT script consistent with DPO script (NielsRogge, closed 6 months ago, 1 comment)
#85 · About Flash Attn's version (chengjl19, opened 6 months ago, 1 comment)
#84 · Chat template is being overwritten when the Tokenizer has a `default_chat_template`. (nathan-az, closed 6 months ago, 1 comment)
#83 · Check that `default_chat_template` is also None (nathan-az, closed 6 months ago, 1 comment)
#82 · Enhanced Uncertainty Vectors: A Novel Approach for AI Alignment (binoculars, opened 6 months ago, 1 comment)
#81 · Why we use a lower batch size when comparing SFT lora with SFT full fine-tuning ? (shamanez, closed 6 months ago, 2 comments)
#80 · Make max samples work again (kirill-fedyanin, closed 6 months ago, 1 comment)
#79 · Finetuned `zephyr-7b-beta` with internal data generates same reuslts as model `HuggingFaceH4/zephyr-7b-beta` (wxp16, opened 6 months ago, 1 comment)
#78 · wierd conversation with zephyr-7b-dpo-lora (njupopsicle, opened 6 months ago, 2 comments)