huggingface / alignment-handbook
Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0 · 4.18k stars · 354 forks
Issues
#127 · Minor question about PAD token and EOS token. (HaniItani, opened 4 months ago, 2 comments)
#126 · Estimated Time for SFT Fine-Tuning of Mistral-7B Model (AronRynkiewicz, closed 4 months ago, 1 comment)
#125 · Major bug: Chat template is not actually applied in run_sft.py and run_dpo.py (AlexiaJM, opened 4 months ago, 7 comments)
#124 · cannot replicate DPO results of zephyr (AlexiaJM, opened 4 months ago, 5 comments)
#123 · Add `auto_insert_empty_system_msg` config flag (BramVanroy, closed 4 months ago, 3 comments)
#122 · Zephyr-dpo-full Checkpoints perform poorly on TruthfulQA. (xijiu9, opened 4 months ago, 1 comment)
#121 · DPO recipe saves a float32 model (tcapelle, opened 4 months ago, 0 comments)
#120 · (QLoRA) DPO without previous SFT (DavidFarago, opened 4 months ago, 1 comment)
#119 · ImportError: Flash Attention 2 is not available (BakingBrains, closed 4 months ago, 0 comments)
#118 · Cost of Generating a Dataset for Constitutional AI (Ashish-Soni08, opened 5 months ago, 0 comments)
#117 · system message being included in chosen & rejected when chat_template inserts system message (dctanner, closed 4 months ago, 2 comments)
#116 · About DPO formatting before fine-tuning (alvarobartt, closed 3 months ago, 4 comments)
#115 · Apply quantization during DPO QLoRA (lewtun, closed 5 months ago, 1 comment)
#114 · Using MT-Bench to evaluate zephyr (abgoswam, opened 5 months ago, 2 comments)
#113 · Update README.md (eltociear, closed 5 months ago, 0 comments)
#112 · Blog post url: "constitutional-ai" -> "constitutional_ai" (kgourgou, closed 5 months ago, 1 comment)
#111 · Update README.md (lewtun, closed 5 months ago, 1 comment)
#110 · DPO loss on different datasets (wj210, opened 5 months ago, 0 comments)
#109 · Reward Modeling Support (agi-piggy, opened 5 months ago, 0 comments)
#108 · Constitutional AI recipe (vwxyzjn, closed 5 months ago, 1 comment)
#107 · Make `bnb_4bit_compute_dtype` consistent with `torch_dtype` (nathan-az, closed 5 months ago, 1 comment)
#106 · Add check before inserting system message (nathan-az, closed 5 months ago, 3 comments)
#105 · Cannot apply "run_dpo.py" on a trained Axolotl model (MatanVetzler, opened 5 months ago, 0 comments)
#104 · DPO/IPO/KTO ablations (edbeeching, closed 5 months ago, 1 comment)
#103 · Does QLora DPO Training support reference model? (Harry-mic, opened 5 months ago, 0 comments)
#102 · Feat: Add loftq option (hahuyhoang411, closed 4 months ago, 0 comments)
#101 · `apply_chat_template` not compatible with tokenizers that do not support a system prompt (nathan-az, closed 5 months ago, 3 comments)
#100 · Make docs work (lewtun, opened 5 months ago, 1 comment)
#99 · Output from zephyr-7b-dpo-qlora is weird (ChenDRAG, opened 5 months ago, 0 comments)
#98 · Is QLoRA better than finetuning? (normster, opened 5 months ago, 0 comments)
#97 · Fixes #96 by handling RepositoryNotFoundError (tleyden, closed 5 months ago, 8 comments)
#96 · RepositoryNotFoundError when running scripts/run_dpo.py with local repo (tleyden, closed 5 months ago, 0 comments)
#95 · Bump lower version of huggingface_hub (lewtun, closed 5 months ago, 0 comments)
#94 · huggingface_hub version (Harry-mic, closed 5 months ago, 1 comment)
#93 · jinja2.exceptions.TemplateError: Conversation roles must alternate user/assistant/user/assistant/... (Feynman27, opened 5 months ago, 2 comments)
#92 · Is there anyway that I can use learning rate warm-up during the training ? (shamanez, closed 5 months ago, 1 comment)
#91 · how to use dpo without flash-attention (Fu-Dayuan, opened 6 months ago, 1 comment)
#90 · Question about AI Feedback (AIF) (HaoruSung, opened 6 months ago, 0 comments)
#89 · Clean deprecated max samples arguments (kirill-fedyanin, closed 6 months ago, 0 comments)
#88 · Update Zephyr configs to account for UltraFeedback & TRL fixes (lewtun, closed 5 months ago, 0 comments)
#87 · How can I config `loss_type`? (hahuyhoang411, closed 5 months ago, 2 comments)
#86 · Make SFT script consistent with DPO script (NielsRogge, closed 6 months ago, 1 comment)
#85 · About Flash Attn's version (chengjl19, opened 6 months ago, 1 comment)
#84 · Chat template is being overwritten when the Tokenizer has a `default_chat_template`. (nathan-az, closed 6 months ago, 1 comment)
#83 · Check that `default_chat_template` is also None (nathan-az, closed 6 months ago, 1 comment)
#82 · Enhanced Uncertainty Vectors: A Novel Approach for AI Alignment (binoculars, opened 6 months ago, 1 comment)
#81 · Why we use a lower batch size when comparing SFT lora with SFT full fine-tuning ? (shamanez, closed 6 months ago, 2 comments)
#80 · Make max samples work again (kirill-fedyanin, closed 6 months ago, 1 comment)
#79 · Finetuned `zephyr-7b-beta` with internal data generates same reuslts as model `HuggingFaceH4/zephyr-7b-beta` (wxp16, opened 6 months ago, 1 comment)
#78 · wierd conversation with zephyr-7b-dpo-lora (njupopsicle, opened 6 months ago, 2 comments)