huggingface/trl
Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0
8.61k stars
1.05k forks
issues
Fixed typo in SFT trainer docs
#1788
detsutut
opened
1 hour ago
0
[SFT] add model_init_kwargs to training_args
#1787
kashif
opened
17 hours ago
1
Lora seems to be invalid when using vsft_llava.py
#1786
shijian2001
opened
18 hours ago
3
Error with SFT of LLaVA-Next
#1785
GohioAC
opened
23 hours ago
0
Supports of SFTTrainer / PPOTrainer / DPOTrainer for LLaVA-alike model
#1784
fangkuoyu
opened
1 day ago
1
Clarification on reward/value heads in PPOV2
#1783
SalmanMohammadi
opened
1 day ago
3
Fix `start` index under `batched_forward_pass`
#1782
mertsayar8
opened
1 day ago
2
Conflict in start index under `batched_forward_pass`
#1781
mertsayar8
opened
1 day ago
0
[DOCS] fix docs and cli example script
#1780
kashif
closed
1 day ago
3
Nash md
#1779
kashif
opened
2 days ago
0
The DPO 'grad_norm': 0.0,
#1778
Faded1022
opened
2 days ago
0
Getting "KeyError: None" when passing conversational dataset
#1777
Kkordik
closed
2 days ago
1
fix model to save in ppov2
#1776
mnoukhov
opened
2 days ago
0
DDPO trained model error when used to generate images
#1775
nguyenhoa-uit
opened
3 days ago
1
Fix Documentation Overflow Issues for Long URLs in SFTConfig
#1774
Mubin17
closed
3 days ago
1
Remove the leading space in the tldr preference dataset
#1773
vwxyzjn
closed
3 days ago
1
Add SRPO algorithm.
#1772
frasermince
opened
3 days ago
1
`evaluation_strategy` to `eval_strategy`
#1771
qgallouedec
closed
3 days ago
1
Want to use zero3 to train KTO and met error
#1770
Faded1022
opened
4 days ago
0
[Code Improvement] Support concatnate forward in reward trainer
#1769
1485840691
opened
4 days ago
4
Can bert be used for dpo training?
#1768
anyuese
opened
5 days ago
0
SFTTrainer device error even though it doesn't take device as an argument
#1767
zyzhang1130
opened
5 days ago
0
Neftune is applied twice; in trl and transformers BOTH!
#1766
MilkClouds
opened
5 days ago
1
MoE Models: option to add load balancing loss
#1765
claralp
closed
4 days ago
5
Using IterableDataset crashed the SFTTrainer
#1764
helloworld1
opened
6 days ago
0
What is the difference between PPOv2Trainer and PPOTrainer?
#1763
mst272
closed
4 days ago
1
PPOv2 trainer, the wandb log is unnormal
#1762
kangqiyue
closed
4 days ago
8
SFTTrainer to add support for IterableDataset
#1761
helloworld1
opened
1 week ago
0
Add CPO-SimPO method
#1760
fe1ixxu
closed
5 days ago
1
Add ppov2 sentiment example (as a replacement to imdb example)
#1759
vwxyzjn
opened
1 week ago
1
Fix: Add dataset_text_field in examples/scripts/sft.py
#1758
scottsuk0306
closed
1 week ago
2
New sentiment and descriptiveness dataset
#1757
vwxyzjn
closed
1 week ago
2
DataCollatorForCompletionOnlyLM does not work with FSDP
#1756
aabhasgupta
opened
1 week ago
0
Misuse of gen_len in examples/notebooks/gpt2-sentiment.ipynb
#1755
MercuryDemo
opened
1 week ago
0
Issue #1751 Fix
#1754
yash-srivastava19
opened
1 week ago
5
change the `process` function in the example of DPO
#1753
AIR-hl
closed
1 week ago
4
Question about apply_chat_template in examples
#1752
EganGu
opened
1 week ago
7
[BUG] TRL CLI not capturing `torch_dtype` correctly
#1751
alvarobartt
opened
1 week ago
1
CI / `KTOTrainer`: Remove old tests
#1750
younesbelkada
closed
1 week ago
2
CPO / DPO: Fix red CI
#1749
younesbelkada
closed
1 week ago
2
`TrlParser`: Add ignore extra args option
#1748
younesbelkada
closed
1 week ago
1
CI / core: Pin `numpy` to `!=2.0.0` for CI and to users
#1747
younesbelkada
closed
1 week ago
1
DPODataCollatorWithPadding doesn't mask the prompts
#1746
A-Mahla
closed
1 week ago
0
[DOCS] Some headers (h2, h3) not rendered properly
#1745
alvarobartt
opened
1 week ago
0
Workflow: Notify tests results on slack channel
#1744
younesbelkada
closed
1 week ago
1
Support num_train_epochs
#1743
vwxyzjn
closed
1 week ago
3
Support for returning past_key_values from the model
#1742
idanshen
closed
1 week ago
3
TypeError: IterableDataset.map() got an unexpected keyword argument 'num_proc' with streaming datasets
#1741
mrbesher
opened
1 week ago
1
TypeError when not passing total_episodes in PPOv2Trainer
#1740
meng-wenlong
closed
1 week ago
4
better trl parser with yaml config
#1739
mnoukhov
closed
1 week ago
1