huggingface/trl
Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0
10.13k
stars
1.28k
forks
issues
Newest
deprecate config in favor of args
#2384
qgallouedec
opened
7 minutes ago
1
adding DRO trainer
#2383
morLev
opened
1 hour ago
0
DPO does not work for FIM task with non-instruct model
#2382
AML14
opened
8 hours ago
1
Update log method to include `start_time` parameter
#2381
qgallouedec
closed
21 hours ago
1
CLI refactor
#2380
qgallouedec
opened
22 hours ago
1
Fix typo in dataset generation script
#2379
jiseshen
closed
22 hours ago
3
PPO Example Script Accelerator error: initialize your accelerator via `accelerator = Accelerator()`
#2377
hitzkrieg
opened
1 day ago
0
ValueError: Predictions and/or references don't match the expected format.
#2376
scarafoni
opened
1 day ago
0
AttributeError: 'DistributedDataParallel' object has no attribute 'policy' when saving model using PPOTrainer
#2375
AsiaLootus
opened
2 days ago
0
A doubt in KTO _process_tokens
#2374
a7217339
closed
2 days ago
2
⏰ Add `start_time` to `_maybe_log_save_evaluate`
#2373
qgallouedec
closed
2 days ago
1
[winrate callback] remove redundant call to eval and train
#2372
kashif
closed
2 days ago
3
The DPO reward accuracy value is only 0 or 1
#2371
carrot0117
opened
2 days ago
1
🧲 Use our own `require_bitsandbytes`
#2370
qgallouedec
closed
2 days ago
2
Fix dev install
#2369
lewtun
closed
3 days ago
1
Can DPOTrainer support inputting encoded token IDs
#2368
LBJ6666
closed
3 days ago
2
Training a reward model with LoRA gets weird logs when using SEQ_CLS
#2367
gmonair
closed
2 days ago
1
Does the dataset support multi-turn conversation feedback?
#2366
dosometingbyme
closed
2 days ago
2
🎲 Move random judges in testing utilities
#2365
qgallouedec
closed
4 days ago
1
Fix description for parameter "generate_during_eval" in dpo_config
#2364
dakru012
closed
4 days ago
2
PPO manual reward functions
#2363
schmidtj3
opened
4 days ago
0
Still not supporting for ChatGLM3 maybe
#2362
fjy01
opened
4 days ago
0
Contributing new distillation related trainers
#2361
YihanCao123
opened
6 days ago
1
Use specified `data_collator` in `RLOOTrainer` and `PPOTrainer`
#2360
bartoszzuk
closed
4 days ago
3
🧪 [Experimental] Train LeRobot policy with TRL
#2359
qgallouedec
opened
1 week ago
2
Question about the logprobs of the policy-generated sentences in PPO trainer
#2358
yanghh2000
opened
1 week ago
0
PPOTrainer with HuggingFace PreTrainedModelWrapper Models
#2357
Mrinh212375
opened
1 week ago
3
How to train from scratch? Can you provide the code
#2356
sankexin
opened
1 week ago
2
Dpo Train Issue: max step from 1000 to 996349
#2355
seTalent
opened
1 week ago
1
[Bug] Use specified data_collator instead of hard-coding the option?
#2354
yifeim
closed
4 days ago
0
BUG in the new PPO trainer
#2353
TingchenFu
opened
1 week ago
2
KTO: `unpair_preference_dataset` does not work for datasets with additional columns
#2351
LuisVasquezBSC
opened
1 week ago
0
⚠️ Add warning guidelines and update codebase to follow best practices
#2350
qgallouedec
opened
1 week ago
1
Remove deprecated `tokenizer` argument in BCO, GKD, Iterative SFT, Nash MD and XPO
#2349
qgallouedec
closed
1 week ago
1
Add `tokenizer` arg back and add deprecation guidelines
#2348
qgallouedec
closed
1 week ago
2
Add `use_soft_judge` option to `WinRateCallback`
#2347
kashif
closed
1 week ago
1
[Question] `add_generation_prompt=True` on prompt
#2346
Galaxy-Husky
closed
1 week ago
5
Inference mode in `GeometricMixtureWrapper.forward`
#2345
kashif
closed
4 days ago
3
Add PEFT support for `PPOTrainer`
#2344
ccs96307
closed
4 days ago
3
Remove transformers version check
#2343
xyangk
closed
1 week ago
1
RLOO Checkpoint Issue
#2342
asparius
closed
3 days ago
3
[Question] Why is Importance Sampling and Clipping applied in RLOO?
#2341
shashankg7
opened
1 week ago
3
Multiple Errors with PPOTrainer. error in ppo_trainer.dataloader
#2340
Debolena7
opened
1 week ago
9
Difference between SFTTrainer and Seq2seqTrainer
#2339
Hyfred
opened
1 week ago
0
RuntimeError: chunk expects at least a 1-dimensional tensor
#2338
imrankh46
opened
2 weeks ago
11
DPO Training DataLoader is not shuffled
#2337
kaiwenw
opened
2 weeks ago
1
Adding video llm fine-tuning example
#2336
mfarre
closed
1 week ago
1
Accelerator package version problem
#2335
littleshutong
opened
2 weeks ago
2
Set gradient_checkpointing_kwargs in the yaml
#2334
Galaxy-Husky
opened
2 weeks ago
1
Bump liger-kernel to 0.4.0
#2333
ByronHsu
closed
2 weeks ago
5