huggingface/trl: Train transformer language models with reinforcement learning.
http://hf.co/docs/trl · Apache License 2.0 · 12.51k stars · 1.69k forks

Issues
Trainer Adder/Combiner (#3092) · AMindToThink · opened 21 hours ago · 2 comments
⛔ Add EOS token to processed input in SFT (#3091) · qgallouedec · closed 21 hours ago · 4 comments
Another algorithm to compute the advantages in GRPO Trainer (#3090) · afghl · opened 1 day ago · 1 comment
GRPO Trainer + Peft + Gradient checkpointing doesn't work (#3089) · kimihailvfg · opened 1 day ago · 1 comment
[BUG] The device map of the training model in GRPO includes the device used by vllm (#3088) · maoulee · opened 1 day ago · 0 comments
Memory management issue in PyTorch when calling PPO (#3087) · denis20131 · opened 1 day ago · 0 comments
Training model in GRPO Trainer uses the vllm_device to compute logps, causing sudden VRAM increases on the vLLM device and OOM errors (#3086) · maoulee · closed 1 day ago · 0 comments
How can I specify a gpu id for vllm (#3085) · YueChenkkk · closed 1 day ago · 2 comments
Why does the ratio in PPOTrainer use the same policy model rather than separate old and new policy models? (#3084) · Lynnzake · opened 1 day ago · 0 comments
Why does SFTTrainer process instruction data without EOS? (#3083) · DiaQusNet · opened 1 day ago · 0 comments
add cli dict parsing for grpo_config (#3082) · Tavish9 · opened 1 day ago · 7 comments
Enabling `GRPOConfig.include_tokens_per_second` crashes GRPO training (#3081) · wizeng23 · opened 1 day ago · 0 comments
DPO Trainer error "full() received an invalid combination of arguments" (#3080) · sivaganesh07 · opened 1 day ago · 2 comments
Flexible_reward (#3079) · shirinyamani · opened 2 days ago · 6 comments
GRPO Trainer/Config Incomplete metrics logging to wandb (#3078) · SpaceHunterInf · closed 1 day ago · 5 comments
What does this line in the PPO trainer do? (#3077) · LinHungShi · opened 2 days ago · 0 comments
🕊️ Padding-free for SFT (#3076) · qgallouedec · closed 3 hours ago · 1 comment
🫣 [GRPO] add cache_implementation option in GRPO (#3075) · kashif · closed 2 days ago · 1 comment
🎭 Minor spelling fix in documentation (caracteres -> characters) (#3074) · esnible · closed 2 days ago · 0 comments
[GRPO] add vlm training capabilities to the trainer (#3072) · CompN3rd · opened 2 days ago · 4 comments
AutoModelForCausalLMWithValueHead initialize the model with deepspeed zero3 (#3071) · JYX1216 · opened 2 days ago · 0 comments
💎 Gemma 3 SFT example on Codeforces dataset (#3070) · qgallouedec · closed 2 days ago · 3 comments
Fix: Multi gpu hang for ORPO and CPO Trainer (#3069) · NanoCode012 · opened 2 days ago · 2 comments
[Bug] ORPO Trainer hangs with multi-gpu on step 0 (#3068) · NanoCode012 · opened 2 days ago · 2 comments
🚀 Question: Is attention_mask still lower triangular when using packing=True in SFTTrainer? (#3067) · noforit · opened 2 days ago · 2 comments
How to switch on the multi-GPU for GRPOTrainer? (#3066) · tjoymeed · opened 2 days ago · 5 comments
[WIP] PEFT 🤝 Liger DPO (#3065) · SalmanMohammadi · opened 3 days ago · 0 comments
Enable External Launcher Support for vLLM in TRL for Efficient GRPO Training (#3064) · mtoslalibu · opened 3 days ago · 0 comments
Online DPO crashes when using multiple GPUs (#3063) · wilrop · opened 3 days ago · 0 comments
[GRPO] use argument names with processing_class (#3062) · kashif · closed 3 days ago · 1 comment
[Question] Why does GRPOTrainer require (per_device_train_batch_size * n_processes) % n_generations == 0? (#3061) · YueChenkkk · opened 3 days ago · 2 comments
Fractional logging_steps causes completions not to be logged after step 0 (#3060) · VanderpoelLiam · opened 3 days ago · 0 comments
The code in examples/notebooks is not compatible with the latest version of TRL. When will it be updated? (#3059) · Jack-ctrl6 · opened 3 days ago · 0 comments
Fixed the GKD example and also moved the student generation inside se… (#3058) · benyaminjami · closed 3 days ago · 1 comment
[Question] Why not use `n` for every sample for vllm in GRPOTrainer? (#3057) · vagitablebirdcode · closed 3 days ago · 6 comments
🦥 Fixed `SFTTrainer.compute_loss` hang from #3048's PR comments (#3056) · jamesbraza · closed 3 days ago · 1 comment
Efficient Knowledge Distillation: Storing Only Top-N Teacher Logits for Reduced Memory Usage (#3055) · mertege · opened 4 days ago · 0 comments
Why does GRPOConfig have some of the vLLM parameters instead of using kwargs? (#3054) · mtoslalibu · opened 4 days ago · 3 comments
🏊 [SFT] Compatibility with padding free and iterable dataset (#3053) · qgallouedec · closed 3 days ago · 2 comments
👯 [GRPO] Relax the assumption that prompts are unique within a batch (#3052) · qgallouedec · closed 4 days ago · 1 comment
Are the training and inference/generation phases blocking? (#3050) · yash-malik · closed 1 day ago · 2 comments
GRPO, PPO, DPO Trainer for VLM (#3051) · SabaPivot · closed 2 days ago · 3 comments
`GRPOTrainer` can crash with `AttributeError` for `Callable` `reward_func.__name__` (#3049) · jamesbraza · opened 5 days ago · 1 comment
💠 Fixing `SFTTrainer.compute_loss` crash with `accelerate` (#3048) · jamesbraza · closed 4 days ago · 2 comments
`sft_trainer` incompatible with `accelerator.gather_for_metrics` (#3047) · jamesbraza · closed 4 days ago · 0 comments
🏁 Passing custom BOS/EOS token to `GRPOTrainer.generation_config` (#3046) · jamesbraza · closed 4 days ago · 1 comment
GRPO + vLLM generation shape error under duplicate prompts (#3045) · tchang1997 · closed 4 days ago · 1 comment
How much memory is needed for QLoRA GRPO with vLLM? CUDA memory suddenly increases at the first step and causes OOM (#3044) · maoulee · closed 1 day ago · 4 comments
Fixing JSD loss computation in GKDTrainer as per definition (#3043) · abhigoyal1997 · closed 2 days ago · 5 comments
Online DPO docs training example fails when `max_new_tokens` > 512 with error "The size of tensor a (0) must match the size of tensor b (688) at non-singleton dimension 1" (#3042) · skoshx · opened 5 days ago · 1 comment