huggingface/trl: Train transformer language models with reinforcement learning.
http://hf.co/docs/trl · Apache License 2.0 · 12.51k stars · 1.69k forks

Issues
Trainer Adder/Combiner (#3092) · AMindToThink · opened 21 hours ago · 2 comments
⛔ Add EOS token to processed input in SFT (#3091) · qgallouedec · closed 21 hours ago · 4 comments
Another algorithm to compute the advantages in GRPO Trainer (#3090) · afghl · opened 1 day ago · 1 comment
GRPO Trainer + Peft + Gradient checkpointing doesn't work (#3089) · kimihailvfg · opened 1 day ago · 1 comment
[BUG] The device map of the training model in GRPO includes the device used by vllm (#3088) · maoulee · opened 1 day ago · 0 comments
Memory management issue in PyTorch when calling PPO (#3087) · denis20131 · opened 1 day ago · 0 comments
Training model in GRPO Trainer uses the vllm_device to compute logps, causing sudden VRAM increases on the vLLM device and OOM errors (#3086) · maoulee · closed 1 day ago · 0 comments
How can I specify a gpu id for vllm (#3085) · YueChenkkk · closed 1 day ago · 2 comments
Why does the ratio in PPOTrainer use the same policy model rather than separate old and new policy models? (#3084) · Lynnzake · opened 1 day ago · 0 comments
Why does SFTTrainer process instruction data without EOS? (#3083) · DiaQusNet · opened 1 day ago · 0 comments
add cli dict parsing for grpo_config (#3082) · Tavish9 · opened 1 day ago · 7 comments
Enabling `GRPOConfig.include_tokens_per_second` crashes GRPO training (#3081) · wizeng23 · opened 1 day ago · 0 comments
DPO Trainer error "full() received an invalid combination of arguments" (#3080) · sivaganesh07 · opened 1 day ago · 2 comments
Flexible_reward (#3079) · shirinyamani · opened 2 days ago · 6 comments
GRPO Trainer/Config Incomplete metrics logging to wandb (#3078) · SpaceHunterInf · closed 1 day ago · 5 comments
What does this line in the PPO trainer do? (#3077) · LinHungShi · opened 2 days ago · 0 comments
🕊️ Padding-free for SFT (#3076) · qgallouedec · closed 3 hours ago · 1 comment
🫣 [GRPO] add cache_implementation option in GRPO (#3075) · kashif · closed 2 days ago · 1 comment
🎭 Minor spelling fix in documentation (caracteres -> characters) (#3074) · esnible · closed 2 days ago · 0 comments
[GRPO] add vlm training capabilities to the trainer (#3072) · CompN3rd · opened 2 days ago · 4 comments
AutoModelForCausalLMWithValueHead initialize the model with deepspeed zero3 (#3071) · JYX1216 · opened 2 days ago · 0 comments
💎 Gemma 3 SFT example on Codeforces dataset (#3070) · qgallouedec · closed 2 days ago · 3 comments
Fix: Multi gpu hang for ORPO and CPO Trainer (#3069) · NanoCode012 · opened 2 days ago · 2 comments
[Bug] ORPO Trainer hangs with multi-gpu on step 0 (#3068) · NanoCode012 · opened 2 days ago · 2 comments
🚀 Question: Is attention_mask still lower triangular when using packing=True in SFTTrainer? (#3067) · noforit · opened 2 days ago · 2 comments
How to switch on the multi-GPU for GRPOTrainer? (#3066) · tjoymeed · opened 2 days ago · 5 comments
[WIP] PEFT 🤝 Liger DPO (#3065) · SalmanMohammadi · opened 3 days ago · 0 comments
Enable External Launcher Support for vLLM in TRL for Efficient GRPO Training (#3064) · mtoslalibu · opened 3 days ago · 0 comments
Online DPO crashes when using multiple GPUs (#3063) · wilrop · opened 3 days ago · 0 comments
[GRPO] use argument names with processing_class (#3062) · kashif · closed 3 days ago · 1 comment
[Question] Why does GRPOTrainer require (per_device_train_batch_size * n_processes) % n_generations == 0? (#3061) · YueChenkkk · opened 3 days ago · 2 comments
Fractional logging_steps causes completions not to be logged after step 0 (#3060) · VanderpoelLiam · opened 3 days ago · 0 comments
The code in examples/notebooks is not compatible with the latest version of TRL. When will it be updated? (#3059) · Jack-ctrl6 · opened 3 days ago · 0 comments
Fixed the GKD example and also moved the student generation inside se… (#3058) · benyaminjami · closed 3 days ago · 1 comment
[Question] Why not use `n` for every sample for vllm in GRPOTrainer? (#3057) · vagitablebirdcode · closed 3 days ago · 6 comments
🦥 Fixed `SFTTrainer.compute_loss` hang from #3048's PR comments (#3056) · jamesbraza · closed 3 days ago · 1 comment
Efficient Knowledge Distillation: Storing Only Top-N Teacher Logits for Reduced Memory Usage (#3055) · mertege · opened 4 days ago · 0 comments
Why does GRPOConfig have some of the vLLM parameters instead of using kwargs? (#3054) · mtoslalibu · opened 4 days ago · 3 comments
🏊 [SFT] Compatibility with padding free and iterable dataset (#3053) · qgallouedec · closed 3 days ago · 2 comments
👯 [GRPO] Relax the assumption that prompts are unique within a batch (#3052) · qgallouedec · closed 4 days ago · 1 comment
Are the training and inference/generation phases blocking? (#3050) · yash-malik · closed 1 day ago · 2 comments
GRPO, PPO, DPO Trainer for VLM (#3051) · SabaPivot · closed 2 days ago · 3 comments
`GRPOTrainer` can crash with `AttributeError` for `Callable` `reward_func.__name__` (#3049) · jamesbraza · opened 5 days ago · 1 comment
💠 Fixing `SFTTrainer.compute_loss` crash with `accelerate` (#3048) · jamesbraza · closed 4 days ago · 2 comments
`sft_trainer` incompatible with `accelerator.gather_for_metrics` (#3047) · jamesbraza · closed 4 days ago · 0 comments
🏁 Passing custom BOS/EOS token to `GRPOTrainer.generation_config` (#3046) · jamesbraza · closed 4 days ago · 1 comment
GRPO + vLLM generation shape error under duplicate prompts (#3045) · tchang1997 · closed 4 days ago · 1 comment
How much memory is needed for QLoRA GRPO with vLLM? CUDA memory suddenly increases at the first step and causes OOM (#3044) · maoulee · closed 1 day ago · 4 comments
Fixing JSD loss computation in GKDTrainer as per definition (#3043) · abhigoyal1997 · closed 2 days ago · 5 comments
Online DPO docs training example fails when `max_new_tokens` > 512 with error "The size of tensor a (0) must match the size of tensor b (688) at non-singleton dimension 1" (#3042) · skoshx · opened 5 days ago · 1 comment