问题 flashattention - Githubissues

shibing624 / MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. 训练医疗大模型，实现了包括增量预训练(PT)、有监督微调(SFT)、RLHF、DPO、ORPO。

Apache License 2.0

2.94k stars 452 forks source link

Closed wuguangshuo closed 5 months ago

wuguangshuo commented 5 months ago

请问开启flashattention后，在sft可以节约多少显存和减少多少训练时间呢，我在4090上训练llama2 好像没有明显的变化

shibing624 commented 5 months ago

用的transformers的flash_attn,加速问题可以在官方transformers提issue。