```
  0%|          | 0/112 [00:00<?, ?it/s]
You are not running the flash-attention implementation, expect numerical differences.
~/miniconda3/envs/llama-factory/lib/python3.10/site-packages/torch/utils/checkpoint.py:91: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(
Segmentation fault (core dumped)
```
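For reference, both warnings correspond to known transformers-level settings. A minimal sketch, assuming transformers >= 4.41 and the `flash-attn` package are installed (the model id is inferred from the config filename, not confirmed by the report): loading with `attn_implementation="flash_attention_2"` avoids the first warning, and the non-reentrant checkpointing path avoids the `requires_grad` warning from `torch/utils/checkpoint.py`.

```python
# Minimal sketch, not the author's training code: transformers >= 4.41 and
# the flash-attn package are assumed installed; the model id is inferred
# from the config filename.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    torch_dtype=torch.bfloat16,               # flash-attn requires fp16/bf16
    attn_implementation="flash_attention_2",  # avoids the first warning
)

# The non-reentrant checkpoint implementation does not emit the
# "None of the inputs have requires_grad=True" UserWarning.
model.gradient_checkpointing_enable(
    gradient_checkpointing_kwargs={"use_reentrant": False}
)
```

In LLaMA-Factory these are normally set through the training YAML rather than in code; whether the config used here enables FlashAttention-2 is not visible from the report.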
### Reminder

### Reproduction
```bash
CUDA_VISIBLE_DEVICES=4,5,6,7 llamafactory-cli train phi-3-mini-128k-dpo-0518.yaml
```
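To check whether the segfault reproduces outside LLaMA-Factory, a stripped-down forward/backward pass on the same GPUs can narrow it down. This is an illustrative sketch only; the model id and prompt are assumptions, not taken from `phi-3-mini-128k-dpo-0518.yaml`:

```python
# Minimal isolation sketch (not from the issue): run one forward/backward
# with plain transformers on the same GPUs to see whether the segfault
# reproduces without LLaMA-Factory in the loop.
import os
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "4,5,6,7")  # set before torch loads CUDA

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-128k-instruct"  # assumed from the config name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("cuda:0")  # first visible device, i.e. physical GPU 4

batch = tokenizer("hello world", return_tensors="pt").to("cuda:0")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()  # a crash here points below LLaMA-Factory (torch / flash-attn)
print("forward/backward ok, loss =", loss.item())
```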
### Expected behavior

No response
### System Info

```
Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

- `transformers` version: 4.41.0
```

### Others