AronRynkiewicz closed this issue 4 months ago.
Ok, I managed to solve my problem. There were several mistakes:
- run_sft.py expects train and test splits in the dataset, so I modified the script.
- Bizarrely, the problem was also with my run command: it didn't report any errors, and the job just sat idle without being interrupted by any exception. I replaced the run command with:
. /home/user/anaconda3/etc/profile.d/conda.sh
conda activate handbook
And now everything is running smoothly.
If anyone is interested: with this config and hardware, fine-tuning took less than an hour (about 58 minutes).
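The same fix can be packaged as a job script. This is a minimal sketch, assuming bash and the conda prefix used above; the function name activate_env and the CONDA_BASE override are hypothetical additions, not part of the thread. A plausible reason the hook has to be sourced explicitly (an assumption, since the original command is not shown) is that non-interactive shells usually skip ~/.bashrc, which is where `conda init` installs it. `set -eo pipefail` is there so a failed activation stops the script instead of leaving it running.

```shell
#!/usr/bin/env bash
# Fail fast: abort on any failing command or pipeline stage, so a broken
# environment setup shows up as an error instead of a silently idle job.
set -eo pipefail

# Conda install prefix from the thread; override it by exporting CONDA_BASE.
# (CONDA_BASE is a hypothetical variable name used only in this sketch.)
CONDA_BASE="${CONDA_BASE:-/home/user/anaconda3}"

# Hypothetical helper: load conda's shell hook, then activate the named env.
# Non-interactive shells (such as batch jobs) usually do not read ~/.bashrc,
# where `conda init` puts the hook, so it is sourced explicitly here.
activate_env() {
  local env_name="$1"
  local hook="$CONDA_BASE/etc/profile.d/conda.sh"
  if [[ ! -f "$hook" ]]; then
    echo "conda hook not found: $hook" >&2
    return 1
  fi
  # shellcheck source=/dev/null
  . "$hook"
  # The function's exit status is the exit status of `conda activate`.
  conda activate "$env_name"
}
```

A job script would call activate_env handbook right before the training command; that command is not shown in the thread, so it is left out here.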
Hello,
I have been using the following configuration for SFT fine-tuning on a dataset of 60,000 entries, on an 8xA100 80GB setup. Could someone please provide an estimate of the expected completion time for such a task? The main difference in the config file is num_train_epochs: 3. The job has currently been running for approximately three days. Additionally, I've observed low VRAM usage as reported by nvidia-smi.
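A rough way to sanity-check the expected run time, assuming the standard Hugging Face TrainingArguments names per_device_train_batch_size ($b$) and gradient_accumulation_steps ($a$), whose values are not shown in this post, and ignoring sequence packing (which changes the number of training sequences): the number of optimizer steps is approximately

$$
\text{steps} \approx E \cdot \left\lceil \frac{N}{G \cdot b \cdot a} \right\rceil
$$

with $N = 60{,}000$ entries, $E = 3$ epochs and $G = 8$ GPUs. For example, with the hypothetical values $b = 4$ and $a = 2$, this gives $3 \cdot \lceil 60{,}000 / 64 \rceil = 3 \cdot 938 = 2{,}814$ steps. Multiplying the step count by the seconds per step shown in the training progress bar gives an approximate wall-clock time.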
As this is my first attempt at fine-tuning, I apologize if any of my questions seem naive. I've searched for information related to my queries but haven't found any relevant resources.
Thank you for your understanding and assistance.
Run command: