I ran `train-single-node.sh` from the `scripts` directory to SFT the LLaMA-3-8B base model on the provided `dart-math-hard` and `dart-math-uniform` data, with only one modification: I changed `batch_size` from 64 to 32 (due to CUDA memory limits). The test accuracies of these two SFT models on GSM8K are 0.4185 (hard) and 0.4617 (uniform).
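For context on why halving `batch_size` may matter: unless gradient accumulation is adjusted to compensate, the effective batch size seen by the optimizer also halves, which can shift final accuracy. A minimal arithmetic sketch (the variable names here are illustrative, not the actual flags of `train-single-node.sh`):

```python
# Effective batch = per-device batch * gradient accumulation steps * number of GPUs.
per_device_batch = 32   # reduced from 64 due to CUDA memory
grad_accum_steps = 2    # hypothetical: doubling accumulation restores the original effective batch
world_size = 1          # single-node, single-GPU assumed for illustration

effective_batch = per_device_batch * grad_accum_steps * world_size
print(effective_batch)  # → 64, matching the original batch_size
```

If the script does not expose a gradient-accumulation knob, the run with `batch_size=32` is effectively training with a smaller batch than the reference configuration, which alone could explain part of an accuracy gap.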
The primary packages of my environment are:
torch 2.0.1, transformers 4.42