Open Mr-lonely0 opened 3 months ago
In the EE-LLM and EE-Tuning papers, we use the JSON Lines (jsonl) data provided by Data-Juicer. You can use tools/preprocess_data.py
to convert that data into the binary format, as shown in the Megatron-LM README.
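For a quick demo, here is a minimal sketch of that workflow. It writes a tiny jsonl file and assembles the preprocessing command without running it. The helper names (`write_jsonl`, `build_preprocess_command`) and all file names are hypothetical. The flags follow the GPT-2 example in the Megatron-LM README as far as I recall, and the `"text"` key is assumed to be the field the script reads by default, so please check `python tools/preprocess_data.py --help` in your checkout before relying on them.

```python
import json
from pathlib import Path


def write_jsonl(texts, path):
    """Write one JSON object per line, storing each document under the "text" key."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for text in texts:
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
    return path


def build_preprocess_command(input_path, output_prefix, vocab_file, merge_file, workers=1):
    """Assemble (but do not run) a tools/preprocess_data.py invocation.

    Flag names are assumed from the Megatron-LM README; verify with --help.
    """
    return [
        "python", "tools/preprocess_data.py",
        "--input", str(input_path),
        "--output-prefix", str(output_prefix),
        "--vocab-file", str(vocab_file),
        "--tokenizer-type", "GPT2BPETokenizer",
        "--merge-file", str(merge_file),
        "--append-eod",
        "--workers", str(workers),
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    corpus = write_jsonl(["First document.", "Second document."], "my-corpus.jsonl")
    cmd = build_preprocess_command(corpus, "my-gpt2", "gpt2-vocab.json", "gpt2-merges.txt")
    print(" ".join(cmd))
```

Run the printed command from the repository root (or pass the list to `subprocess.run`). If it behaves as in the Megatron-LM README, it should produce a `.bin` and `.idx` file pair under the given output prefix. For a different tokenizer you would need to swap the tokenizer flags accordingly.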
Marking as stale. No activity in 60 days.
Describe the solution you'd like
Could you provide a script to preprocess the data? A small demo would be enough.