pan-x-c / EE-LLM

EE-LLM is a framework for large-scale training and inference of early-exit (EE) large language models (LLMs).

[ENHANCEMENT] Could you provide a script to preprocess data? #14

Open Mr-lonely0 opened 3 months ago

Mr-lonely0 commented 3 months ago

Describe the solution you'd like

Could you provide a script to preprocess the data? A short demo would be enough.

pan-x-c commented 3 months ago

In the EE-LLM and EE-Tuning papers, we use the JSON Lines (jsonl) data provided by Data-Juicer. You can use tools/preprocess_data.py to convert that data into the binary format, as shown in the Megatron-LM README.
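For a quick demo of the input side, here is a minimal sketch of how a small JSON Lines file could be produced and sanity-checked before running the preprocessing script. It assumes each record stores its document under a "text" key, which is the default JSON key that Megatron-LM's tools/preprocess_data.py reads. The helper names write_jsonl and check_jsonl and the file name demo.jsonl are hypothetical and not part of EE-LLM, Megatron-LM, or Data-Juicer.

```python
import json
from pathlib import Path


def write_jsonl(docs, path, key="text"):
    """Write one JSON object per line, storing each document under `key`.

    `key="text"` is an assumption matching the default JSON key of
    Megatron-LM's tools/preprocess_data.py.
    """
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps({key: doc}, ensure_ascii=False) + "\n")
    return path


def check_jsonl(path, key="text"):
    """Return the number of records; raise ValueError on a bad record."""
    count = 0
    with Path(path).open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            if not isinstance(record, dict) or not isinstance(record.get(key), str):
                raise ValueError(f"line {lineno}: missing string field {key!r}")
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical demo corpus with two short documents.
    out = write_jsonl(["First document.", "Second document."], "demo.jsonl")
    print(f"{out}: {check_jsonl(out)} records")
```

A file like this can then be passed to tools/preprocess_data.py through its --input argument. The tokenizer-related arguments depend on the model and the Megatron-LM version, so take those from the README rather than from this sketch.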

github-actions[bot] commented 1 month ago

Marking as stale. No activity in 60 days.