A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Apache License 2.0
Automatically split input dataset in ray mode #415
Description
Split the input dataset files into smaller pieces and process them in separate batches, so that processing does not exceed Ray's memory limit.
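The splitting idea can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from this PR: it assumes the input is a JSONL file, and the names `split_jsonl`, `batched`, `max_bytes`, and `batch_size` are hypothetical and are not part of Data-Juicer's or Ray's API.

```python
from pathlib import Path
from typing import Iterator, List


def split_jsonl(src: Path, out_dir: Path, max_bytes: int) -> List[Path]:
    """Split a JSONL file into pieces of at most max_bytes each.

    Lines are never cut in half; a single line longer than max_bytes
    ends up alone in its own (oversized) piece.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    pieces: List[Path] = []
    buf: List[bytes] = []
    size = 0

    def flush() -> None:
        # Write the buffered lines to the next numbered piece file.
        nonlocal buf, size
        if not buf:
            return
        piece = out_dir / f"{src.stem}_{len(pieces):05d}.jsonl"
        piece.write_bytes(b"".join(buf))
        pieces.append(piece)
        buf, size = [], 0

    with src.open("rb") as f:
        for line in f:
            # Start a new piece before this line would overflow the current one.
            if buf and size + len(line) > max_bytes:
                flush()
            buf.append(line)
            size += len(line)
    flush()
    return pieces


def batched(items: List[Path], batch_size: int) -> Iterator[List[Path]]:
    """Yield consecutive groups of at most batch_size piece files."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```

Each batch returned by `batched` could then be loaded and processed on its own, which keeps the amount of data held in memory at any one time bounded by roughly `max_bytes * batch_size`.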