A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷
Change list(map()) to map() for filter OPs and keep the original code for mapper OPs.
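To illustrate the difference this change relies on, here is a minimal sketch in plain Python. It is not taken from the project's code: `traced_keep_even` and `samples` are hypothetical names, and how the filter OPs consume the resulting iterator is not shown in this description. It only demonstrates that `list(map())` evaluates every element immediately, while `map()` defers the work until the result is iterated.

```python
# Hypothetical predicate standing in for a filter OP's per-sample check.
calls = []

def traced_keep_even(x):
    calls.append(x)          # record each invocation so laziness is visible
    return x % 2 == 0

samples = [1, 2, 3, 4]

# Eager: every element is processed right away and a list is built.
eager = list(map(traced_keep_even, samples))
eager_calls = len(calls)

# Lazy: map() returns an iterator; nothing is processed yet.
lazy = map(traced_keep_even, samples)
lazy_calls_before = len(calls) - eager_calls

# The work happens only when the iterator is consumed.
lazy_result = list(lazy)
lazy_calls_after = len(calls) - eager_calls
```

One consequence worth keeping in mind: a `map()` iterator can be consumed only once, so any code that needs the results twice still has to materialize them.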
Make sure that the dataset is a NestedDataset instance in the run function.
NOTE: This does not guarantee that the dataset is a NestedDataset instance when the process function is called directly in Deduplicator and Selector OPs!
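The guard described above might look roughly like the following sketch. Everything here is an assumption for illustration: the `NestedDataset` class is a stand-in defined locally rather than the project's real class, and `ensure_nested` and `SketchOp` are hypothetical names. The point is only to show why calling `process` directly bypasses a check that lives in `run`.

```python
class NestedDataset:
    """Hypothetical stand-in for the real NestedDataset wrapper."""

    def __init__(self, data):
        self.data = data


def ensure_nested(dataset):
    """Wrap the input only if it is not already a NestedDataset (assumed helper)."""
    if isinstance(dataset, NestedDataset):
        return dataset
    return NestedDataset(dataset)


class SketchOp:
    """Hypothetical OP with the guard placed in run(), not in process()."""

    def run(self, dataset):
        dataset = ensure_nested(dataset)   # the guard lives here
        return self.process(dataset)

    def process(self, dataset):
        # No guard: the caller is responsible for passing a NestedDataset.
        return dataset


raw = [{"text": "a"}]
via_run = SketchOp().run(raw)          # wrapped by the guard
via_process = SketchOp().process(raw)  # passed through unwrapped
```

In this sketch, a caller who invokes `process` directly on a raw dataset would need to wrap it first, which mirrors the caveat in the note.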