modelscope / data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Apache License 2.0
2.63k stars 166 forks

[Feat] Support `PythonCodesOperator` and `BashCodesOperator` that wrap an existing Python file or code snippets to be executed, such as the existing DJ tools. #412

Closed yxdyc closed 1 week ago

yxdyc commented 1 month ago

Search before continuing

Description

Users often need to integrate specific Data-Juicer tools, custom functionality encapsulated in a helper_func.py, or short Python scripts such as a few lambda functions. These may not warrant a dedicated Operator, given the overhead of subclassing, documentation, and unit testing.

To improve flexibility and cover diverse user needs, it would be beneficial to introduce OPs that incorporate existing Python files or execute code snippets. This would let users enrich their data recipe configurations with a wider array of tools and custom code, managed through a streamlined PythonCodesOperator and BashCodesOperator mechanism.
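To make the request concrete, here is a minimal sketch of what such wrappers might look like. It is not Data-Juicer's actual API: the class names `PythonSnippetOp` and `BashSnippetOp`, the `process(sample)` method, the `func_name` and `text_key` parameters, and the assumption that a sample is a plain dict with a `"text"` field are all hypothetical and chosen only for illustration. The Bash variant assumes a POSIX shell is available.

```python
import subprocess
from typing import Any, Callable, Dict

# Assumption: a sample is a plain dict; a real operator may use another type.
Sample = Dict[str, Any]


class PythonSnippetOp:
    """Hypothetical operator wrapping a Python snippet that defines one function."""

    def __init__(self, code: str, func_name: str = "process"):
        namespace: Dict[str, Any] = {}
        # exec runs arbitrary code, so only trusted snippets should be passed in.
        exec(code, namespace)
        func = namespace.get(func_name)
        if not callable(func):
            raise ValueError(f"snippet does not define a callable {func_name!r}")
        self._func: Callable[[Sample], Sample] = func

    @classmethod
    def from_file(cls, path: str, func_name: str = "process") -> "PythonSnippetOp":
        # Load the snippet from an existing file such as helper_func.py.
        with open(path, encoding="utf-8") as f:
            return cls(f.read(), func_name)

    def process(self, sample: Sample) -> Sample:
        return self._func(sample)


class BashSnippetOp:
    """Hypothetical operator piping one text field through a shell command."""

    def __init__(self, command: str, text_key: str = "text"):
        self.command = command
        self.text_key = text_key

    def process(self, sample: Sample) -> Sample:
        # The field value is sent on stdin; stdout becomes the new field value.
        result = subprocess.run(
            self.command,
            shell=True,
            input=sample[self.text_key],
            capture_output=True,
            text=True,
            check=True,
        )
        return {**sample, self.text_key: result.stdout}


# Usage: chain a Python snippet and a shell command over one sample.
strip_op = PythonSnippetOp(
    "def process(sample):\n"
    "    sample['text'] = sample['text'].strip()\n"
    "    return sample\n"
)
upper_op = BashSnippetOp("tr a-z A-Z")

sample = {"text": "  hello  "}
for op in (strip_op, upper_op):
    sample = op.process(sample)
print(sample)
```

Both wrappers expose the same `process` method, so a recipe could chain them like any other operator without a dedicated subclass, documentation, or unit tests per snippet. The trade-off is that `exec` and `shell=True` run whatever they are given, so a design like this only makes sense for trusted configurations.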

Use case

No response

Additional

No response

Are you willing to submit a PR for this feature?

github-actions[bot] commented 1 week ago

This issue is marked as stale because there has been no activity for 21 days. Remove the stale label or add a new comment, or this issue will be closed in 3 days.

github-actions[bot] commented 1 week ago

Close this stale issue.