A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷
Apache License 2.0
Guidance for OP with multiple data fields to be processed #411
[X] I have searched the Data-Juicer issues and found no similar feature requests.
Description
Currently, users may be confused about how to support multiple fields in a single OP, for example when developing an OP that processes both text_key="question" and text_key="answer".
Besides, we need to add guidance on the type of the values stored under text-related keys: each must be a `str` rather than a `list` or `dict`, for the sake of efficiency and coding convenience (this is an implicit assumption in all text-related OPs).
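To make the request concrete, here is a minimal sketch of one way such guidance could look. It is not Data-Juicer's actual API: `LowercaseMapper` and `apply_to_fields` are hypothetical names used only for illustration, and the sketch assumes an OP is configured with a single `text_key` and is simply instantiated once per field. It also makes the `str`-only assumption explicit with a type check.

```python
class LowercaseMapper:
    """Hypothetical stand-in for a text OP; not Data-Juicer's real base class."""

    def __init__(self, text_key="text"):
        # Each instance handles exactly one field, named by text_key.
        self.text_key = text_key

    def process(self, sample):
        value = sample[self.text_key]
        # Implicit assumption from the issue: the field holds a plain str,
        # not a list or dict.
        if not isinstance(value, str):
            raise TypeError(
                f"field {self.text_key!r} must be str, got {type(value).__name__}"
            )
        sample[self.text_key] = value.lower()
        return sample


def apply_to_fields(op_cls, sample, text_keys):
    """Run one OP instance per field, each configured with its own text_key."""
    for key in text_keys:
        sample = op_cls(text_key=key).process(sample)
    return sample


sample = {"question": "What Is Data-Juicer?", "answer": "A Data Processing System."}
result = apply_to_fields(LowercaseMapper, sample, ["question", "answer"])
print(result)
```

Under this assumption, an OP author keeps the single-field logic unchanged, and the multi-field case becomes a matter of configuration (one OP entry per field) rather than new code in each OP.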
This issue is marked as stale because there has been no activity for 21 days. Remove the stale label or add a new comment, or this issue will be closed in 3 days.
Use case
Related issue: https://github.com/modelscope/data-juicer/issues/380
Additional
No response
Are you willing to submit a PR for this feature?