modelscope / data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Apache License 2.0
2.63k stars, 166 forks

Running the command python tools/process_data.py --config train.yaml #425

Closed: abchbx closed this issue 3 weeks ago

abchbx commented 3 weeks ago

Before Asking

Search before asking

Question

Data processing gets stuck at "Downloading..." and makes no progress for a long time.

Additional

After running python tools/process_data.py --config train.yaml, progress stays stuck at this point. Is there any way to speed it up?
2024-09-10 15:45:40 | INFO | data_juicer.utils.model_utils:74 - Model [/root/.cache/data_juicer/models/lid.176.bin] not found. Downloading...

drcege commented 3 weeks ago

You can try downloading the file manually from either of the following links and placing it in the /root/.cache/data_juicer/models/ directory:
https://dail-wlcb.oss-cn-wulanchabu.aliyuncs.com/data_juicer/models/lid.176.bin
https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin
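A minimal sketch of that manual workaround, assuming Python 3.9+ and only the standard library. It tries each of the two mirror URLs in turn and writes lid.176.bin into the cache directory shown in the log. Judging from the log message ("not found. Downloading..."), data-juicer should skip the download once the file exists at that path, but that is an inference, not something confirmed in the thread. `fetch_model` is a hypothetical helper, not part of data-juicer's API, and the /root path assumes the job runs as root, as it did in the reporter's log.

```python
import os
import shutil
import urllib.request
from pathlib import Path
from typing import Sequence

# Mirror URLs from the maintainer's reply.
MIRRORS = [
    "https://dail-wlcb.oss-cn-wulanchabu.aliyuncs.com/data_juicer/models/lid.176.bin",
    "https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin",
]

# Cache directory taken from the log line (assumes the process runs as root).
CACHE_DIR = Path("/root/.cache/data_juicer/models")


def fetch_model(filename: str, mirrors: Sequence[str], cache_dir: Path,
                timeout: float = 60.0) -> Path:
    """Hypothetical helper: download `filename` into `cache_dir`,
    trying each mirror in order until one succeeds."""
    target = cache_dir / filename
    if target.is_file() and target.stat().st_size > 0:
        return target  # already present, nothing to download
    cache_dir.mkdir(parents=True, exist_ok=True)
    tmp = target.with_name(target.name + ".part")
    errors = []
    for url in mirrors:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp, open(tmp, "wb") as out:
                shutil.copyfileobj(resp, out)  # stream to disk instead of loading into memory
            os.replace(tmp, target)  # rename only after a complete download
            return target
        except OSError as exc:  # URLError, HTTPError and socket timeouts all subclass OSError
            errors.append(f"{url}: {exc}")
            if tmp.exists():
                tmp.unlink()  # discard the partial file before trying the next mirror
    raise RuntimeError("all mirrors failed:\n" + "\n".join(errors))
```

Calling `fetch_model("lid.176.bin", MIRRORS, CACHE_DIR)` once before rerunning `tools/process_data.py` fetches the file from whichever mirror is reachable. Writing to a `.part` file and renaming it afterwards means an interrupted download never leaves a truncated file under the final name, which would otherwise look "found" on the next run.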