modelscope / data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Apache License 2.0

[Bug]: undefined symbol: _ZN3c104cuda9SetDeviceE #419

Open lh61500 opened 2 weeks ago

lh61500 commented 2 weeks ago

Before Reporting

Search before reporting

OS

Ubuntu 20

Installation Method

source

Data-Juicer Version

latest

Python Version

3.10

Describe the bug

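Installing with pip install -v -e .[sci] under Python 3.10 completes without problems, but running python tools/process_data.py --config configs/demo/process.yaml produces the error below. Uninstalling NumPy 2.0 and switching to 1.26 still does not work.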

A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.2 as it may crash. To support both 1.x and 2.x versions of NumPy, modules must be compiled with NumPy 2.0. Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to downgrade to 'numpy<2' or try to upgrade the affected module. We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "/home/likang/angang_data_clean/data-juicer-main/tools/process_data.py", line 3, in <module>
    from data_juicer.config import init_configs
  File "/home/likang/angang_data_clean/data-juicer-main/data_juicer/config/__init__.py", line 1, in <module>
    from .config import (export_config, get_init_configs, init_configs,
  File "/home/likang/angang_data_clean/data-juicer-main/data_juicer/config/config.py", line 17, in <module>
    from data_juicer.ops.base_op import OPERATORS
  File "/home/likang/angang_data_clean/data-juicer-main/data_juicer/ops/__init__.py", line 1, in <module>
    from . import deduplicator, filter, mapper, selector
  File "/home/likang/angang_data_clean/data-juicer-main/data_juicer/ops/deduplicator/__init__.py", line 1, in <module>
    from . import (document_deduplicator, document_minhash_deduplicator,
  File "/home/likang/angang_data_clean/data-juicer-main/data_juicer/ops/deduplicator/document_deduplicator.py", line 14, in <module>
    from ..base_op import OPERATORS, Deduplicator
  File "/home/likang/angang_data_clean/data-juicer-main/data_juicer/ops/base_op.py", line 5, in <module>
    import pyarrow as pa
  File "/home/likang/miniconda3/envs/datajuicer/lib/python3.10/site-packages/pyarrow/__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
AttributeError: _ARRAY_API not found

Traceback (most recent call last):
  File "/home/likang/angang_data_clean/data-juicer-main/tools/process_data.py", line 3, in <module>
    from data_juicer.config import init_configs
  File "/home/likang/angang_data_clean/data-juicer-main/data_juicer/config/__init__.py", line 1, in <module>
    from .config import (export_config, get_init_configs, init_configs,
  File "/home/likang/angang_data_clean/data-juicer-main/data_juicer/config/config.py", line 17, in <module>
    from data_juicer.ops.base_op import OPERATORS
  File "/home/likang/angang_data_clean/data-juicer-main/data_juicer/ops/__init__.py", line 1, in <module>
    from . import deduplicator, filter, mapper, selector
  File "/home/likang/angang_data_clean/data-juicer-main/data_juicer/ops/deduplicator/__init__.py", line 1, in <module>
    from . import (document_deduplicator, document_minhash_deduplicator,
  File "/home/likang/angang_data_clean/data-juicer-main/data_juicer/ops/deduplicator/document_deduplicator.py", line 14, in <module>
    from ..base_op import OPERATORS, Deduplicator
  File "/home/likang/angang_data_clean/data-juicer-main/data_juicer/ops/base_op.py", line 5, in <module>
    import pyarrow as pa
  File "/home/likang/miniconda3/envs/datajuicer/lib/python3.10/site-packages/pyarrow/__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
  File "pyarrow/lib.pyx", line 36, in init pyarrow.lib
ImportError: numpy.core.multiarray failed to import

To Reproduce

Install with pip install -v -e .[sci] under Python 3.10 (this step succeeds), then run python tools/process_data.py --config configs/demo/process.yaml, which produces the error above. After uninstalling NumPy 2.0 and switching to 1.26 it still fails, and the error changes to: ImportError: /home/likang/miniconda3/envs/datajuicer/lib/python3.10/site-packages/vllm/_C.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi

Configs

CUDA: print(torch.__version__) reports 2.4.0+cu121. Hardware: 8 A40 GPUs.

Logs

python tools/process_data.py --config configs/demo/process.yaml
Traceback (most recent call last):
  File "/home/likang/angang_data_clean/data-juicer-main/tools/process_data.py", line 3, in <module>
    from data_juicer.config import init_configs
  File "/home/likang/angang_data_clean/data-juicer-main/data_juicer/config/__init__.py", line 1, in <module>
    from .config import (export_config, get_init_configs, init_configs,
  File "/home/likang/angang_data_clean/data-juicer-main/data_juicer/config/config.py", line 17, in <module>
    from data_juicer.ops.base_op import OPERATORS
  File "/home/likang/angang_data_clean/data-juicer-main/data_juicer/ops/__init__.py", line 1, in <module>
    from . import deduplicator, filter, mapper, selector
  File "/home/likang/angang_data_clean/data-juicer-main/data_juicer/ops/filter/__init__.py", line 2, in <module>
    from . import (alphanumeric_filter, audio_duration_filter,
  File "/home/likang/angang_data_clean/data-juicer-main/data_juicer/ops/filter/video_tagging_from_frames_filter.py", line 8, in <module>
    from ..mapper.video_tagging_from_frames_mapper import \
  File "/home/likang/angang_data_clean/data-juicer-main/data_juicer/ops/mapper/__init__.py", line 2, in <module>
    from . import (audio_ffmpeg_wrapped_mapper, chinese_convert_mapper,
  File "/home/likang/angang_data_clean/data-juicer-main/data_juicer/ops/mapper/extract_qa_mapper.py", line 16, in <module>
    import vllm # noqa: F401
  File "/home/likang/miniconda3/envs/datajuicer/lib/python3.10/site-packages/vllm/__init__.py", line 3, in <module>
    from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
  File "/home/likang/miniconda3/envs/datajuicer/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 6, in <module>
    from vllm.config import (CacheConfig, ModelConfig, ParallelConfig,
  File "/home/likang/miniconda3/envs/datajuicer/lib/python3.10/site-packages/vllm/config.py", line 9, in <module>
    from vllm.utils import get_cpu_memory, is_hip
  File "/home/likang/miniconda3/envs/datajuicer/lib/python3.10/site-packages/vllm/utils.py", line 8, in <module>
    from vllm._C import cuda_utils
ImportError: /home/likang/miniconda3/envs/datajuicer/lib/python3.10/site-packages/vllm/_C.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
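
For readers unfamiliar with the symbol in the log above: it is an Itanium-ABI mangled C++ name, and decoding it shows which function vllm's compiled extension expects to find. The sketch below is a deliberately minimal decoder written for illustration. The function name demangle_simple and the small type table are assumptions of this example, it handles only plain nested names with builtin parameter codes, and a real tool such as c++filt covers the full grammar.

```python
# Minimal sketch: decode a simple Itanium-mangled nested name such as
# _ZN3c104cuda9SetDeviceEi. Only plain "_ZN<len><name>...E<params>" symbols
# with builtin parameter codes are handled; this is not a full demangler.

BUILTIN_TYPES = {
    "v": "void", "b": "bool", "c": "char", "i": "int",
    "j": "unsigned int", "l": "long", "f": "float", "d": "double",
}


def demangle_simple(symbol: str) -> str:
    if not symbol.startswith("_ZN"):
        raise ValueError("this sketch only handles _ZN... nested names")
    pos = 3
    parts = []
    # Each component is a decimal length followed by that many characters.
    while pos < len(symbol) and symbol[pos] != "E":
        start = pos
        while pos < len(symbol) and symbol[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError(f"unsupported construct at offset {pos}")
        length = int(symbol[start:pos])
        parts.append(symbol[pos:pos + length])
        pos += length
    if pos >= len(symbol):
        raise ValueError("missing 'E' terminator for the nested name")
    pos += 1  # skip the 'E' that closes the nested name
    params = []
    for code in symbol[pos:]:
        if code not in BUILTIN_TYPES:
            raise ValueError(f"unsupported parameter code {code!r}")
        params.append(BUILTIN_TYPES[code])
    if params == ["void"]:
        params = []  # a lone 'v' means "takes no arguments"
    return "::".join(parts) + "(" + ", ".join(params) + ")"


if __name__ == "__main__":
    print(demangle_simple("_ZN3c104cuda9SetDeviceEi"))
```

The decoded name is c10::cuda::SetDevice(int). The c10 namespace belongs to PyTorch's core library, so the import error means the vllm extension was built expecting a function that the installed torch libraries do not export.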

Screenshots

No response

Additional

No response

drcege commented 1 week ago

According to community reports, this error is caused by incompatible vllm and torch versions: https://github.com/vllm-project/vllm/issues/1807
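
Since import vllm itself fails in this environment, one way to see which versions are actually installed is to read package metadata rather than importing the packages. The following is a minimal sketch using the standard library's importlib.metadata. The helper name installed_versions and the package list are assumptions chosen for this issue, and the sketch only reports versions; it does not know which vllm and torch combinations are compatible.

```python
# Minimal sketch: list installed versions without importing the packages,
# so it still works when "import vllm" raises ImportError.
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package list is an assumption based on the errors in this issue.
    report = installed_versions(["torch", "vllm", "numpy", "pyarrow"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

The reported torch version can then be compared with the torch version that the installed vllm release declares as its requirement, and one of the two reinstalled so that they match.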