hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Huawei NPU adaptation: dependency conflict #5763

Open yangyang6666 opened 1 day ago

yangyang6666 commented 1 day ago

Reminder

System Info

ERROR: Cannot install llamafactory and llamafactory[metrics,torch-npu]==0.9.1.dev0 because these package versions have conflicting dependencies.

The conflict is caused by:
    llamafactory[metrics,torch-npu] 0.9.1.dev0 depends on torch==2.1.0; extra == "torch-npu"
    torch-npu 2.1.0.post3 depends on torch==2.1.0+cpu
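Why pip reports these two pins as a conflict can be sketched with a hand-rolled version of the PEP 440 "==" rule. A pin without a local label (==2.1.0) also accepts 2.1.0+cpu, but ==2.1.0+cpu rejects a plain 2.1.0, so only a +cpu build can satisfy both. The sketch assumes, hypothetically, that the index pip consulted offered only the plain 2.1.0 wheel. The helper names matches_pin and satisfying and the candidate lists are illustrative, and the check skips the version normalization that pip's real resolver performs.

```python
# Hand-rolled subset of PEP 440 "==" matching (no wildcards, no normalization).
def matches_pin(candidate: str, pinned: str) -> bool:
    # If the pin carries no local label (+...), ignore the candidate's label.
    if "+" not in pinned:
        candidate = candidate.split("+", 1)[0]
    return candidate == pinned


# The two torch pins quoted in the error message:
# from llamafactory[torch-npu] and from torch-npu 2.1.0.post3.
PINS = ["2.1.0", "2.1.0+cpu"]


def satisfying(candidates: list[str], pins: list[str]) -> list[str]:
    """Return the candidate torch versions that meet every pin."""
    return [c for c in candidates if all(matches_pin(c, p) for p in pins)]


# Hypothetical index showing only plain wheels: nothing satisfies both pins.
print(satisfying(["2.1.0", "2.3.1"], PINS))      # []
# Hypothetical index that also serves the +cpu build.
print(satisfying(["2.1.0", "2.1.0+cpu"], PINS))  # ['2.1.0+cpu']
```

In other words, the pins are compatible in principle; the install fails when no 2.1.0+cpu wheel is reachable from the configured index.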

Reproduction

pip install -e ".[torch-npu,metrics]"

Expected behavior

No response

Others

No response

ToruKiyono commented 1 day ago

That's right, the conflict is real. However, you can avoid it:

pip install -e ".[metrics]"

That is, leave torch out when installing llamafactory (drop the torch-npu extra); once that install succeeds, install torch_npu separately.
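Installing torch_npu separately means nothing checks that it still matches torch. A minimal post-install check is sketched below, assuming the torch_npu convention that an X.Y.Z(.postN) release targets torch X.Y.Z (as torch-npu 2.1.0.post3 targets torch 2.1.0+cpu above). The functions release, npu_matches_torch and check_installed are hypothetical helpers, not part of LLaMA-Factory or torch_npu.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release(v: str) -> tuple[int, ...]:
    """Numeric release segment: '2.1.0.post3' -> (2, 1, 0), '2.1.0+cpu' -> (2, 1, 0)."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    if m is None:
        raise ValueError(f"unrecognized version: {v!r}")
    return tuple(int(x) for x in m.group().split("."))


def npu_matches_torch(torch_v: str, npu_v: str) -> bool:
    # Assumption: a torch_npu X.Y.Z(.postN) wheel targets torch X.Y.Z.
    return release(torch_v)[:3] == release(npu_v)[:3]


def check_installed() -> str:
    """Compare the torch and torch-npu distributions in the current environment."""
    try:
        torch_v = version("torch")
        npu_v = version("torch-npu")
    except PackageNotFoundError as exc:
        return f"missing distribution: {exc}"
    status = "ok" if npu_matches_torch(torch_v, npu_v) else "MISMATCH"
    return f"torch {torch_v} / torch_npu {npu_v}: {status}"


if __name__ == "__main__":
    print(check_installed())
```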

hiyouga commented 8 hours ago

Try installing them independently, then run:

pip install --no-deps .
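One possible ordering of "install them independently" is sketched below as a dry-run command list. Several parts are assumptions not stated in the thread: the host is x86_64 (the .so names in the tracebacks say so), torch comes from PyTorch's CPU wheel index so that the 2.1.0+cpu build that torch-npu 2.1.0.post3 pins is available, and the repository's requirements.txt covers llamafactory's other dependencies without pinning torch. Packages from the metrics extra would still need installing separately. CPU_INDEX, install_plan and run are illustrative names.

```python
import shlex
import subprocess
import sys

# Assumption: x86_64 host; this index serves the torch 2.1.0+cpu wheel.
CPU_INDEX = "https://download.pytorch.org/whl/cpu"


def install_plan(project: str = ".") -> list[list[str]]:
    pip = [sys.executable, "-m", "pip", "install"]
    return [
        pip + ["torch==2.1.0", "--index-url", CPU_INDEX],  # 1: torch alone
        pip + ["torch-npu==2.1.0.post3"],                  # 2: torch_npu on top of it
        pip + ["-r", "requirements.txt"],                  # 3: other deps (assumed file)
        pip + ["--no-deps", project],                      # 4: llamafactory, resolver skipped
    ]


def run(plan: list[list[str]], dry_run: bool = True) -> None:
    for cmd in plan:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(install_plan())  # only prints; pass dry_run=False to execute
```

The final --no-deps step is what keeps pip from re-resolving the torch pins; the earlier steps are responsible for leaving a consistent torch/torch_npu pair behind.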

yangyang6666 commented 2 hours ago

I changed "torch-npu": [..., "torch-npu==2.1.0.post3", ...] in setup.py to "torch-npu": [..., "torch-npu==2.1.0", ...], and then got a new problem:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.18.1+cpu requires torch==2.3.1, but you have torch 2.1.0 which is incompatible.
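This second message is a separate pairing problem: the preinstalled torchvision 0.18.1+cpu belongs to the torch 2.3 series, while torch 2.1 pairs with torchvision 0.16. The sketch below encodes a partial, hand-copied torch to torchvision table keyed by minor version; the table and the helper names are illustrative, and PyTorch's own compatibility matrix is the authority.

```python
from typing import Optional

# Partial torch -> torchvision pairing by minor version (hand-copied subset).
TORCHVISION_FOR_TORCH = {"2.0": "0.15", "2.1": "0.16", "2.2": "0.17", "2.3": "0.18"}


def minor(v: str) -> str:
    """'2.1.0+cpu' -> '2.1', '0.18.1+cpu' -> '0.18'."""
    return ".".join(v.split("+", 1)[0].split(".")[:2])


def torchvision_ok(torch_v: str, tv_v: str) -> Optional[bool]:
    """True/False if torch's minor version is in the table, None otherwise."""
    expected = TORCHVISION_FOR_TORCH.get(minor(torch_v))
    if expected is None:
        return None
    return minor(tv_v) == expected


print(torchvision_ok("2.1.0", "0.18.1+cpu"))      # False: 0.18 pairs with torch 2.3
print(torchvision_ok("2.1.0+cpu", "0.16.0+cpu"))  # True
```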

yangyang6666 commented 2 hours ago

I resolved the above problem, but now get this error when running training:

ImportError: /usr/local/lib/python3.10/site-packages/change_data_ptr.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104impl3cow11cow_deleterEPv
Traceback (most recent call last):
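The undefined symbol is a mangled C++ name. Decoding it shows that the prebuilt change_data_ptr extension expects c10::impl::cow::cow_deleter(void*) from PyTorch's c10 library. As an inference, not something confirmed in the thread, that suggests the extension was compiled against a different torch build than the 2.1.0 now installed. The demangle helper below is hypothetical and covers only this shape of Itanium-ABI symbol (length-prefixed nested names with builtin or pointer parameters); c++filt is the real tool.

```python
import re

# Itanium C++ ABI builtin type codes (small subset).
BUILTINS = {"v": "void", "b": "bool", "c": "char", "i": "int", "l": "long", "m": "unsigned long"}
_LEN = re.compile(r"\d+")


def demangle(sym: str) -> str:
    """Decode _ZN<len><id>...E<params>; raise ValueError on anything else."""
    if not sym.startswith("_ZN"):
        raise ValueError("only _ZN...E nested names are handled")
    pos, parts = 3, []
    while pos < len(sym) and sym[pos] != "E":
        m = _LEN.match(sym, pos)
        if m is None:
            raise ValueError(f"unsupported encoding at offset {pos}")
        end = m.end() + int(m.group())
        if end > len(sym):
            raise ValueError("identifier runs past end of symbol")
        parts.append(sym[m.end():end])  # identifier follows its length prefix
        pos = end
    if pos >= len(sym):
        raise ValueError("missing 'E' terminator")
    pos += 1  # skip the 'E' that closes the nested name
    params = []
    while pos < len(sym):
        stars = 0
        while pos < len(sym) and sym[pos] == "P":  # 'P' = pointer to the next type
            stars += 1
            pos += 1
        if pos >= len(sym) or sym[pos] not in BUILTINS:
            raise ValueError(f"unsupported parameter type at offset {pos}")
        params.append(BUILTINS[sym[pos]] + "*" * stars)
        pos += 1
    if params == ["void"]:  # a lone 'v' encodes an empty parameter list
        params = []
    return "::".join(parts) + "(" + ", ".join(params) + ")"


print(demangle("_ZN3c104impl3cow11cow_deleterEPv"))
# c10::impl::cow::cow_deleter(void*)
```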

yangyang6666 commented 2 hours ago

And when I run llamafactory-cli env, I get:

/home/tiger/.local/lib/python3.10/site-packages/torch_npu/dynamo/__init__.py:18: UserWarning: Register eager implementation for the 'npu' backend of dynamo, as torch_npu was not compiled with torchair.
  warnings.warn(
Traceback (most recent call last):
  File "/home/tiger/.local/bin/llamafactory-cli", line 5, in <module>
    from llamafactory.cli import main
  File "/opt/tiger/agihub-open-lm-sft-npu/LLaMA-Factory/src/llamafactory/cli.py", line 21, in <module>
    from . import launcher
  File "/opt/tiger/agihub-open-lm-sft-npu/LLaMA-Factory/src/llamafactory/launcher.py", line 15, in <module>
    from llamafactory.train.tuner import run_exp  # use absolute import
  File "/opt/tiger/agihub-open-lm-sft-npu/LLaMA-Factory/src/llamafactory/train/tuner.py", line 28, in <module>
    from .dpo import run_dpo
  File "/opt/tiger/agihub-open-lm-sft-npu/LLaMA-Factory/src/llamafactory/train/dpo/__init__.py", line 15, in <module>
    from .workflow import run_dpo
  File "/opt/tiger/agihub-open-lm-sft-npu/LLaMA-Factory/src/llamafactory/train/dpo/workflow.py", line 22, in <module>
    from ...extras.ploting import plot_loss
  File "/opt/tiger/agihub-open-lm-sft-npu/LLaMA-Factory/src/llamafactory/extras/ploting.py", line 20, in <module>
    from transformers.trainer import TRAINER_STATE_NAME
  File "/home/tiger/.local/lib/python3.10/site-packages/transformers/trainer.py", line 189, in <module>
    from apex import amp
  File "/usr/local/lib/python3.10/site-packages/apex/__init__.py", line 8, in <module>
    from . import amp
  File "/usr/local/lib/python3.10/site-packages/apex/amp/__init__.py", line 1, in <module>
    from .amp import init, half_function, float_function, promote_function,\
  File "/usr/local/lib/python3.10/site-packages/apex/amp/amp.py", line 5, in <module>
    from .frontend import *
  File "/usr/local/lib/python3.10/site-packages/apex/amp/frontend.py", line 2, in <module>
    from ._initialize import _initialize
  File "/usr/local/lib/python3.10/site-packages/apex/amp/_initialize.py", line 27, in <module>
    from ._process_optimizer import _process_optimizer
  File "/usr/local/lib/python3.10/site-packages/apex/amp/_process_optimizer.py", line 20, in <module>
    from change_data_ptr import change_data_ptr
ImportError: /usr/local/lib/python3.10/site-packages/change_data_ptr.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104impl3cow11cow_deleterEPv
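The traceback shows the import chain: llamafactory-cli reaches transformers.trainer, which imports apex (installed under /usr/local here), and apex.amp then loads the change_data_ptr extension that fails. A minimal probe, assuming each module can be imported on its own, helps narrow down which link is broken. The probe helper and the module list are illustrative.

```python
import importlib
import importlib.util


def probe(module: str) -> str:
    """Return 'missing', 'ok', or 'broken: <error>' for a top-level module."""
    if importlib.util.find_spec(module) is None:
        return "missing"
    try:
        importlib.import_module(module)
    except (ImportError, OSError) as exc:  # undefined-symbol errors surface as ImportError
        return f"broken: {exc}"
    return "ok"


if __name__ == "__main__":
    # Order follows the traceback: torch first, then the modules that load on top of it.
    for name in ("torch", "torch_npu", "change_data_ptr", "apex"):
        print(f"{name}: {probe(name)}")
```

If torch imports cleanly but change_data_ptr reports the same undefined symbol, the problem is isolated to the apex build rather than to LLaMA-Factory itself.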