hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All
https://hpcaitech.github.io/Open-Sora/
Apache License 2.0

No such file or directory: '/opt/conda/envs/opensora/lib/python3.10/site-packages/colossalai/kernel/extensions/pybind/optimizer/optimizer.cpp' #429

Closed: yiiizuo closed this issue 3 weeks ago

yiiizuo commented 1 month ago

Traceback (most recent call last):
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/colossalai/kernel/extensions/cpp_extension.py", line 132, in load
    op_kernel = self.import_op()
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/colossalai/kernel/extensions/cpp_extension.py", line 61, in import_op
    return importlib.import_module(self.prebuilt_import_path)
  File "/opt/conda/envs/opensora/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'colossalai._C.fused_optim_cuda'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/zuoyi/T2V/Code/Open-Sora/scripts/train.py", line 329, in <module>
    main()
  File "/zuoyi/T2V/Code/Open-Sora/scripts/train.py", line 163, in main
    optimizer = HybridAdam(
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/colossalai/nn/optimizer/hybrid_adam.py", line 88, in __init__
    fused_optim = FusedOptimizerLoader().load()
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/colossalai/kernel/kernel_loader.py", line 83, in load
    return usable_exts[0].load()
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/colossalai/kernel/extensions/cpp_extension.py", line 136, in load
    op_kernel = self.build_jit()
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/colossalai/kernel/extensions/cuda_extension.py", line 86, in build_jit
    op_kernel = load(
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1308, in load
    return _jit_compile(
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1669, in _jit_compile
    version = JIT_EXTENSION_VERSIONER.bump_version_if_changed(
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/torch/utils/_cpp_extension_versioner.py", line 45, in bump_version_if_changed
    hash_value = hash_source_files(hash_value, source_files)
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/torch/utils/_cpp_extension_versioner.py", line 15, in hash_source_files
    with open(filename) as file:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/conda/envs/opensora/lib/python3.10/site-packages/colossalai/kernel/extensions/pybind/optimizer/optimizer.cpp'

Process finished with exit code 1
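For anyone hitting the same trace: it shows two steps failing in a row. First the prebuilt module colossalai._C.fused_optim_cuda cannot be imported, then the JIT fallback cannot find optimizer.cpp inside the installed package. The sketch below only reproduces those two checks so you can see which one applies to your environment. The helper names (prebuilt_available, jit_source_present, diagnose) are made up for illustration and are not part of colossalai; the module name and relative path are copied from the traceback.

```python
import importlib.util
from pathlib import Path

# Both names are copied from the traceback in this issue.
PREBUILT_MODULE = "colossalai._C.fused_optim_cuda"
JIT_SOURCE = Path("colossalai/kernel/extensions/pybind/optimizer/optimizer.cpp")


def prebuilt_available(module_name: str = PREBUILT_MODULE) -> bool:
    """Return True if the prebuilt extension module can be located.

    Note: find_spec imports the parent packages of a dotted name,
    so this may import colossalai itself as a side effect.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A parent package is missing or could not be imported.
        return False


def jit_source_present(site_packages: str) -> bool:
    """Return True if the C++ source needed for the JIT build is installed."""
    return (Path(site_packages) / JIT_SOURCE).is_file()


def diagnose(site_packages: str, module_name: str = PREBUILT_MODULE) -> str:
    """Hypothetical helper: summarize which of the two load paths can work."""
    if prebuilt_available(module_name):
        return "prebuilt kernel found"
    if jit_source_present(site_packages):
        return "no prebuilt kernel, but JIT sources are present"
    return "no prebuilt kernel and no JIT sources: reinstall colossalai"
```

To run it against the active environment, pass the site-packages directory, for example diagnose(sysconfig.get_paths()["purelib"]) after importing sysconfig. If it reports the last case, the installed package has neither the compiled kernel nor the sources to build it, which matches the error in this issue.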

fenghe12 commented 1 month ago

You need to build colossalai from source; otherwise this error will occur.

fenghe12 commented 1 month ago

You can't install it directly with pip. Take a look at the official colossalai installation guide.

yiiizuo commented 1 month ago

Thank you for your reply. When I built colossalai from source, the following error occurred:

  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/colossalai/booster/__init__.py", line 2, in <module>
    from .booster import Booster
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/colossalai/booster/booster.py", line 26, in <module>
    from .plugin import Plugin
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/colossalai/booster/plugin/__init__.py", line 1, in <module>
    from .gemini_plugin import GeminiPlugin
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/colossalai/booster/plugin/gemini_plugin.py", line 30, in <module>
    from colossalai.shardformer import ShardConfig, ShardFormer
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/colossalai/shardformer/__init__.py", line 1, in <module>
    from .shard import GradientCheckpointConfig, ModelSharder, PipelineGradientCheckpointConfig, ShardConfig, ShardFormer
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/colossalai/shardformer/shard/__init__.py", line 3, in <module>
    from .sharder import ModelSharder
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/colossalai/shardformer/shard/sharder.py", line 10, in <module>
    from ..policies.auto_policy import get_autopolicy
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/colossalai/shardformer/policies/auto_policy.py", line 6, in <module>
    from .base_policy import Policy
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/colossalai/shardformer/policies/base_policy.py", line 13, in <module>
    from ..layer.normalization import BaseLayerNorm
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/colossalai/shardformer/layer/__init__.py", line 2, in <module>
    from .attn import AttnMaskType, ColoAttention
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/colossalai/shardformer/layer/attn.py", line 7, in <module>
    from colossalai.kernel.kernel_loader import (
  File "/opt/conda/envs/opensora/lib/python3.10/site-packages/colossalai/kernel/kernel_loader.py", line 4, in <module>
    from .extensions import (
ModuleNotFoundError: No module named 'colossalai.kernel.extensions'

Process finished with exit code 1

JThh commented 1 month ago

Could you refer to this issue, and make sure you pull from the main branch when installing from source?

JThh commented 1 month ago

Let me know if this issue persists after doing so.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.

aulaywang commented 1 month ago

I ran into a similar problem.

aulaywang commented 1 month ago

I installed with pip install colossalai==0.3.6, and that seems to have solved the problem.

fenghe12 commented 1 month ago

You need to build colossalai from source; otherwise this error will occur.

aulaywang commented 1 month ago

> you need to build colossalai from source code, otherwise such error will occur

Do you have a tutorial or the commands for the build process? I tried several times but failed.

ver217 commented 3 weeks ago

Hi, colossalai==0.3.6 and colossalai==0.3.9 are both OK.
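A quick way to confirm which release is actually installed in the environment is to read the package metadata. This is a minimal sketch, assuming the distribution is named colossalai as in the pip commands above. The helper names and the REPORTED_WORKING tuple are illustrative, and the tuple only lists the two versions mentioned in this thread, so a version outside it is not necessarily broken.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported as working in this thread; not an exhaustive list.
REPORTED_WORKING = ("0.3.6", "0.3.9")


def installed_version(dist_name: str = "colossalai"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def is_reported_working(ver, known=REPORTED_WORKING) -> bool:
    """Check a version string against the versions reported in the thread.

    A local build suffix such as "+cu121" is ignored for the comparison.
    """
    if ver is None:
        return False
    return ver.split("+")[0] in known


if __name__ == "__main__":
    ver = installed_version()
    print("colossalai version:", ver)
    print("reported working in this thread:", is_reported_working(ver))
```

If the printed version is not one of the two, reinstalling with a pinned version (as in the pip command earlier in the thread) is the simplest thing to try before attempting a source build.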