microsoft / tutel

Tutel MoE: An Optimized Mixture-of-Experts Implementation
MIT License

ImportError: cannot import name 'tutel_custom_kernel' from 'tutel.impls.jit_compiler' #198

Open zhaojiancheng007 opened 1 year ago

ghostplant commented 1 year ago

This is usually an environment issue: PyTorch fails to find the CUDA SDK. Can you post the log of the installation command below?

python3 -m pip install --verbose --user --upgrade git+https://github.com/microsoft/tutel@main
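
Alongside the install log, it can help to see which CUDA toolkit PyTorch itself detects. The following is only a minimal sketch, not part of Tutel: the function name `cuda_build_report` is made up for illustration, and it relies on `torch.version.cuda`, `torch.cuda.is_available()` and `torch.utils.cpp_extension.CUDA_HOME`, which PyTorch uses when building C++ extensions.

```python
import shutil


def cuda_build_report():
    """Collect the CUDA-related facts PyTorch relies on when building C++ extensions."""
    report = {
        "torch_version": None,
        "torch_cuda": None,
        "cuda_available": None,
        "cuda_home": None,
        "nvcc_on_path": shutil.which("nvcc"),  # None if nvcc is not on PATH
    }
    try:
        import torch
        from torch.utils import cpp_extension
    except (ImportError, OSError):
        # PyTorch (or one of its shared libraries) cannot be loaded by this interpreter.
        return report
    report["torch_version"] = torch.__version__
    report["torch_cuda"] = torch.version.cuda  # CUDA version PyTorch was built against
    report["cuda_available"] = torch.cuda.is_available()
    report["cuda_home"] = cpp_extension.CUDA_HOME  # toolkit root PyTorch found, or None
    return report


if __name__ == "__main__":
    for key, value in cuda_build_report().items():
        print(f"{key}: {value}")
```

If `cuda_home` is None, or `torch_cuda` is far from the installed toolkit version, that would be consistent with the environment issue described above.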
zhaojiancheng007 commented 1 year ago

Using pip 23.0.1 from /home/ubuntu/anaconda3/envs/snerf/lib/python3.8/site-packages/pip (python 3.8) Looking in indexes: https://mirrors.bfsu.edu.cn/pypi/web/simple/ Collecting git+https://github.com/microsoft/tutel@main Cloning https://github.com/microsoft/tutel (to revision main) to /tmp/pip-req-build-f3vo8y7s Running command git version git version 2.25.1 Running command git clone --filter=blob:none https://github.com/microsoft/tutel /tmp/pip-req-build-f3vo8y7s Cloning into '/tmp/pip-req-build-f3vo8y7s'... Updating files: 100% (61/61), done.
Running command git show-ref main 1456b49e27d3aaef09be65da5b74a7be0239bdb4 refs/heads/main 1456b49e27d3aaef09be65da5b74a7be0239bdb4 refs/remotes/origin/main Running command git symbolic-ref -q HEAD refs/heads/main Resolved https://github.com/microsoft/tutel to commit 1456b49e27d3aaef09be65da5b74a7be0239bdb4 Running command git rev-parse HEAD 1456b49e27d3aaef09be65da5b74a7be0239bdb4 Running command python setup.py egg_info running egg_info creating /tmp/pip-pip-egg-info-aiosqnkd/tutel.egg-info writing manifest file '/tmp/pip-pip-egg-info-aiosqnkd/tutel.egg-info/SOURCES.txt' writing manifest file '/tmp/pip-pip-egg-info-aiosqnkd/tutel.egg-info/SOURCES.txt' Preparing metadata (setup.py) ... done Building wheels for collected packages: tutel Running command git rev-parse HEAD 1456b49e27d3aaef09be65da5b74a7be0239bdb4 Running command python setup.py bdist_wheel running bdist_wheel running build running build_py creating build creating build/lib.linux-x86_64-3.8 creating build/lib.linux-x86_64-3.8/tutel copying tutel/system.py -> build/lib.linux-x86_64-3.8/tutel copying tutel/net.py -> build/lib.linux-x86_64-3.8/tutel copying tutel/jit.py -> build/lib.linux-x86_64-3.8/tutel copying tutel/moe.py -> build/lib.linux-x86_64-3.8/tutel copying tutel/init.py -> build/lib.linux-x86_64-3.8/tutel creating build/lib.linux-x86_64-3.8/tutel/jit_kernels copying tutel/jit_kernels/gating.py -> build/lib.linux-x86_64-3.8/tutel/jit_kernels copying tutel/jit_kernels/sparse.py -> build/lib.linux-x86_64-3.8/tutel/jit_kernels copying tutel/jit_kernels/init.py -> build/lib.linux-x86_64-3.8/tutel/jit_kernels creating build/lib.linux-x86_64-3.8/tutel/parted copying tutel/parted/patterns.py -> build/lib.linux-x86_64-3.8/tutel/parted copying tutel/parted/spmdx.py -> 
build/lib.linux-x86_64-3.8/tutel/parted copying tutel/parted/init.py -> build/lib.linux-x86_64-3.8/tutel/parted copying tutel/parted/solver.py -> build/lib.linux-x86_64-3.8/tutel/parted creating build/lib.linux-x86_64-3.8/tutel/examples copying tutel/examples/moe_mnist.py -> build/lib.linux-x86_64-3.8/tutel/examples copying tutel/examples/helloworld_from_scratch.py -> build/lib.linux-x86_64-3.8/tutel/examples copying tutel/examples/helloworld.py -> build/lib.linux-x86_64-3.8/tutel/examples copying tutel/examples/helloworld_amp.py -> build/lib.linux-x86_64-3.8/tutel/examples copying tutel/examples/moe_cifar10.py -> build/lib.linux-x86_64-3.8/tutel/examples copying tutel/examples/helloworld_ddp_tutel.py -> build/lib.linux-x86_64-3.8/tutel/examples copying tutel/examples/helloworld_deepspeed.py -> build/lib.linux-x86_64-3.8/tutel/examples copying tutel/examples/init.py -> build/lib.linux-x86_64-3.8/tutel/examples copying tutel/examples/helloworld_ddp.py -> build/lib.linux-x86_64-3.8/tutel/examples creating build/lib.linux-x86_64-3.8/tutel/experts copying tutel/experts/ffn.py -> build/lib.linux-x86_64-3.8/tutel/experts copying tutel/experts/init.py -> build/lib.linux-x86_64-3.8/tutel/experts creating build/lib.linux-x86_64-3.8/tutel/checkpoint copying tutel/checkpoint/scatter.py -> build/lib.linux-x86_64-3.8/tutel/checkpoint copying tutel/checkpoint/init.py -> build/lib.linux-x86_64-3.8/tutel/checkpoint copying tutel/checkpoint/gather.py -> build/lib.linux-x86_64-3.8/tutel/checkpoint creating build/lib.linux-x86_64-3.8/tutel/custom copying tutel/custom/init.py -> build/lib.linux-x86_64-3.8/tutel/custom creating build/lib.linux-x86_64-3.8/tutel/launcher copying tutel/launcher/run.py -> build/lib.linux-x86_64-3.8/tutel/launcher copying tutel/launcher/execl.py -> build/lib.linux-x86_64-3.8/tutel/launcher copying tutel/launcher/init.py -> build/lib.linux-x86_64-3.8/tutel/launcher creating build/lib.linux-x86_64-3.8/tutel/gates copying tutel/gates/cosine_top.py -> 
build/lib.linux-x86_64-3.8/tutel/gates copying tutel/gates/top.py -> build/lib.linux-x86_64-3.8/tutel/gates copying tutel/gates/init.py -> build/lib.linux-x86_64-3.8/tutel/gates creating build/lib.linux-x86_64-3.8/tutel/impls copying tutel/impls/fast_dispatch.py -> build/lib.linux-x86_64-3.8/tutel/impls copying tutel/impls/jit_compiler.py -> build/lib.linux-x86_64-3.8/tutel/impls copying tutel/impls/moe_layer.py -> build/lib.linux-x86_64-3.8/tutel/impls copying tutel/impls/overlap.py -> build/lib.linux-x86_64-3.8/tutel/impls copying tutel/impls/communicate.py -> build/lib.linux-x86_64-3.8/tutel/impls copying tutel/impls/init.py -> build/lib.linux-x86_64-3.8/tutel/impls copying tutel/impls/losses.py -> build/lib.linux-x86_64-3.8/tutel/impls creating build/lib.linux-x86_64-3.8/tutel/parted/backend copying tutel/parted/backend/init.py -> build/lib.linux-x86_64-3.8/tutel/parted/backend creating build/lib.linux-x86_64-3.8/tutel/parted/backend/torch copying tutel/parted/backend/torch/config.py -> build/lib.linux-x86_64-3.8/tutel/parted/backend/torch copying tutel/parted/backend/torch/executor.py -> build/lib.linux-x86_64-3.8/tutel/parted/backend/torch copying tutel/parted/backend/torch/init.py -> build/lib.linux-x86_64-3.8/tutel/parted/backend/torch running build_ext creating /tmp/pip-req-build-f3vo8y7s/build/temp.linux-x86_64-3.8 creating /tmp/pip-req-build-f3vo8y7s/build/temp.linux-x86_64-3.8/tutel creating /tmp/pip-req-build-f3vo8y7s/build/temp.linux-x86_64-3.8/tutel/custom Emitting ninja build file /tmp/pip-req-build-f3vo8y7s/build/temp.linux-x86_64-3.8/build.ninja... Compiling objects... Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) [1/1] c++ -MMD -MF /tmp/pip-req-build-f3vo8y7s/build/temp.linux-x86_64-3.8/tutel/custom/custom_kernel.o.d -pthread -B /home/ubuntu/anaconda3/envs/snerf/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/ubuntu/.local/lib/python3.8/site-packages/torch/include -I/home/ubuntu/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/ubuntu/.local/lib/python3.8/site-packages/torch/include/TH -I/home/ubuntu/.local/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.6/include -I/home/ubuntu/anaconda3/envs/snerf/include/python3.8 -c -c /tmp/pip-req-build-f3vo8y7s/tutel/custom/custom_kernel.cpp -o /tmp/pip-req-build-f3vo8y7s/build/temp.linux-x86_64-3.8/tutel/custom/custom_kernel.o -Wno-sign-compare -Wno-unused-but-set-variable -Wno-terminate -Wno-unused-function -Wno-strict-aliasing -DUSE_GPU -DUSE_NCCL -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=tutel_custom_kernel -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14 cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ g++ -pthread -shared -B /home/ubuntu/anaconda3/envs/snerf/compiler_compat -L/home/ubuntu/anaconda3/envs/snerf/lib -Wl,-rpath=/home/ubuntu/anaconda3/envs/snerf/lib -Wl,--no-as-needed -Wl,--sysroot=/ /tmp/pip-req-build-f3vo8y7s/build/temp.linux-x86_64-3.8/./tutel/custom/custom_kernel.o -L/usr/local/cuda/lib64/stubs -L/home/ubuntu/.local/lib/python3.8/site-packages/torch/lib -L/usr/local/cuda-11.6/lib64 -lcuda -lnvrtc -lnccl -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda_cu -ltorch_cuda_cpp -o build/lib.linux-x86_64-3.8/tutel_custom_kernel.cpython-38-x86_64-linux-gnu.so installing to build/bdist.linux-x86_64/wheel running install running install_lib creating build/bdist.linux-x86_64 
creating build/bdist.linux-x86_64/wheel creating build/bdist.linux-x86_64/wheel/tutel creating build/bdist.linux-x86_64/wheel/tutel/jit_kernels creating build/bdist.linux-x86_64/wheel/tutel/parted creating build/bdist.linux-x86_64/wheel/tutel/parted/backend creating build/bdist.linux-x86_64/wheel/tutel/parted/backend/torch creating build/bdist.linux-x86_64/wheel/tutel/examples creating build/bdist.linux-x86_64/wheel/tutel/experts creating build/bdist.linux-x86_64/wheel/tutel/checkpoint creating build/bdist.linux-x86_64/wheel/tutel/custom creating build/bdist.linux-x86_64/wheel/tutel/launcher creating build/bdist.linux-x86_64/wheel/tutel/gates creating build/bdist.linux-x86_64/wheel/tutel/impls running install_egg_info running egg_info creating tutel.egg-info writing manifest file 'tutel.egg-info/SOURCES.txt' writing manifest file 'tutel.egg-info/SOURCES.txt' Copying tutel.egg-info to build/bdist.linux-x86_64/wheel/tutel-0.1-py3.8.egg-info running install_scripts creating build/bdist.linux-x86_64/wheel/tutel-0.1.dist-info/WHEEL creating '/tmp/pip-wheel-fsgwko7i/tutel-0.1-cp38-cp38-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it adding 'tutel_custom_kernel.cpython-38-x86_64-linux-gnu.so' adding 'tutel/init.py' adding 'tutel/jit.py' adding 'tutel/moe.py' adding 'tutel/net.py' adding 'tutel/system.py' adding 'tutel/checkpoint/init.py' adding 'tutel/checkpoint/gather.py' adding 'tutel/checkpoint/scatter.py' adding 'tutel/custom/init.py' adding 'tutel/examples/init.py' adding 'tutel/examples/helloworld.py' adding 'tutel/examples/helloworld_amp.py' adding 'tutel/examples/helloworld_ddp.py' adding 'tutel/examples/helloworld_ddp_tutel.py' adding 'tutel/examples/helloworld_deepspeed.py' adding 'tutel/examples/helloworld_from_scratch.py' adding 'tutel/examples/moe_cifar10.py' adding 'tutel/examples/moe_mnist.py' adding 'tutel/experts/init.py' adding 'tutel/experts/ffn.py' adding 'tutel/gates/init.py' adding 'tutel/gates/cosine_top.py' adding 
'tutel/gates/top.py' adding 'tutel/impls/init.py' adding 'tutel/impls/communicate.py' adding 'tutel/impls/fast_dispatch.py' adding 'tutel/impls/jit_compiler.py' adding 'tutel/impls/losses.py' adding 'tutel/impls/moe_layer.py' adding 'tutel/impls/overlap.py' adding 'tutel/jit_kernels/init.py' adding 'tutel/jit_kernels/gating.py' adding 'tutel/jit_kernels/sparse.py' adding 'tutel/launcher/init.py' adding 'tutel/launcher/execl.py' adding 'tutel/launcher/run.py' adding 'tutel/parted/init.py' adding 'tutel/parted/patterns.py' adding 'tutel/parted/solver.py' adding 'tutel/parted/spmdx.py' adding 'tutel/parted/backend/init.py' adding 'tutel/parted/backend/torch/init.py' adding 'tutel/parted/backend/torch/config.py' adding 'tutel/parted/backend/torch/executor.py' adding 'tutel-0.1.dist-info/LICENSE' adding 'tutel-0.1.dist-info/METADATA' adding 'tutel-0.1.dist-info/WHEEL' adding 'tutel-0.1.dist-info/top_level.txt' adding 'tutel-0.1.dist-info/RECORD' removing build/bdist.linux-x86_64/wheel Building wheel for tutel (setup.py) ... done Created wheel for tutel: filename=tutel-0.1-cp38-cp38-linux_x86_64.whl size=3818720 sha256=c9229a1d4450e51722ce8c3ee1ac1f168c52eb336f50cb8b74541f46db9908d6 Stored in directory: /tmp/pip-ephem-wheel-cache-8p0gi8d8/wheels/fd/b8/fb/efc186bf3c0931e42fd89af67fe0cfcdece6fb5b055e69ec0a Successfully built tutel Installing collected packages: tutel Running command git rev-parse HEAD 1456b49e27d3aaef09be65da5b74a7be0239bdb4 Successfully installed tutel-0.1

ghostplant commented 1 year ago

Thanks. What does the following command print?

python3 -c 'import torch; import tutel_custom_kernel'
zhaojiancheng007 commented 1 year ago

Thanks! It seems the module 'tutel_custom_kernel' cannot be found.

ghostplant commented 1 year ago

Can you find where this file is located in your anaconda3 environment:

find /home/ubuntu/anaconda3 | grep tutel_custom_kernel

Your anaconda3 environment may not automatically add its location to PYTHONPATH.

With a PyPI (pip) installation outside anaconda, I don't think this problem would occur; the file is usually installed at a path like:

/usr/local/lib/python3.8/dist-packages/tutel_custom_kernel.cpython-38-x86_64-linux-gnu.so
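
To see which file the current interpreter would actually load, without importing it, a small check along these lines may help. It is a sketch only: `locate_module` is a hypothetical helper name, built on the standard `importlib.util.find_spec`.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file the current interpreter would load for `name`, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Run this with the same python3 that fails to import tutel.
    print("interpreter:", sys.executable)
    print("tutel_custom_kernel:", locate_module("tutel_custom_kernel"))
```

A result of None means the module is not on that interpreter's search path at all, even if the `.so` file exists somewhere else on disk.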
zhaojiancheng007 commented 1 year ago

I'm sorry. I did follow the installation procedure, but I still couldn't find the file 'tutel_custom_kernel' in dist-packages. I'm not sure which part went wrong. I use CUDA 11.6 and torch==1.10.0+cu113. Another error that always shows up is 'ImportError: libnvrtc.so.11.0: cannot open shared object file: No such file or directory'.

ghostplant commented 1 year ago

OK, so the problem is not anaconda's site location; rather, your PyTorch fails to detect the CUDA library environment and its version.

You have several options:

1) Find the location of libnvrtc.so.11.0 and add its directory to LD_LIBRARY_PATH.
2) Find the location of libnvrtc.so.11.6 and create a symbolic link to it named libnvrtc.so.11.0.
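
The two options above could be sketched in Python roughly as follows. This is an illustration only: `find_libnvrtc` and `link_as` are hypothetical helper names, and `/usr/local/cuda/lib64` is assumed as a search directory because a similar path appears in the build log. The script itself only lists matches; creating the link is left as an explicit call.

```python
import glob
import os


def find_libnvrtc(search_dirs):
    """Return every libnvrtc.so* file found directly inside the given directories."""
    hits = []
    for directory in search_dirs:
        pattern = os.path.join(glob.escape(directory), "libnvrtc.so*")
        hits.extend(sorted(glob.glob(pattern)))
    return hits


def link_as(existing, wanted_name):
    """Option 2: create a symlink called `wanted_name` next to `existing`."""
    target = os.path.join(os.path.dirname(existing), wanted_name)
    if not os.path.lexists(target):  # do not overwrite a file or link that is already there
        os.symlink(os.path.basename(existing), target)
    return target


if __name__ == "__main__":
    # Directories from LD_LIBRARY_PATH, plus an assumed default toolkit location.
    dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    dirs.append("/usr/local/cuda/lib64")
    for path in find_libnvrtc(dirs):
        print(path)
```

For option 1, the directory of a matching file is what would be appended to LD_LIBRARY_PATH. For option 2, something like `link_as("/path/to/libnvrtc.so.11.6", "libnvrtc.so.11.0")` would create the link, given write permission in that directory.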

ghostplant commented 1 year ago

Because those shared libraries cannot be found on disk, the PyTorch C++ modules can't be loaded at initialization.

zhaojiancheng007 commented 1 year ago

Thanks for your patience. I did what you told me, but the problem is still unsolved. Maybe something went wrong with the ninja build during installation? I have pasted the installation log here. I use CUDA 10.2 with torch 1.10.0+cu102. Thanks a lot!

running install running bdist_egg running egg_info writing manifest file 'tutel.egg-info/SOURCES.txt' running install_lib running build_py running build_ext Emitting ninja build file /home/ubuntu/zcq/tutel/build/temp.linux-x86_64-3.8/build.ninja... Compiling objects... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. g++ -pthread -shared -B /home/ubuntu/anaconda3/envs/snerf/compiler_compat -L/home/ubuntu/anaconda3/envs/snerf/lib -Wl,-rpath=/home/ubuntu/anaconda3/envs/snerf/lib -Wl,--no-as-needed -Wl,--sysroot=/ /home/ubuntu/zcq/tutel/build/temp.linux-x86_64-3.8/./tutel/custom/custom_kernel.o -L/usr/local/cuda/lib64/stubs -L/home/ubuntu/.local/lib/python3.8/site-packages/torch/lib -L/usr/local/cuda-11.6/lib64 -ldl -lcuda -lnvrtc -lnccl -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-3.8/tutel_custom_kernel.cpython-38-x86_64-linux-gnu.so creating build/bdist.linux-x86_64/egg creating build/bdist.linux-x86_64/egg/tutel creating build/bdist.linux-x86_64/egg/tutel/jit_kernels creating build/bdist.linux-x86_64/egg/tutel/parted creating build/bdist.linux-x86_64/egg/tutel/parted/backend creating build/bdist.linux-x86_64/egg/tutel/parted/backend/torch creating build/bdist.linux-x86_64/egg/tutel/examples creating build/bdist.linux-x86_64/egg/tutel/custom creating build/bdist.linux-x86_64/egg/tutel/launcher creating build/bdist.linux-x86_64/egg/tutel/impls byte-compiling build/bdist.linux-x86_64/egg/tutel/system.py to system.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/jit_kernels/gating.py to gating.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/jit_kernels/sparse.py to sparse.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/jit_kernels/init.py to init.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/net.py to net.cpython-38.pyc byte-compiling 
build/bdist.linux-x86_64/egg/tutel/jit.py to jit.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/parted/backend/init.py to init.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/parted/backend/torch/config.py to config.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/parted/backend/torch/executor.py to executor.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/parted/backend/torch/init.py to init.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/parted/patterns.py to patterns.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/parted/spmdx.py to spmdx.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/parted/init.py to init.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/parted/solver.py to solver.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/examples/helloworld_from_scratch.py to helloworld_from_scratch.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/examples/helloworld.py to helloworld.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/examples/helloworld_amp.py to helloworld_amp.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/examples/helloworld_deepspeed.py to helloworld_deepspeed.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/examples/init.py to init.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/examples/helloworld_ddp.py to helloworld_ddp.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/custom/init.py to init.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/moe.py to moe.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/launcher/run.py to run.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/launcher/execl.py to execl.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/launcher/init.py to init.cpython-38.pyc byte-compiling 
build/bdist.linux-x86_64/egg/tutel/init.py to init.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/impls/fast_dispatch.py to fast_dispatch.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/impls/jit_compiler.py to jit_compiler.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/impls/moe_layer.py to moe_layer.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/impls/communicate.py to communicate.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel/impls/init.py to init.cpython-38.pyc byte-compiling build/bdist.linux-x86_64/egg/tutel_custom_kernel.py to tutel_custom_kernel.cpython-38.pyc creating build/bdist.linux-x86_64/egg/EGG-INFO copying tutel.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO copying tutel.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO copying tutel.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO copying tutel.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO copying tutel.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO zip_safe flag not set; analyzing archive contents... 
pycache.tutel_custom_kernel.cpython-38: module references file removing 'build/bdist.linux-x86_64/egg' (and everything under it) creating /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg Extracting tutel-0.1-py3.8-linux-x86_64.egg to /home/ubuntu/.local/lib/python3.8/site-packages byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel_custom_kernel.py to tutel_custom_kernel.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/init.py to init.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/jit.py to jit.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/moe.py to moe.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/net.py to net.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/system.py to system.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/custom/init.py to init.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/examples/init.py to init.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/examples/helloworld.py to helloworld.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/examples/helloworld_amp.py to helloworld_amp.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/examples/helloworld_ddp.py to helloworld_ddp.cpython-38.pyc byte-compiling 
/home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/examples/helloworld_deepspeed.py to helloworld_deepspeed.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/examples/helloworld_from_scratch.py to helloworld_from_scratch.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/impls/init.py to init.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/impls/communicate.py to communicate.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/impls/fast_dispatch.py to fast_dispatch.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/impls/jit_compiler.py to jit_compiler.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/impls/moe_layer.py to moe_layer.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/jit_kernels/init.py to init.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/jit_kernels/gating.py to gating.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/jit_kernels/sparse.py to sparse.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/launcher/init.py to init.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/launcher/execl.py to execl.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/launcher/run.py to run.cpython-38.pyc byte-compiling 
/home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/parted/init.py to init.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/parted/patterns.py to patterns.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/parted/solver.py to solver.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/parted/spmdx.py to spmdx.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/parted/backend/init.py to init.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/parted/backend/torch/init.py to init.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/parted/backend/torch/config.py to config.cpython-38.pyc byte-compiling /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg/tutel/parted/backend/torch/executor.py to executor.cpython-38.pyc Adding tutel 0.1 to easy-install.pth file

Installed /home/ubuntu/.local/lib/python3.8/site-packages/tutel-0.1-py3.8-linux-x86_64.egg Processing dependencies for tutel==0.1 Finished processing dependencies for tutel==0.1

zhaojiancheng007 commented 1 year ago

Thanks! I reinstalled CUDA and torch, updated tutel to the latest version, and it works! Thanks for your patience; that really helped me a lot.

zachary62 commented 1 year ago

Can you share your CUDA and PyTorch versions? I have the same issue, and reinstalling doesn't help.

ghostplant commented 1 year ago

In most cases, PyTorch fails to import a standard C++ extension because the extension sits in an improper or messed-up location.

Here are several possibilities.

  1. The user account is the root cause (e.g. root vs. non-root): PyTorch was installed by a different user than the one running it.
  2. Multiple copies of the C++ extension exist in different site locations (e.g. one version in the root site-packages and another in the user site-packages), so the wrong one gets imported.
  3. The CUDA environment is not configured correctly, so the C++ extension fails during setup or during library loading. In this case, you can usually see related errors in the installation log, e.g. nvcc or libcuda.so not found.
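
The second possibility can be checked with a short script like the one below. It is a sketch under assumptions: `find_copies` is a made-up helper that scans the directories on `sys.path` for files whose names start with `tutel_custom_kernel`, and it does not inspect which copy matches the installed PyTorch.

```python
import glob
import os
import sys


def find_copies(name, dirs):
    """Map each directory to the files in it whose names start with `name`."""
    found = {}
    for directory in dict.fromkeys(dirs):  # drop repeated entries, keep order
        pattern = os.path.join(glob.escape(directory), name + "*")
        matches = sorted(glob.glob(pattern))
        if matches:
            found[directory] = matches
    return found


if __name__ == "__main__":
    copies = find_copies("tutel_custom_kernel", [p for p in sys.path if p])
    for directory, files in copies.items():
        print(directory)
        for path in files:
            print("   ", os.path.basename(path))
    if len(copies) > 1:
        print("More than one site location contains the extension.")
```

Directories are reported in `sys.path` order, so the first one listed is normally the copy that gets imported.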