flint-stone opened this issue
It seems you have a problem building the half-precision kernels. Can you run `export TORCH_CUDA_ARCH_LIST=7.0`
and rerun to see whether the kernels compile correctly?
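Since DeepSpeed JIT-compiles its ops on first use, the variable has to be visible before that build is triggered. A minimal sketch of the same fix applied from Python (the value 7.0 is the compute capability suggested above, e.g. V100; this is an illustration, not part of the original thread):

```python
import os

# Constrain CUDA kernel builds to compute capability 7.0 (e.g., V100).
# torch.utils.cpp_extension reads TORCH_CUDA_ARCH_LIST at compile time,
# so this must be set before any op's JIT build is triggered.
os.environ["TORCH_CUDA_ARCH_LIST"] = "7.0"

import deepspeed  # ops JIT-built after this point pick up the arch list
```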
@flint-stone -- please see this tutorial for MoE inference: https://www.deepspeed.ai/tutorials/moe-inference-tutorial/
@flint-stone I also noticed that you are using a BERT model with MoE. Is this a custom BERT model you modified with the DeepSpeed MoE layer?
DeepSpeed inference will only support the moe=true and moe_experts="n" arguments if you are wrapping an existing DeepSpeed MoE model.
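For reference, "an existing DeepSpeed MoE model" means one whose expert MLPs are wrapped in `deepspeed.moe.layer.MoE`. A minimal sketch of such a layer (the sizes and the expert definition are illustrative assumptions, not taken from this issue):

```python
import torch
from deepspeed.moe.layer import MoE

hidden = 2048  # illustrative hidden size

# An ordinary expert module (here a small MLP)...
expert = torch.nn.Sequential(
    torch.nn.Linear(hidden, 4 * hidden),
    torch.nn.GELU(),
    torch.nn.Linear(4 * hidden, hidden),
)

# ...wrapped in the DeepSpeed MoE layer, which replicates it num_experts
# times and adds the gating network. forward() returns a tuple of
# (output, auxiliary_loss, expert_counts).
moe_layer = MoE(hidden_size=hidden, expert=expert, num_experts=2, k=1)
```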
Thanks -- I'm trying to follow the instructions in https://www.deepspeed.ai/tutorials/moe-inference-tutorial/ and I'm getting an error like this:
```
root@6a8cf98fd467:/mnt/Megatron-DeepSpeed/examples# ./generate_text.sh
deepspeed --num_nodes 1 --num_gpus 1 /mnt/Megatron-DeepSpeed/tools/generate_samples_gpt.py --tensor-model-parallel-size 1 --num-layers 24 --hidden-size 2048 --num-attention-heads 16 --max-position-embeddings 1024 --tokenizer-type GPT2BPETokenizer --fp16 --num-experts 2 --mlp-type standard --micro-batch-size 8 --seq-length 10 --out-seq-length 10 --temperature 1.0 --vocab-file /mnt/Megatron-DeepSpeed/gpt2-vocab.json --merge-file /mnt/Megatron-DeepSpeed/gpt2-merges.txt --genfile unconditional_samples.json --top_p 0.9 --log-interval 1 --num-samples 800 --ds-inference
[2022-03-04 19:28:41,420] [WARNING] [runner.py:155:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2022-03-04 19:28:41,448] [INFO] [runner.py:438:main] cmd = /opt/conda/bin/python3.8 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 /mnt/Megatron-DeepSpeed/tools/generate_samples_gpt.py --tensor-model-parallel-size 1 --num-layers 24 --hidden-size 2048 --num-attention-heads 16 --max-position-embeddings 1024 --tokenizer-type GPT2BPETokenizer --fp16 --num-experts 2 --mlp-type standard --micro-batch-size 8 --seq-length 10 --out-seq-length 10 --temperature 1.0 --vocab-file /mnt/Megatron-DeepSpeed/gpt2-vocab.json --merge-file /mnt/Megatron-DeepSpeed/gpt2-merges.txt --genfile unconditional_samples.json --top_p 0.9 --log-interval 1 --num-samples 800 --ds-inference
[2022-03-04 19:28:42,402] [INFO] [launch.py:96:main] 0 NCCL_VERSION=2.11.4+cuda11.4
[2022-03-04 19:28:42,402] [INFO] [launch.py:103:main] WORLD INFO DICT: {'localhost': [0]}
[2022-03-04 19:28:42,402] [INFO] [launch.py:109:main] nnodes=1, num_local_procs=1, node_rank=0
[2022-03-04 19:28:42,402] [INFO] [launch.py:122:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2022-03-04 19:28:42,402] [INFO] [launch.py:123:main] dist_world_size=1
[2022-03-04 19:28:42,402] [INFO] [launch.py:125:main] Setting CUDA_VISIBLE_DEVICES=0
Traceback (most recent call last):
  File "/mnt/Megatron-DeepSpeed/tools/generate_samples_gpt.py", line 29, in <module>
    from megatron.checkpointing import load_checkpoint
  File "/mnt/Megatron-DeepSpeed/megatron/checkpointing.py", line 25, in <module>
    from megatron import (get_args,
  File "/mnt/Megatron-DeepSpeed/megatron/utils.py", line 24, in <module>
    import amp_C
ImportError: /opt/conda/lib/python3.8/site-packages/amp_C.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
[2022-03-04 19:28:44,415] [INFO] [launch.py:178:sigkill_handler] Killing subprocess 4640
[2022-03-04 19:28:44,416] [ERROR] [launch.py:184:sigkill_handler] ['/opt/conda/bin/python3.8', '-u', '/mnt/Megatron-DeepSpeed/tools/generate_samples_gpt.py', '--local_rank=0', '--tensor-model-parallel-size', '1', '--num-layers', '24', '--hidden-size', '2048', '--num-attention-heads', '16', '--max-position-embeddings', '1024', '--tokenizer-type', 'GPT2BPETokenizer', '--fp16', '--num-experts', '2', '--mlp-type', 'standard', '--micro-batch-size', '8', '--seq-length', '10', '--out-seq-length', '10', '--temperature', '1.0', '--vocab-file', '/mnt/Megatron-DeepSpeed/gpt2-vocab.json', '--merge-file', '/mnt/Megatron-DeepSpeed/gpt2-merges.txt', '--genfile', 'unconditional_samples.json', '--top_p', '0.9', '--log-interval', '1', '--num-samples', '800', '--ds-inference'] exits with return code = 1
```
A quick search online says that this can be caused by incompatible versions of apex and torch. I'm using PyTorch 1.10.2 with apex 0.1 (and the latest DeepSpeed compiled from source). Are there recommended versions of PyTorch and apex for this example?
Thanks!
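For anyone hitting the same trace: the mangled symbol contains `__cxx11`, which points at a C++ ABI mismatch between the torch that apex was built against and the torch that is installed. A quick check, as a sketch (run outside the launcher):

```python
import torch

# The apex extension and torch must agree on the C++11 ABI; the mangled
# name in the error contains __cxx11, so the ABI setting is the first
# thing to check.
print(torch.__version__, torch.version.cuda, torch.compiled_with_cxx11_abi())

# Importing the extension directly reproduces the same ImportError if
# apex was built against a different torch build.
import amp_C
```

If `import amp_C` fails this way, rebuilding apex from source against the currently installed torch is the usual fix; matching a prebuilt apex to the exact torch/CUDA build also works.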
Describe the bug
Hi -- I'm trying to build an example that demonstrates the expert parallelism feature as described here. I'm getting an error when initializing the inference engine with the MoE option enabled.
To Reproduce
Here's the code that causes the problem:
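(A hedged sketch of the kind of code this refers to -- a toy model holding one `deepspeed.moe.layer.MoE` layer, passed to `deepspeed.init_inference` with the MoE options; all names, sizes, and the expert count here are assumptions, not the original repro:)

```python
import torch
import deepspeed
from deepspeed.moe.layer import MoE

# Expert-parallel process groups need torch.distributed to be initialized;
# this is normally handled when running under the deepspeed launcher.
deepspeed.init_distributed()

class ToyMoEModel(torch.nn.Module):
    """Placeholder model with a single DeepSpeed MoE layer."""
    def __init__(self, hidden=1024, num_experts=2):
        super().__init__()
        self.moe = MoE(hidden_size=hidden,
                       expert=torch.nn.Linear(hidden, hidden),
                       num_experts=num_experts)

    def forward(self, x):
        out, _, _ = self.moe(x)  # MoE returns (output, aux_loss, counts)
        return out

model = ToyMoEModel().half().cuda()

# Initializing the inference engine with the MoE options enabled is the
# step this issue reports as failing. moe_experts is assumed to match the
# expert count the model was built with.
engine = deepspeed.init_inference(model,
                                  mp_size=1,
                                  dtype=torch.half,
                                  moe=True,
                                  moe_experts=[2])
```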
ds_report output
Screenshots
System info (please complete the following information):
Legend (from nvidia-smi topo -m):

```
X    = Self
SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX  = Connection traversing at most a single PCIe bridge
NV#  = Connection traversing a bonded set of # NVLinks
```