microsoft / LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs
https://aka.ms/GeneralAI
MIT License

ModuleNotFoundError: No module named 'deepspeed' #177

Closed. qxpBlog closed this issue 1 week ago.

qxpBlog commented 6 months ago

When I run the following command, I get the error in the title:

bash /home/iotsc01/xinpengq/LMOps-main/minillm/scripts/llama2/sft/sft_7B.sh /home/iotsc01/LMOps-main/minillm

The script file is as follows:

#! /bin/bash
BASE_MODEL=/home/iotsc01/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-hf/snapshots/8cca527612d856d7d32bd94f8103728d614eb852
PYTHON_ENV_PATH="/home/iotsc01/anaconda3/envs/distil/bin/python"
MASTER_ADDR=localhost
MASTER_PORT=${2-2012}
NNODES=1
NODE_RANK=0
GPUS_PER_NODE=${3-16}

DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE \
                  --nnodes $NNODES \
                  --node_rank $NODE_RANK \
                  --master_addr $MASTER_ADDR \
                  --master_port $MASTER_PORT"

# model
BASE_PATH=${1-"/home/MiniLLM"}
CKPT="${BASE_MODEL}"
CKPT_NAME="llama2-7B"
# data
DATA_DIR="${BASE_PATH}/processed_data/dolly/full/llama2/"
# hp
BATCH_SIZE=1
LR=0.00001
GRAD_ACC=2
EVAL_BATCH_SIZE=2
# length
MAX_LENGTH=512
# runtime
SAVE_PATH="${BASE_PATH}/results/llama2/train/sft"
# seed
SEED=10
SEED_ORDER=10

OPTS=""
# model
OPTS+=" --base-path ${BASE_PATH}"
OPTS+=" --model-path ${CKPT}"
OPTS+=" --ckpt-name ${CKPT_NAME}"
OPTS+=" --n-gpu ${GPUS_PER_NODE}"
OPTS+=" --model-type llama2"
OPTS+=" --gradient-checkpointing"
# data
OPTS+=" --data-dir ${DATA_DIR}"
OPTS+=" --num-workers 0"
OPTS+=" --dev-num 500"
# hp
OPTS+=" --lr ${LR}"
OPTS+=" --batch-size ${BATCH_SIZE}"
OPTS+=" --eval-batch-size ${EVAL_BATCH_SIZE}"
OPTS+=" --gradient-accumulation-steps ${GRAD_ACC}"
OPTS+=" --warmup-iters 0"
OPTS+=" --lr-decay-style cosine"
OPTS+=" --weight-decay 1e-2"
OPTS+=" --clip-grad 1.0"
OPTS+=" --epochs 10"
# length
OPTS+=" --max-length ${MAX_LENGTH}"
OPTS+=" --max-prompt-length 256"
# runtime
OPTS+=" --do-train"
OPTS+=" --do-valid"
OPTS+=" --eval-gen"
OPTS+=" --save-interval -1"
OPTS+=" --eval-interval -1"
OPTS+=" --log-interval 4"
OPTS+=" --mid-log-num 1"
OPTS+=" --save ${SAVE_PATH}"
# seed
OPTS+=" --seed ${SEED}"
OPTS+=" --seed-order ${SEED_ORDER}"
# deepspeed
OPTS+=" --deepspeed"
OPTS+=" --deepspeed_config ${BASE_PATH}/configs/deepspeed/ds_config.json"
# type
OPTS+=" --type lm"
# gen
OPTS+=" --do-sample"
OPTS+=" --top-k 0"
OPTS+=" --top-p 1.0"
OPTS+=" --temperature 1.0"

export NCCL_DEBUG=""
export WANDB_DISABLED=True
export TF_CPP_MIN_LOG_LEVEL=3
export PYTHONPATH=${BASE_PATH}
CMD="torchrun ${DISTRIBUTED_ARGS} ${BASE_PATH}/finetune.py ${OPTS} $@"

echo ${CMD}
echo "PYTHONPATH=${PYTHONPATH}"
mkdir -p ${SAVE_PATH}
${CMD}
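One detail in the script above may matter here: PYTHON_ENV_PATH is defined but never used. As a result, `torchrun` is whichever binary comes first on PATH, and the worker processes it starts run under that installation's Python. If the distil environment is not active, that interpreter may not have deepspeed. The following is a minimal sketch, not a confirmed fix, that assumes the rest of the script is unchanged. It replaces the CMD line so that the launch goes through the environment's own interpreter using `python -m torch.distributed.run`, the module form of `torchrun`:

```shell
#!/usr/bin/env bash
# Sketch: a possible replacement for the CMD line of the script above.
# In the real script PYTHON_ENV_PATH, DISTRIBUTED_ARGS, BASE_PATH and OPTS are
# already set; the fallbacks here only let this fragment run on its own.
PYTHON_ENV_PATH="${PYTHON_ENV_PATH:-/home/iotsc01/anaconda3/envs/distil/bin/python}"
BASE_PATH="${BASE_PATH:-/home/MiniLLM}"

# `python -m torch.distributed.run` is the module form of `torchrun`. Started with
# the environment's interpreter, it launches workers with that same interpreter
# (sys.executable) by default, so they see the packages installed in distil.
CMD="${PYTHON_ENV_PATH} -m torch.distributed.run ${DISTRIBUTED_ARGS} ${BASE_PATH}/finetune.py ${OPTS} $@"
echo "${CMD}"
```

This keeps the launcher and the workers on the interpreter named in the script, whatever the current PATH is.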

My Anaconda environment distil contains the following packages:

# packages in environment at /home/iotsc01/anaconda3/envs/distil:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
_openmp_mutex             5.1                       1_gnu    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
absl-py                   2.1.0                    pypi_0    pypi
accelerate                0.28.0                   pypi_0    pypi
aiohttp                   3.9.3                    pypi_0    pypi
aiosignal                 1.3.1                    pypi_0    pypi
annotated-types           0.6.0                    pypi_0    pypi
async-timeout             4.0.3                    pypi_0    pypi
attrs                     23.2.0                   pypi_0    pypi
blas                      1.0                         mkl    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
bzip2                     1.0.8                h5eee18b_5    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
c-ares                    1.19.1               h5eee18b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ca-certificates           2023.12.12           h06a4308_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
certifi                   2024.2.2                 pypi_0    pypi
cffi                      1.16.0           py38h5eee18b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
charset-normalizer        3.3.2                    pypi_0    pypi
click                     8.1.7                    pypi_0    pypi
cmake                     3.28.3                   pypi_0    pypi
datasets                  2.18.0                   pypi_0    pypi
deepspeed                 0.10.0                   pypi_0    pypi
dill                      0.3.8                    pypi_0    pypi
expat                     2.5.0                h6a678d5_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
filelock                  3.13.1                   pypi_0    pypi
frozenlist                1.4.1                    pypi_0    pypi
fsspec                    2024.2.0                 pypi_0    pypi
hjson                     3.1.0                    pypi_0    pypi
huggingface-hub           0.21.4                   pypi_0    pypi
idna                      3.6                      pypi_0    pypi
importlib-metadata        7.0.2                    pypi_0    pypi
intel-openmp              2023.1.0         hdb19cb5_46306    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
jinja2                    3.1.3                    pypi_0    pypi
joblib                    1.3.2                    pypi_0    pypi
krb5                      1.20.1               h143b758_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ld_impl_linux-64          2.38                 h1181459_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libcurl                   8.5.0                h251f7ec_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libedit                   3.1.20230828         h5eee18b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libev                     4.33                 h7f8727e_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libffi                    3.4.4                h6a678d5_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgcc-ng                 11.2.0               h1234567_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgomp                   11.2.0               h1234567_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libnghttp2                1.57.0               h2d74bed_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libssh2                   1.10.0               hdbd6064_2    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libstdcxx-ng              11.2.0               h1234567_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libuuid                   1.41.5               h5eee18b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libuv                     1.44.2               h5eee18b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
lit                       18.1.1                   pypi_0    pypi
lz4-c                     1.9.4                h6a678d5_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
markdown-it-py            3.0.0                    pypi_0    pypi
markupsafe                2.1.5                    pypi_0    pypi
mdurl                     0.1.2                    pypi_0    pypi
mkl                       2023.1.0         h213fc3f_46344    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl-service               2.4.0            py38h5eee18b_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl_fft                   1.3.8            py38h5eee18b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl_random                1.2.4            py38hdb19cb5_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mpmath                    1.3.0                    pypi_0    pypi
multidict                 6.0.5                    pypi_0    pypi
multiprocess              0.70.16                  pypi_0    pypi
ncurses                   6.4                  h6a678d5_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
networkx                  3.1                      pypi_0    pypi
ninja                     1.11.1.1                 pypi_0    pypi
nltk                      3.8.1                    pypi_0    pypi
numerize                  0.12                     pypi_0    pypi
numpy                     1.24.3           py38hf6e8229_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
numpy-base                1.24.3           py38h060ed82_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
nvidia-cublas-cu11        11.10.3.66               pypi_0    pypi
nvidia-cuda-cupti-cu11    11.7.101                 pypi_0    pypi
nvidia-cuda-nvrtc-cu11    11.7.99                  pypi_0    pypi
nvidia-cuda-runtime-cu11  11.7.99                  pypi_0    pypi
nvidia-cudnn-cu11         8.5.0.96                 pypi_0    pypi
nvidia-cufft-cu11         10.9.0.58                pypi_0    pypi
nvidia-curand-cu11        10.2.10.91               pypi_0    pypi
nvidia-cusolver-cu11      11.4.0.1                 pypi_0    pypi
nvidia-cusparse-cu11      11.7.4.91                pypi_0    pypi
nvidia-nccl-cu11          2.14.3                   pypi_0    pypi
nvidia-nvtx-cu11          11.7.91                  pypi_0    pypi
openssl                   3.0.13               h7f8727e_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
packaging                 24.0                     pypi_0    pypi
pandas                    2.0.3                    pypi_0    pypi
peft                      0.9.0                    pypi_0    pypi
pillow                    10.2.0                   pypi_0    pypi
pip                       23.3.1           py38h06a4308_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
protobuf                  3.20.3                   pypi_0    pypi
psutil                    5.9.8                    pypi_0    pypi
py-cpuinfo                9.0.0                    pypi_0    pypi
pyarrow                   15.0.1                   pypi_0    pypi
pyarrow-hotfix            0.6                      pypi_0    pypi
pycparser                 2.21               pyhd3eb1b0_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pydantic                  1.9.0                    pypi_0    pypi
pydantic-core             2.16.3                   pypi_0    pypi
pygments                  2.17.2                   pypi_0    pypi
pynvml                    11.5.0                   pypi_0    pypi
python                    3.8.18               h955ad1f_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python-dateutil           2.9.0.post0              pypi_0    pypi
pytz                      2024.1                   pypi_0    pypi
pyyaml                    6.0.1            py38h5eee18b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
readline                  8.2                  h5eee18b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
regex                     2023.12.25               pypi_0    pypi
requests                  2.31.0                   pypi_0    pypi
rhash                     1.4.3                hdbd6064_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
rich                      13.7.1                   pypi_0    pypi
rouge-score               0.1.2                    pypi_0    pypi
safetensors               0.4.2                    pypi_0    pypi
sentencepiece             0.2.0                    pypi_0    pypi
setuptools                68.2.2           py38h06a4308_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
six                       1.16.0                   pypi_0    pypi
sqlite                    3.41.2               h5eee18b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
sympy                     1.12                     pypi_0    pypi
tbb                       2021.8.0             hdb19cb5_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tk                        8.6.12               h1ccaba5_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tokenizers                0.15.2                   pypi_0    pypi
torch                     2.0.1                    pypi_0    pypi
torchtyping               0.1.4                    pypi_0    pypi
torchvision               0.15.2                   pypi_0    pypi
tqdm                      4.66.2                   pypi_0    pypi
transformers              4.36.0.dev0              pypi_0    pypi
triton                    2.0.0                    pypi_0    pypi
typeguard                 4.1.5                    pypi_0    pypi
typing                    3.10.0.0         py38h06a4308_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
typing-extensions         4.10.0                   pypi_0    pypi
tzdata                    2024.1                   pypi_0    pypi
urllib3                   2.2.1                    pypi_0    pypi
wheel                     0.41.2           py38h06a4308_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
xxhash                    3.4.1                    pypi_0    pypi
xz                        5.4.6                h5eee18b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
yaml                      0.2.5                h7b6447c_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
yarl                      1.9.4                    pypi_0    pypi
zipp                      3.18.0                   pypi_0    pypi
zlib                      1.2.13               h5eee18b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
zstd                      1.5.5                hc292b87_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main

I have already installed the deepspeed package, so why does it not work?

t1101675 commented 4 months ago

Is the distil environment activated with conda activate distil? Can you import deepspeed in an interactive session after simply running python3?
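The two checks suggested above can be combined into one short script. This is a minimal sketch. It assumes the environment name distil and the interpreter path from the script in this issue, and the helper name check_module is hypothetical. The script prints which python3 and torchrun the shell will actually use, and it can test whether a given interpreter can import deepspeed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) if interpreter $1 can import module $2.
check_module() {
  local py="$1" mod="$2"
  if "$py" -c "import ${mod}" >/dev/null 2>&1; then
    echo "ok: ${py} can import ${mod}"
  else
    echo "missing: ${py} cannot import ${mod}"
    return 1
  fi
}

# Show which interpreter and launcher come first on PATH.
# If they are not under .../envs/distil/bin, the environment is not active.
echo "python3:  $(command -v python3 || echo 'not found')"
echo "torchrun: $(command -v torchrun || echo 'not found')"

# Example usage (after `conda activate distil`):
#   check_module python3 deepspeed
#   check_module /home/iotsc01/anaconda3/envs/distil/bin/python deepspeed
```

Suppose the second call succeeds but the first one fails, or torchrun resolves outside the distil environment. In that case, the launcher in the script is probably picking up a different Python installation.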