Closed qxpBlog closed 2 months ago
When I run the following command, I get the error given in the title:
bash /home/iotsc01/xinpengq/LMOps-main/minillm/scripts/llama2/sft/sft_7B.sh /home/iotsc01/LMOps-main/minillm
The script file is as follows:
#! /bin/bash

BASE_MODEL=/home/iotsc01/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-hf/snapshots/8cca527612d856d7d32bd94f8103728d614eb852
PYTHON_ENV_PATH="/home/iotsc01/anaconda3/envs/distil/bin/python"
MASTER_ADDR=localhost
MASTER_PORT=${2-2012}
NNODES=1
NODE_RANK=0
GPUS_PER_NODE=${3-16}

DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE \
                  --nnodes $NNODES \
                  --node_rank $NODE_RANK \
                  --master_addr $MASTER_ADDR \
                  --master_port $MASTER_PORT"

# model
BASE_PATH=${1-"/home/MiniLLM"}
CKPT="${BASE_MODEL}"
CKPT_NAME="llama2-7B"
# data
DATA_DIR="${BASE_PATH}/processed_data/dolly/full/llama2/"
# hp
BATCH_SIZE=1
LR=0.00001
GRAD_ACC=2
EVAL_BATCH_SIZE=2
# length
MAX_LENGTH=512
# runtime
SAVE_PATH="${BASE_PATH}/results/llama2/train/sft"
# seed
SEED=10
SEED_ORDER=10

OPTS=""
# model
OPTS+=" --base-path ${BASE_PATH}"
OPTS+=" --model-path ${CKPT}"
OPTS+=" --ckpt-name ${CKPT_NAME}"
OPTS+=" --n-gpu ${GPUS_PER_NODE}"
OPTS+=" --model-type llama2"
OPTS+=" --gradient-checkpointing"
# data
OPTS+=" --data-dir ${DATA_DIR}"
OPTS+=" --num-workers 0"
OPTS+=" --dev-num 500"
# hp
OPTS+=" --lr ${LR}"
OPTS+=" --batch-size ${BATCH_SIZE}"
OPTS+=" --eval-batch-size ${EVAL_BATCH_SIZE}"
OPTS+=" --gradient-accumulation-steps ${GRAD_ACC}"
OPTS+=" --warmup-iters 0"
OPTS+=" --lr-decay-style cosine"
OPTS+=" --weight-decay 1e-2"
OPTS+=" --clip-grad 1.0"
OPTS+=" --epochs 10"
# length
OPTS+=" --max-length ${MAX_LENGTH}"
OPTS+=" --max-prompt-length 256"
# runtime
OPTS+=" --do-train"
OPTS+=" --do-valid"
OPTS+=" --eval-gen"
OPTS+=" --save-interval -1"
OPTS+=" --eval-interval -1"
OPTS+=" --log-interval 4"
OPTS+=" --mid-log-num 1"
OPTS+=" --save ${SAVE_PATH}"
# seed
OPTS+=" --seed ${SEED}"
OPTS+=" --seed-order ${SEED_ORDER}"
# deepspeed
OPTS+=" --deepspeed"
OPTS+=" --deepspeed_config ${BASE_PATH}/configs/deepspeed/ds_config.json"
# type
OPTS+=" --type lm"
# gen
OPTS+=" --do-sample"
OPTS+=" --top-k 0"
OPTS+=" --top-p 1.0"
OPTS+=" --temperature 1.0"

export NCCL_DEBUG=""
export WANDB_DISABLED=True
export TF_CPP_MIN_LOG_LEVEL=3
export PYTHONPATH=${BASE_PATH}
CMD="torchrun ${DISTRIBUTED_ARGS} ${BASE_PATH}/finetune.py ${OPTS} $@"

echo ${CMD}
echo "PYTHONPATH=${PYTHONPATH}"
mkdir -p ${SAVE_PATH}
${CMD}
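One thing that may be worth checking in the script above: it defines PYTHON_ENV_PATH but never uses it, and launches with whichever torchrun comes first on PATH. If that torchrun belongs to a different environment than distil, the packages installed in distil would not be visible to the training processes. Whether this is the cause here is not established by the thread. The following is a minimal Python sketch of such a check; the helper names `bin_dir` and `launcher_matches_python` are made up for illustration and are not part of MiniLLM, torchrun, or DeepSpeed.

```python
import os
import shutil
import sys


def bin_dir(path):
    """Return the absolute directory that contains an executable."""
    return os.path.dirname(os.path.abspath(path))


def launcher_matches_python(launcher_path, python_path):
    """True if the launcher and the interpreter sit in the same bin directory.

    Assumption: in a conda environment, torchrun and python are both
    installed under <env>/bin, so a differing directory suggests that the
    launcher comes from another environment.
    """
    if launcher_path is None:  # launcher not found on PATH at all
        return False
    return bin_dir(launcher_path) == bin_dir(python_path)


if __name__ == "__main__":
    torchrun = shutil.which("torchrun")  # first torchrun on PATH, or None
    print("python  :", sys.executable)
    print("torchrun:", torchrun)
    print("same environment:", launcher_matches_python(torchrun, sys.executable))
```

Run it with the interpreter named in PYTHON_ENV_PATH, from the same shell that launches the script. If the two paths differ, activating the environment before calling the script should make the matching torchrun come first on PATH.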
My Anaconda environment distil is as follows:
# packages in environment at /home/iotsc01/anaconda3/envs/distil:
#
# Name  Version  Build  Channel
_libgcc_mutex  0.1  main  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
_openmp_mutex  5.1  1_gnu  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
absl-py  2.1.0  pypi_0  pypi
accelerate  0.28.0  pypi_0  pypi
aiohttp  3.9.3  pypi_0  pypi
aiosignal  1.3.1  pypi_0  pypi
annotated-types  0.6.0  pypi_0  pypi
async-timeout  4.0.3  pypi_0  pypi
attrs  23.2.0  pypi_0  pypi
blas  1.0  mkl  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
bzip2  1.0.8  h5eee18b_5  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
c-ares  1.19.1  h5eee18b_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ca-certificates  2023.12.12  h06a4308_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
certifi  2024.2.2  pypi_0  pypi
cffi  1.16.0  py38h5eee18b_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
charset-normalizer  3.3.2  pypi_0  pypi
click  8.1.7  pypi_0  pypi
cmake  3.28.3  pypi_0  pypi
datasets  2.18.0  pypi_0  pypi
deepspeed  0.10.0  pypi_0  pypi
dill  0.3.8  pypi_0  pypi
expat  2.5.0  h6a678d5_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
filelock  3.13.1  pypi_0  pypi
frozenlist  1.4.1  pypi_0  pypi
fsspec  2024.2.0  pypi_0  pypi
hjson  3.1.0  pypi_0  pypi
huggingface-hub  0.21.4  pypi_0  pypi
idna  3.6  pypi_0  pypi
importlib-metadata  7.0.2  pypi_0  pypi
intel-openmp  2023.1.0  hdb19cb5_46306  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
jinja2  3.1.3  pypi_0  pypi
joblib  1.3.2  pypi_0  pypi
krb5  1.20.1  h143b758_1  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ld_impl_linux-64  2.38  h1181459_1  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libcurl  8.5.0  h251f7ec_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libedit  3.1.20230828  h5eee18b_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libev  4.33  h7f8727e_1  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libffi  3.4.4  h6a678d5_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgcc-ng  11.2.0  h1234567_1  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgomp  11.2.0  h1234567_1  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libnghttp2  1.57.0  h2d74bed_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libssh2  1.10.0  hdbd6064_2  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libstdcxx-ng  11.2.0  h1234567_1  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libuuid  1.41.5  h5eee18b_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libuv  1.44.2  h5eee18b_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
lit  18.1.1  pypi_0  pypi
lz4-c  1.9.4  h6a678d5_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
markdown-it-py  3.0.0  pypi_0  pypi
markupsafe  2.1.5  pypi_0  pypi
mdurl  0.1.2  pypi_0  pypi
mkl  2023.1.0  h213fc3f_46344  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl-service  2.4.0  py38h5eee18b_1  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl_fft  1.3.8  py38h5eee18b_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl_random  1.2.4  py38hdb19cb5_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mpmath  1.3.0  pypi_0  pypi
multidict  6.0.5  pypi_0  pypi
multiprocess  0.70.16  pypi_0  pypi
ncurses  6.4  h6a678d5_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
networkx  3.1  pypi_0  pypi
ninja  1.11.1.1  pypi_0  pypi
nltk  3.8.1  pypi_0  pypi
numerize  0.12  pypi_0  pypi
numpy  1.24.3  py38hf6e8229_1  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
numpy-base  1.24.3  py38h060ed82_1  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
nvidia-cublas-cu11  11.10.3.66  pypi_0  pypi
nvidia-cuda-cupti-cu11  11.7.101  pypi_0  pypi
nvidia-cuda-nvrtc-cu11  11.7.99  pypi_0  pypi
nvidia-cuda-runtime-cu11  11.7.99  pypi_0  pypi
nvidia-cudnn-cu11  8.5.0.96  pypi_0  pypi
nvidia-cufft-cu11  10.9.0.58  pypi_0  pypi
nvidia-curand-cu11  10.2.10.91  pypi_0  pypi
nvidia-cusolver-cu11  11.4.0.1  pypi_0  pypi
nvidia-cusparse-cu11  11.7.4.91  pypi_0  pypi
nvidia-nccl-cu11  2.14.3  pypi_0  pypi
nvidia-nvtx-cu11  11.7.91  pypi_0  pypi
openssl  3.0.13  h7f8727e_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
packaging  24.0  pypi_0  pypi
pandas  2.0.3  pypi_0  pypi
peft  0.9.0  pypi_0  pypi
pillow  10.2.0  pypi_0  pypi
pip  23.3.1  py38h06a4308_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
protobuf  3.20.3  pypi_0  pypi
psutil  5.9.8  pypi_0  pypi
py-cpuinfo  9.0.0  pypi_0  pypi
pyarrow  15.0.1  pypi_0  pypi
pyarrow-hotfix  0.6  pypi_0  pypi
pycparser  2.21  pyhd3eb1b0_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pydantic  1.9.0  pypi_0  pypi
pydantic-core  2.16.3  pypi_0  pypi
pygments  2.17.2  pypi_0  pypi
pynvml  11.5.0  pypi_0  pypi
python  3.8.18  h955ad1f_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python-dateutil  2.9.0.post0  pypi_0  pypi
pytz  2024.1  pypi_0  pypi
pyyaml  6.0.1  py38h5eee18b_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
readline  8.2  h5eee18b_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
regex  2023.12.25  pypi_0  pypi
requests  2.31.0  pypi_0  pypi
rhash  1.4.3  hdbd6064_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
rich  13.7.1  pypi_0  pypi
rouge-score  0.1.2  pypi_0  pypi
safetensors  0.4.2  pypi_0  pypi
sentencepiece  0.2.0  pypi_0  pypi
setuptools  68.2.2  py38h06a4308_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
six  1.16.0  pypi_0  pypi
sqlite  3.41.2  h5eee18b_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
sympy  1.12  pypi_0  pypi
tbb  2021.8.0  hdb19cb5_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tk  8.6.12  h1ccaba5_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tokenizers  0.15.2  pypi_0  pypi
torch  2.0.1  pypi_0  pypi
torchtyping  0.1.4  pypi_0  pypi
torchvision  0.15.2  pypi_0  pypi
tqdm  4.66.2  pypi_0  pypi
transformers  4.36.0.dev0  pypi_0  pypi
triton  2.0.0  pypi_0  pypi
typeguard  4.1.5  pypi_0  pypi
typing  3.10.0.0  py38h06a4308_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
typing-extensions  4.10.0  pypi_0  pypi
tzdata  2024.1  pypi_0  pypi
urllib3  2.2.1  pypi_0  pypi
wheel  0.41.2  py38h06a4308_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
xxhash  3.4.1  pypi_0  pypi
xz  5.4.6  h5eee18b_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
yaml  0.2.5  h7b6447c_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
yarl  1.9.4  pypi_0  pypi
zipp  3.18.0  pypi_0  pypi
zlib  1.2.13  h5eee18b_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
zstd  1.5.5  hc292b87_0  https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
Why does it not work even though I have installed the deepspeed package?
Is the environment distil activated with conda activate distil? Can you import deepspeed in the interactive environment after simply running python3?
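The import check suggested above can also be done non-interactively. This is a minimal sketch using only the standard library; the helper name `module_location` is hypothetical. It prints which interpreter is running and where deepspeed would be imported from, without actually importing it.

```python
import importlib.util
import sys


def module_location(name):
    """Return the file a top-level module would be loaded from, or None.

    None means the running interpreter cannot find the module. The result
    can also be None for namespace packages, which have no single origin.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    location = module_location("deepspeed")
    if location is None:
        print("deepspeed is NOT importable from this interpreter")
    else:
        print("deepspeed found at:", location)
```

If the interpreter path printed is not the one under envs/distil, the environment is probably not active in that shell.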