hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible
https://www.colossalai.org
Apache License 2.0

[DOC]: Environment installation failed #6066

Open eccct opened 1 day ago

eccct commented 1 day ago

📚 The doc issue

I installed the Ubuntu 24.04 subsystem (WSL2) on Windows 11 and followed the guide at https://colossalai.org/zh-Hans/docs/get_started/installation. The exact steps were:
export CUDA_INSTALL_DIR=/usr/local/cuda-12.1
export CUDA_HOME=/usr/local/cuda-12.1
export LD_LIBRARY_PATH=$CUDA_HOME"/lib64:$LD_LIBRARY_PATH"
export PATH=$CUDA_HOME"/bin:$PATH"

conda create -n colo01 python=3.10
conda activate colo01
export PATH=~/miniconda3/envs/colo01/bin:$PATH

sudo apt update
sudo apt install gcc-10 g++-10
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 60
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-12 60
sudo update-alternatives --config gcc
gcc --version

wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run
sudo sh cuda_12.1.0_530.30.02_linux.run
Verify the CUDA installation: nvidia-smi

conda install nvidia/label/cuda-12.1.0::cuda-toolkit
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
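A note on the nvidia-smi verification step above: nvidia-smi queries the driver, and the "CUDA Version" it prints is the highest version that driver supports, not necessarily the toolkit installed under /usr/local/cuda-12.1. A more direct check is to compare the toolkit's `nvcc --version` with `torch.version.cuda`, the CUDA version PyTorch was built with. The sketch below assumes nvcc prints its usual "release X.Y" line; `parse_nvcc_release` and `same_major_minor` are hypothetical helper names. Because the conda env's bin directory comes first on PATH in this setup, `shutil.which("nvcc")` also shows whether the conda cuda-toolkit or the system toolkit wins when both are installed.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(version_text: str) -> Optional[str]:
    """Extract 'major.minor' from `nvcc --version` output ('... release 12.1, V12.1.66')."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def same_major_minor(a: Optional[str], b: Optional[str]) -> bool:
    """True when both version strings agree on major.minor, e.g. '12.1' and '12.1.105'."""
    if not a or not b:
        return False
    return a.split(".")[:2] == b.split(".")[:2]


if __name__ == "__main__":
    nvcc = shutil.which("nvcc")  # first nvcc on PATH; the conda env's bin dir comes first here
    toolkit = None
    if nvcc is not None:
        out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
        toolkit = parse_nvcc_release(out)
    try:
        import torch
        torch_cuda = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        torch_cuda = None
    print(f"nvcc: {nvcc} (release {toolkit}); torch built with CUDA {torch_cuda}")
    print("major.minor match:", same_major_minor(toolkit, torch_cuda))
```

A mismatch between the two is worth fixing before building any extensions, since the JIT kernels are compiled with whichever nvcc is found.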

git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI

pip install -r requirements/requirements.txt
CUDA_EXT=1 pip install .

Install the related development libraries:
pip install transformers
pip install xformers
pip install datasets tensorboard

Run the benchmark:
Step 1: change directory
cd examples/language/llama/scripts/benchmark_7B
Edit gemini.sh, then run
bash gemini.sh

After running it, the following error was reported: [rank0]: ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.

Then I installed FlashAttention-2 successfully:
pip install packaging
pip install ninja
ninja --version
echo $?
conda install -c conda-channel attention2
pip install flash-attn --no-build-isolation
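transformers raises the ImportError above when it cannot find the flash_attn package in the interpreter running the script. A quick way to confirm that the install landed in the same environment torchrun uses (here the colo01 conda env) is `importlib.util.find_spec`. The sketch below only checks that a package is discoverable; `module_available` is a hypothetical helper name.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """True if `name` is discoverable by this interpreter; the module itself is not imported."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Run this with the same python that torchrun uses (here, the colo01 conda env).
    print("interpreter:", sys.executable)
    for pkg in ("flash_attn", "transformers", "xformers"):
        print(f"{pkg:>12}: {'found' if module_available(pkg) else 'MISSING'}")
```

A package that is found can still fail at import time (for example if its compiled extension does not match the installed PyTorch), so `python -c "import flash_attn"` is the follow-up check that surfaces the actual error.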

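Running bash gemini.sh again still fails. Could you please look into it based on the uploaded log file, and ideally improve the installation documentation? Thanks! (attachment: gcc_nvidia-smi_pytorch_python log.txt)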


eccct commented 1 day ago

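When I ran the benchmark again with bash gemini.sh, the system stalled at the compilation stage for a long time. (screenshot: bash gemini sh) I used the command colossalai check -i to check the version compatibility of the current environment and the status of the CUDA extensions. (screenshot: colossalai check again) Please help analyze the cause. Thanks!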

Edenzzzz commented 1 day ago

You should use BUILD_EXT=1 pip install . and see if that compiles.
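On the BUILD_EXT suggestion: as I understand the ColossalAI README, CUDA/C++ kernels are not compiled at install time by default but JIT-compiled on first use, which matches the "[extension] Compiling the JIT ... kernel during runtime now" lines later in this thread; setting BUILD_EXT=1 during `pip install .` builds them ahead of time instead. My assumption is that CUDA_EXT=1, used in the original steps, is an older name for this flag and may not be read by 0.4.x. The sketch below wraps the install in Python so the variables are set explicitly. MAX_JOBS is read by torch.utils.cpp_extension to cap parallel ninja jobs, which can help when a WSL2 VM runs short of memory during compilation. `build_install_command` and `run_install` are hypothetical helpers, and `run_install` defaults to a dry run.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional, Tuple


def build_install_command(max_jobs: Optional[int] = None) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for installing a ColossalAI checkout with extensions prebuilt."""
    env = dict(os.environ)  # copy, so the caller's environment is not modified
    env["BUILD_EXT"] = "1"  # flag suggested in this thread; assumed to supersede CUDA_EXT=1
    if max_jobs is not None:
        env["MAX_JOBS"] = str(max_jobs)  # read by torch.utils.cpp_extension to cap ninja jobs
    # Use this interpreter's pip so the build lands in the active conda env; -v shows compiler output.
    argv = [sys.executable, "-m", "pip", "install", "-v", "."]
    return argv, env


def run_install(repo_dir: str, max_jobs: Optional[int] = None, dry_run: bool = True) -> int:
    """Run (or, by default, only print) the install command inside repo_dir."""
    argv, env = build_install_command(max_jobs)
    if dry_run:
        print(f"[dry run] in {repo_dir}: {' '.join(argv)} (BUILD_EXT=1, MAX_JOBS={max_jobs})")
        return 0
    return subprocess.run(argv, env=env, cwd=repo_dir, check=False).returncode


if __name__ == "__main__":
    run_install(os.path.expanduser("~/ColossalAI"), max_jobs=4)
```

With `dry_run=False` the verbose pip output is what is worth attaching when the build fails.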

eccct commented 10 hours ago

I tried BUILD_EXT=1 pip install . and it failed to build; please check the log files I uploaded. Thanks! (attachments: pip install -r requirementsrequirements.txt, BUILD_EXT=1 pip install.txt)

wangbluo commented 8 hours ago

Please troubleshoot from the following angles, since the log information provided is limited. First, make sure the machine itself is healthy, for example by running nvidia-smi to confirm that the GPUs are available. Then check environment variables such as CUDA_VISIBLE_DEVICES, and make sure LD_LIBRARY_PATH and CUDA_HOME point to the correct CUDA version.
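The environment-variable checks suggested above can be scripted. This is a minimal sketch, not a ColossalAI or PyTorch tool; `check_cuda_env` is a hypothetical helper that only inspects variables, so it complements nvidia-smi rather than replacing it. Fed the values eccct posts later in this thread, it would flag the trailing ':' in LD_LIBRARY_PATH (a side effect of prepending to an empty LD_LIBRARY_PATH in the install steps) and the duplicated conda bin entry in PATH; neither is necessarily the cause of the failure.

```python
import os
from typing import List, Mapping


def check_cuda_env(env: Mapping[str, str]) -> List[str]:
    """Return warnings about CUDA-related environment variables (an empty list means none)."""
    warnings: List[str] = []
    cuda_home = env.get("CUDA_HOME", "")
    path_entries = env.get("PATH", "").split(":")
    ld_value = env.get("LD_LIBRARY_PATH", "")
    ld_entries = ld_value.split(":") if ld_value else []

    if not cuda_home:
        warnings.append("CUDA_HOME is not set")
    else:
        if os.path.join(cuda_home, "bin") not in path_entries:
            warnings.append(f"{cuda_home}/bin is not on PATH")
        if os.path.join(cuda_home, "lib64") not in ld_entries:
            warnings.append(f"{cuda_home}/lib64 is not on LD_LIBRARY_PATH")
    # An empty entry (trailing or doubled ':') is treated by glibc's loader as the current directory.
    if "" in ld_entries:
        warnings.append("LD_LIBRARY_PATH contains an empty entry (trailing or doubled ':')")
    duplicates = sorted({p for p in path_entries if p and path_entries.count(p) > 1})
    if duplicates:
        warnings.append("duplicate PATH entries: " + ", ".join(duplicates))
    # An empty CUDA_VISIBLE_DEVICES hides all GPUs from CUDA applications.
    if env.get("CUDA_VISIBLE_DEVICES") == "":
        warnings.append("CUDA_VISIBLE_DEVICES is set but empty, which hides every GPU")
    return warnings


if __name__ == "__main__":
    for warning in check_cuda_env(os.environ) or ["no obvious problems found"]:
        print("-", warning)
```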

wangbluo commented 8 hours ago

Oh, I see: it seems to still be compiling the JIT kernel ops. That really takes some time, and the compilation had not finished yet.

eccct commented 7 hours ago

Yesterday I ran "CUDA_EXT=1 pip install ." and it built successfully. Then I ran the benchmark with "bash gemini.sh", and it ran for a long time without responding. I updated the ticket, and Edenzzzz suggested using BUILD_EXT=1 pip install . to see whether that compiles. I then ran "pip install -r requirements/requirements.txt", which completed successfully, but when I used "BUILD_EXT=1 pip install ." instead of "CUDA_EXT=1 pip install .", the build failed. Please check the two uploaded files. Thanks!

eccct commented 7 hours ago

(colo01) root@DESKTOP-5H0EB03:~/ColossalAI# echo $CUDA_HOME
/usr/local/cuda-12.1

(colo01) root@DESKTOP-5H0EB03:~/ColossalAI# echo $PATH
/root/miniconda3/envs/colo01/bin:/usr/local/cuda-12.1/bin:/root/miniconda3/envs/colo01/bin:/root/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/wsl/lib:/mnt/c/Windows/system32:/mnt/c/Windows:/mnt/c/Windows/System32/Wbem:/mnt/c/Windows/System32/WindowsPowerShell/v1.0:/mnt/c/Windows/System32/OpenSSH:/mnt/c/Program Files (x86)/NVIDIA Corporation/PhysX/Common:/mnt/c/Program Files/NVIDIA Corporation/NVIDIA NvDLISR:/mnt/c/WINDOWS/system32:/mnt/c/WINDOWS:/mnt/c/WINDOWS/System32/Wbem:/mnt/c/WINDOWS/System32/WindowsPowerShell/v1.0:/mnt/c/WINDOWS/System32/OpenSSH:/mnt/c/Program Files/PuTTY:/mnt/c/Program Files/Git/cmd:/mnt/c/Users/eccct/anaconda3/Scripts:/mnt/c/Program Files/OpenSSH-Win64:/mnt/c/Program Files/nodejs:/mnt/c/Program Files/Microsoft SQL Server/130/Tools/Binn:/mnt/c/Program Files/Microsoft SQL Server/Client SDK/ODBC/170/Tools/Binn:/mnt/c/Program Files/Docker/Docker/resources/bin:/mnt/c/Users/eccct/AppData/Local/Microsoft/WindowsApps:/mnt/d/A:/mnt/c/Users/eccct/Anaconda3:/mnt/c/Users/eccct/Anaconda3/Library/mingw-w64/bin:/mnt/c/Users/eccct/Anaconda3/Library/usr/bin:/mnt/c/Users/eccct/Anaconda3/Library/bin:/mnt/c/Users/eccct/Anaconda3/Scripts:/mnt/c/Users/eccct/AppData/Roaming/Aria2/maria2c.exe:/mnt/d/Microsoft VS Code/bin:/mnt/c/Users/eccct/AppData/Roaming/npm:/mnt/c/Users/eccct/.dotnet/tools:/mnt/c/Users/eccct/AppData/Local/Programs/Azure Dev CLI:/mnt/c/Users/eccct/.cache/lm-studio/bin:/snap/bin

(colo01) root@DESKTOP-5H0EB03:~/ColossalAI# echo $LD_LIBRARY_PATH
/usr/local/cuda-12.1/lib64:

eccct commented 6 hours ago

(colo01) root@DESKTOP-5H0EB03:~/ColossalAI/examples/language/llama/scripts/benchmark_7B# bash gemini.sh
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/pipeline/schedule/_utils.py:19: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _register_pytree_node(OrderedDict, _odict_flatten, _odict_unflatten)
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/utils/_pytree.py:332: UserWarning: <class 'collections.OrderedDict'> is already registered as pytree node. Overwriting the previous registration.
  warnings.warn(
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/shardformer/layer/normalization.py:45: UserWarning: Please install apex from source (https://github.com/NVIDIA/apex) to use the fused RMSNorm kernel
  warnings.warn("Please install apex from source (https://github.com/NVIDIA/apex) to use the fused RMSNorm kernel")
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/legacy/registry/__init__.py:1: DeprecationWarning: TorchScript support for functional optimizers is deprecated and will be removed in a future PyTorch release. Consider using the torch.compile optimizer instead.
  import torch.distributed.optim as dist_optim
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/accelerator/cuda_accelerator.py:282: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  return torch.cuda.amp.autocast(enabled=enabled, dtype=dtype, cache_enabled=cache_enabled)
[09/23/24 11:49:40] INFO colossalai - colossalai - INFO: /root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/initialize.py:75 launch
INFO colossalai - colossalai - INFO: Distributed environment is initialized, world size: 1
WARNING colossalai - colossalai - WARNING: /root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/booster/plugin/gemini_plugin.py:382 __init__
WARNING colossalai - colossalai - WARNING: enable_async_reduce sets pin_memory=True to achieve best performance, which is not implicitly set.
Model params: 1.19 B
[extension] Compiling the JIT cpu_adam_x86 kernel during runtime now
[extension] Time taken to compile cpu_adam_x86 op: 26.49249792098999 seconds
[extension] Compiling the JIT fused_optim_cuda kernel during runtime now
[extension] Time taken to compile fused_optim_cuda op: 57.339394330978394 seconds
rank0: Traceback (most recent call last):
rank0:   File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/chunk/manager.py", line 82, in register_tensor
rank0:   File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/chunk/chunk.py", line 277, in append_tensor
rank0:     raise ChunkFullError

rank0: During handling of the above exception, another exception occurred:

rank0: Traceback (most recent call last):
rank0:   File "/root/ColossalAI/examples/language/llama/benchmark.py", line 364, in <module>
rank0:   File "/root/ColossalAI/examples/language/llama/benchmark.py", line 308, in main
rank0:     model, optimizer, _, dataloader, _ = booster.boost(model, optimizer, dataloader=dataloader)
rank0:   File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/booster/booster.py", line 154, in boost
rank0:     model, optimizer, criterion, dataloader, lr_scheduler = self.plugin.configure(
rank0:   File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/booster/plugin/gemini_plugin.py", line 571, in configure
rank0:     model = GeminiDDP(
rank0:   File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/gemini_ddp.py", line 183, in __init__
rank0:   File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/gemini_ddp.py", line 882, in _init_chunks
rank0:   File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/chunk/manager.py", line 90, in register_tensor
rank0:   File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/chunk/manager.py", line 250, in __close_one_chunk
rank0:   File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/chunk/chunk.py", line 314, in close_chunk
rank0:     self.cpu_shard = torch.empty(self.shard_size, dtype=self.dtype, pin_memory=self.pin_memory)
rank0: RuntimeError: CUDA error: out of memory
rank0: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
rank0: For debugging consider passing CUDA_LAUNCH_BLOCKING=1
rank0: Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Exception ignored in: <function GeminiDDP.__del__ at 0x7f2fc9e3f640>
Traceback (most recent call last):
  File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/gemini_ddp.py", line 222, in __del__
    self.remove_hooks()
  File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/gemini_ddp.py", line 213, in remove_hooks
    for p in self.module.parameters():
  File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1729, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'GeminiDDP' object has no attribute 'module'
rank0:[W923 11:51:06.112981702 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
E0923 11:51:08.022000 140421874321216 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 0 (pid: 2515) of binary: /root/miniconda3/envs/colo01/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/envs/colo01/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.4.0', 'console_scripts', 'torchrun')())
  File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
    return f(*args, **kwargs)
  File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/distributed/run.py", line 901, in main
    run(args)
  File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run
    elastic_launch(
  File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

benchmark.py FAILED

Failures:

------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-09-23_11:51:07
  host      : DESKTOP-5H0EB03.
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 2515)
  error_file:
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Error: failed to run torchrun --nproc_per_node=1 --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 --master_port=29500 benchmark.py -g -x -b 16 -c 1b -l 256 on 127.0.0.1, is localhost: True, exception: Encountered a bad command exit code!
Command: 'cd /root/ColossalAI/examples/language/llama && export SHELL="/bin/bash" GCC_RANLIB="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-gcc-ranlib" WSL2_GUI_APPS_ENABLED="1" CONDA_EXE="/root/miniconda3/bin/conda" WSL_DISTRO_NAME="Ubuntu-24.04" build_alias="x86_64-conda-linux-gnu" CMAKE_ARGS="-DCMAKE_LINKER=/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-ld -DCMAKE_STRIP=/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-strip" GPROF="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-gprof" _CONDA_PYTHON_SYSCONFIGDATA_NAME="_sysconfigdata_x86_64_conda_cos7_linux_gnu" STRINGS="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-strings" CPP="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-cpp" NAME="DESKTOP-5H0EB03" PWD="/root/ColossalAI/examples/language/llama" GSETTINGS_SCHEMA_DIR="/root/miniconda3/envs/colo01/share/glib-2.0/schemas" LOGNAME="root" CONDA_PREFIX="/root/miniconda3/envs/colo01" CXX="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-c++" CXXFLAGS="-fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /root/miniconda3/envs/colo01/include" DEBUG_CXXFLAGS="-fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-all 
-fno-plt -Og -g -Wall -Wextra -fvar-tracking-assignments -ffunction-sections -pipe -isystem /root/miniconda3/envs/colo01/include" MOTD_SHOWN="update-motd" LDFLAGS="-Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/root/miniconda3/envs/colo01/lib -Wl,-rpath-link,/root/miniconda3/envs/colo01/lib -L/root/miniconda3/envs/colo01/lib" HOME="/root" LANG="C.UTF-8" WSL_INTEROP="/run/WSL/318_interop" LS_COLORS="rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=00:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.avif=01;35:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:*~=00;9
0:*#=00;90:*.bak=00;90:*.crdownload=00;90:*.dpkg-dist=00;90:*.dpkg-new=00;90:*.dpkg-old=00;90:*.dpkg-tmp=00;90:*.old=00;90:*.orig=00;90:*.part=00;90:*.rej=00;90:*.rpmnew=00;90:*.rpmorig=00;90:*.rpmsave=00;90:*.swp=00;90:*.tmp=00;90:*.ucf-dist=00;90:*.ucf-new=00;90:*.ucf-old=00;90:" DEBUG_CFLAGS="-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-all -fno-plt -Og -g -Wall -Wextra -fvar-tracking-assignments -ffunction-sections -pipe -isystem /root/miniconda3/envs/colo01/include" WAYLAND_DISPLAY="wayland-0" CXX_FOR_BUILD="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-c++" ELFEDIT="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-elfedit" CONDA_PROMPT_MODIFIER="(colo01) " CMAKE_PREFIX_PATH="/root/miniconda3/envs/colo01:/root/miniconda3/envs/colo01/x86_64-conda-linux-gnu/sysroot/usr" CPPFLAGS="-DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /root/miniconda3/envs/colo01/include" LD="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-ld" READELF="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-readelf" GXX="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-g++" GCC_AR="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-gcc-ar" LESSCLOSE="/usr/bin/lesspipe %s %s" ADDR2LINE="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-addr2line" TERM="xterm-256color" SIZE="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-size" GCC_NM="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-gcc-nm" HOST="x86_64-conda-linux-gnu" LESSOPEN="| /usr/bin/lesspipe %s" CC_FOR_BUILD="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-cc" USER="root" CONDA_SHLVL="2" AR="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-ar" AS="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-as" DEBUG_CPPFLAGS="-D_DEBUG -D_FORTIFY_SOURCE=2 -Og -isystem /root/miniconda3/envs/colo01/include" host_alias="x86_64-conda-linux-gnu" DISPLAY=":0" SHLVL="2" NM="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-nm" 
GCC="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-gcc" CUDA_INSTALL_DIR="/usr/local/cuda-12.1" LD_GOLD="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-ld.gold" CONDA_PYTHON_EXE="/root/miniconda3/bin/python" LD_LIBRARY_PATH="/usr/local/cuda-12.1/lib64:" XDG_RUNTIME_DIR="/run/user/0/" CONDA_DEFAULT_ENV="colo01" OBJCOPY="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-objcopy" OMP_NUM_THREADS="1" STRIP="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-strip" CUDA_HOME="/usr/local/cuda-12.1" XDG_DATA_DIRS="/usr/local/share:/usr/share:/var/lib/snapd/desktop" OBJDUMP="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-objdump" PATH="/root/miniconda3/envs/colo01/bin:/usr/local/cuda-12.1/bin:/root/miniconda3/envs/colo01/bin:/root/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/wsl/lib:/mnt/c/Windows/system32:/mnt/c/Windows:/mnt/c/Windows/System32/Wbem:/mnt/c/Windows/System32/WindowsPowerShell/v1.0:/mnt/c/Windows/System32/OpenSSH:/mnt/c/Program Files (x86)/NVIDIA Corporation/PhysX/Common:/mnt/c/Program Files/NVIDIA Corporation/NVIDIA NvDLISR:/mnt/c/WINDOWS/system32:/mnt/c/WINDOWS:/mnt/c/WINDOWS/System32/Wbem:/mnt/c/WINDOWS/System32/WindowsPowerShell/v1.0:/mnt/c/WINDOWS/System32/OpenSSH:/mnt/c/Program Files/PuTTY:/mnt/c/Program Files/Git/cmd:/mnt/c/Users/eccct/anaconda3/Scripts:/mnt/c/Program Files/OpenSSH-Win64:/mnt/c/Program Files/nodejs:/mnt/c/Program Files/Microsoft SQL Server/130/Tools/Binn:/mnt/c/Program Files/Microsoft SQL Server/Client SDK/ODBC/170/Tools/Binn:/mnt/c/Program Files/Docker/Docker/resources/bin:/mnt/c/Users/eccct/AppData/Local/Microsoft/WindowsApps:/mnt/d/A:/mnt/c/Users/eccct/Anaconda3:/mnt/c/Users/eccct/Anaconda3/Library/mingw-w64/bin:/mnt/c/Users/eccct/Anaconda3/Library/usr/bin:/mnt/c/Users/eccct/Anaconda3/Library/bin:/mnt/c/Users/eccct/Anaconda3/Scripts:/mnt/c/Users/eccct/AppData/Roaming/Aria2/maria2c.exe:/mnt/d/Microsoft VS 
Code/bin:/mnt/c/Users/eccct/AppData/Roaming/npm:/mnt/c/Users/eccct/.dotnet/tools:/mnt/c/Users/eccct/AppData/Local/Programs/Azure Dev CLI:/mnt/c/Users/eccct/.cache/lm-studio/bin:/snap/bin" CC="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-cc" CFLAGS="-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /root/miniconda3/envs/colo01/include" CXXFILT="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-c++filt" DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/0/bus" BUILD="x86_64-conda-linux-gnu" HOSTTYPE="x86_64" CONDA_PREFIX_1="/root/miniconda3" PULSE_SERVER="unix:/mnt/wslg/PulseServer" RANLIB="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-ranlib" CONDA_BUILD_SYSROOT="/root/miniconda3/envs/colo01/x86_64-conda-linux-gnu/sysroot" OLDPWD="/root/ColossalAI/examples/language/llama/scripts/benchmark_7B" _="/root/miniconda3/envs/colo01/bin/colossalai" CUDA_DEVICE_MAX_CONNECTIONS="1" && torchrun --nproc_per_node=1 --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 --master_port=29500 benchmark.py -g -x -b 16 -c 1b -l 256' Exit code: 1
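The failing call in the traceback above is `torch.empty(self.shard_size, dtype=self.dtype, pin_memory=self.pin_memory)` inside `close_chunk`, i.e. an allocation of page-locked host memory for a chunk's CPU shard, so "CUDA error: out of memory" here appears to refer to pinned host memory rather than GPU memory; NVIDIA's CUDA on WSL guide lists limited pinned system memory as a known limitation. A rough size estimate helps put this in context. The sketch below assumes the 16 bytes per parameter that the ZeRO paper gives for mixed-precision Adam (fp16 weights and gradients plus fp32 master weights, momentum and variance) and deliberately ignores activations and chunk padding; the helper names are hypothetical.

```python
# Bytes per parameter for mixed-precision Adam, following the ZeRO paper's accounting
# (activations, temporary buffers and chunk padding are deliberately left out).
BYTES_PER_PARAM = {
    "fp16 parameters": 2,
    "fp16 gradients": 2,
    "fp32 master parameters": 4,
    "fp32 Adam momentum": 4,
    "fp32 Adam variance": 4,
}


def model_state_bytes(num_params: int) -> int:
    """Total bytes for parameters, gradients and Adam states of a model with num_params."""
    return num_params * sum(BYTES_PER_PARAM.values())


def fmt_gib(n_bytes: int) -> str:
    """Format a byte count in GiB with one decimal place."""
    return f"{n_bytes / 2**30:.1f} GiB"


if __name__ == "__main__":
    # Parameter counts printed by benchmark.py ("Model params: ...") in the two runs in this thread.
    for label, n in [("1.19 B", 1_190_000_000), ("615.01 M", 615_010_000)]:
        print(f"{label:>9} params -> {fmt_gib(model_state_bytes(n))} of model states")
```

This gives about 17.7 GiB for the 1.19 B model and 9.2 GiB for the 615.01 M model. As I understand Gemini's chunk placement, these states are spread between GPU and host memory rather than all pinned at once, so the numbers only indicate scale. Note also that the failure happens inside booster.boost, before any forward pass, so shrinking the sequence length (the later run uses -l 64 instead of 256) does not reduce what is allocated at that point.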
flybird11111 commented 3 hours ago

Hi, could you try GCC 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.2)?

eccct commented 54 minutes ago

@flybird11111
(colo01) root@DESKTOP-5H0EB03:~/ColossalAI/examples/language/llama/scripts/benchmark_7B# gcc --version
gcc (Ubuntu 12.3.0-17ubuntu1) 12.3.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Do you mean I should downgrade gcc from 12.3 to 9.4?

flybird11111 commented 52 minutes ago

> Do you mean I should downgrade gcc from 12.3 to 9.4?

Yes.

eccct commented 36 minutes ago

@flybird11111
(colo01) root@DESKTOP-5H0EB03:~/ColossalAI/examples/language/llama/scripts/benchmark_7B# sudo update-alternatives --config gcc
There are 3 choices for the alternative gcc (providing /usr/bin/gcc).

  Selection    Path              Priority   Status
  0            /usr/bin/gcc-12    60        auto mode
  1            /usr/bin/gcc-10    60        manual mode

Press <enter> to keep the current choice[*], or type selection number: 3
update-alternatives: using /usr/bin/gcc-9 to provide /usr/bin/gcc (gcc) in manual mode

@flybird11111
(colo01) root@DESKTOP-5H0EB03:~/ColossalAI/examples/language/llama/scripts/benchmark_7B# gcc --version
gcc (Ubuntu 9.5.0-6ubuntu2) 9.5.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
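On the gcc downgrade: switching /usr/bin/gcc with update-alternatives may not change the compiler used for the JIT kernels in this setup. The environment dumped in the launcher error above shows the conda env exporting CC and CXX as x86_64-conda-linux-gnu-cc and x86_64-conda-linux-gnu-c++, and, assuming ColossalAI's JIT loader goes through torch.utils.cpp_extension as recent PyTorch releases implement it, C++ sources are compiled with $CXX (falling back to c++) and $CC is passed to nvcc as -ccbin. The second run's compile times of about 0.09 s also suggest the kernels were loaded from a cache rather than rebuilt. The sketch below reports which C++ compiler that rule would pick and its major version; the helper names are hypothetical.

```python
import os
import re
import shutil
import subprocess
from typing import Mapping, Optional, Tuple


def parse_gcc_major(version_text: str) -> Optional[int]:
    """Major version from the first line of `gcc --version` output, e.g. '... 9.5.0' -> 9."""
    lines = version_text.splitlines()
    if not lines:
        return None
    match = re.search(r"(\d+)\.\d+\.\d+\s*$", lines[0])
    return int(match.group(1)) if match else None


def effective_cxx(env: Mapping[str, str]) -> str:
    """C++ compiler assumed to be used by torch.utils.cpp_extension: $CXX, else 'c++'."""
    return env.get("CXX") or "c++"


def describe_compiler(env: Mapping[str, str]) -> Tuple[str, Optional[str], Optional[int]]:
    """Return (compiler name, resolved path, major version) for the effective C++ compiler."""
    name = effective_cxx(env)
    path = shutil.which(name)
    major = None
    if path is not None:
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        major = parse_gcc_major(result.stdout)
    return name, path, major


if __name__ == "__main__":
    name, path, major = describe_compiler(os.environ)
    print(f"CXX -> {name} ({path}), major version {major}")
    print("CC (passed to nvcc as -ccbin when set):", os.environ.get("CC"))
```

If this reports the conda compiler rather than gcc 9, testing the suggested gcc version would mean either unsetting CC/CXX for the build or pointing them at /usr/bin/gcc-9 and /usr/bin/g++-9.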


(colo01) root@DESKTOP-5H0EB03:~/ColossalAI/examples/language/llama/scripts/benchmark_7B# bash gemini.sh
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/pipeline/schedule/_utils.py:19: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _register_pytree_node(OrderedDict, _odict_flatten, _odict_unflatten)
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/utils/_pytree.py:332: UserWarning: <class 'collections.OrderedDict'> is already registered as pytree node. Overwriting the previous registration.
  warnings.warn(
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/shardformer/layer/normalization.py:45: UserWarning: Please install apex from source (https://github.com/NVIDIA/apex) to use the fused RMSNorm kernel
  warnings.warn("Please install apex from source (https://github.com/NVIDIA/apex) to use the fused RMSNorm kernel")
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/legacy/registry/__init__.py:1: DeprecationWarning: TorchScript support for functional optimizers is deprecated and will be removed in a future PyTorch release. Consider using the torch.compile optimizer instead.
  import torch.distributed.optim as dist_optim
/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/accelerator/cuda_accelerator.py:282: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  return torch.cuda.amp.autocast(enabled=enabled, dtype=dtype, cache_enabled=cache_enabled)
[09/23/24 17:52:56] INFO colossalai - colossalai - INFO: /root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/initialize.py:75 launch
INFO colossalai - colossalai - INFO: Distributed environment is initialized, world size: 1
WARNING colossalai - colossalai - WARNING: /root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/booster/plugin/gemini_plugin.py:382 __init__
WARNING colossalai - colossalai - WARNING: enable_async_reduce sets pin_memory=True to achieve best performance, which is not implicitly set.
Model params: 615.01 M
[extension] Compiling the JIT cpu_adam_x86 kernel during runtime now
[extension] Time taken to compile cpu_adam_x86 op: 0.08598995208740234 seconds
[extension] Compiling the JIT fused_optim_cuda kernel during runtime now
[extension] Time taken to compile fused_optim_cuda op: 0.09327149391174316 seconds
rank0: Traceback (most recent call last):
rank0:   File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/chunk/manager.py", line 82, in register_tensor
rank0:   File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/chunk/chunk.py", line 277, in append_tensor
rank0:     raise ChunkFullError

rank0: During handling of the above exception, another exception occurred:

rank0: Traceback (most recent call last):
rank0:   File "/root/ColossalAI/examples/language/llama/benchmark.py", line 364, in <module>
rank0:   File "/root/ColossalAI/examples/language/llama/benchmark.py", line 308, in main
rank0:     model, optimizer, _, dataloader, _ = booster.boost(model, optimizer, dataloader=dataloader)
rank0:   File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/booster/booster.py", line 154, in boost
rank0:     model, optimizer, criterion, dataloader, lr_scheduler = self.plugin.configure(
rank0:   File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/booster/plugin/gemini_plugin.py", line 571, in configure
rank0:     model = GeminiDDP(
rank0:   File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/gemini_ddp.py", line 183, in __init__
rank0:   File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/gemini_ddp.py", line 882, in _init_chunks
rank0:   File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/chunk/manager.py", line 90, in register_tensor
rank0:   File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/chunk/manager.py", line 250, in __close_one_chunk
rank0:   File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/chunk/chunk.py", line 314, in close_chunk
rank0:     self.cpu_shard = torch.empty(self.shard_size, dtype=self.dtype, pin_memory=self.pin_memory)
rank0: RuntimeError: CUDA error: out of memory
rank0: Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Exception ignored in: <function GeminiDDP.__del__ at 0x7ff98200b640>
Traceback (most recent call last):
  File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/gemini_ddp.py", line 222, in __del__
    self.remove_hooks()
  File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/zero/gemini/gemini_ddp.py", line 213, in remove_hooks
    for p in self.module.parameters():
  File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1729, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'GeminiDDP' object has no attribute 'module'
rank0:[W923 17:52:58.822163966 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
E0923 17:52:59.349000 140205836552000 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 0 (pid: 1792) of binary: /root/miniconda3/envs/colo01/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/envs/colo01/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.4.0', 'console_scripts', 'torchrun')())
  File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
    return f(*args, **kwargs)
  File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/distributed/run.py", line 901, in main
    run(args)
  File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run
    elastic_launch(
  File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

benchmark.py FAILED

Failures:

------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-09-23_17:52:59
  host      : DESKTOP-5H0EB03.
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1792)
  error_file:
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Error: failed to run torchrun --nproc_per_node=1 --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 --master_port=29500 benchmark.py -g -x -b 16 -c 1b -l 64 on 127.0.0.1, is localhost: True, exception: Encountered a bad command exit code!

Command: 'cd /root/ColossalAI/examples/language/llama && export SHELL="/bin/bash" GCC_RANLIB="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-gcc-ranlib" WSL2_GUI_APPS_ENABLED="1" CONDA_EXE="/root/miniconda3/bin/conda" WSL_DISTRO_NAME="Ubuntu-24.04" build_alias="x86_64-conda-linux-gnu" CMAKE_ARGS="-DCMAKE_LINKER=/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-ld -DCMAKE_STRIP=/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-strip" GPROF="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-gprof" _CONDA_PYTHON_SYSCONFIGDATA_NAME="_sysconfigdata_x86_64_conda_cos7_linux_gnu" STRINGS="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-strings" CPP="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-cpp" NAME="DESKTOP-5H0EB03" PWD="/root/ColossalAI/examples/language/llama" GSETTINGS_SCHEMA_DIR="/root/miniconda3/envs/colo01/share/glib-2.0/schemas" LOGNAME="root" CONDA_PREFIX="/root/miniconda3/envs/colo01" CXX="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-c++" CXXFLAGS="-fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /root/miniconda3/envs/colo01/include" DEBUG_CXXFLAGS="-fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-all 
-fno-plt -Og -g -Wall -Wextra -fvar-tracking-assignments -ffunction-sections -pipe -isystem /root/miniconda3/envs/colo01/include" LDFLAGS="-Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/root/miniconda3/envs/colo01/lib -Wl,-rpath-link,/root/miniconda3/envs/colo01/lib -L/root/miniconda3/envs/colo01/lib" HOME="/root" LANG="C.UTF-8" WSL_INTEROP="/run/WSL/1425_interop" DEBUG_CFLAGS="-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-all -fno-plt -Og -g -Wall -Wextra -fvar-tracking-assignments -ffunction-sections -pipe -isystem /root/miniconda3/envs/colo01/include" WAYLAND_DISPLAY="wayland-0" CXX_FOR_BUILD="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-c++" CUDA_LAUNCH_BLOCKING="1" ELFEDIT="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-elfedit" CONDA_PROMPT_MODIFIER="(colo01) " CMAKE_PREFIX_PATH="/root/miniconda3/envs/colo01:/root/miniconda3/envs/colo01/x86_64-conda-linux-gnu/sysroot/usr" CPPFLAGS="-DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /root/miniconda3/envs/colo01/include" LD="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-ld" READELF="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-readelf" GXX="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-g++" CUDA_VISIBLE_DEVICES="0" GCC_AR="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-gcc-ar" LESSCLOSE="/usr/bin/lesspipe %s %s" ADDR2LINE="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-addr2line" TERM="xterm-256color" SIZE="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-size" GCC_NM="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-gcc-nm" HOST="x86_64-conda-linux-gnu" LESSOPEN="| /usr/bin/lesspipe %s" CC_FOR_BUILD="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-cc" USER="root" CONDA_SHLVL="2" AR="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-ar" AS="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-as" DEBUG_CPPFLAGS="-D_DEBUG -D_FORTIFY_SOURCE=2 -Og -isystem /root/miniconda3/envs/colo01/include" host_alias="x86_64-conda-linux-gnu" DISPLAY=":0" SHLVL="2" 
NM="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-nm" GCC="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-gcc" CUDA_INSTALL_DIR="/usr/local/cuda-12.1" LD_GOLD="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-ld.gold" CONDA_PYTHON_EXE="/root/miniconda3/bin/python" LD_LIBRARY_PATH="/usr/local/cuda-12.1/lib64:" XDG_RUNTIME_DIR="/run/user/0/" CONDA_DEFAULT_ENV="colo01" OBJCOPY="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-objcopy" OMP_NUM_THREADS="1" STRIP="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-strip" CUDA_HOME="/usr/local/cuda-12.1" XDG_DATA_DIRS="/usr/local/share:/usr/share:/var/lib/snapd/desktop" OBJDUMP="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-objdump" PATH="/root/miniconda3/envs/colo01/bin:/usr/local/cuda-12.1/bin:/root/miniconda3/envs/colo01/bin:/root/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/wsl/lib:/mnt/c/Windows/system32:/mnt/c/Windows:/mnt/c/Windows/System32/Wbem:/mnt/c/Windows/System32/WindowsPowerShell/v1.0:/mnt/c/Windows/System32/OpenSSH:/mnt/c/Program Files (x86)/NVIDIA Corporation/PhysX/Common:/mnt/c/Program Files/NVIDIA Corporation/NVIDIA NvDLISR:/mnt/c/WINDOWS/system32:/mnt/c/WINDOWS:/mnt/c/WINDOWS/System32/Wbem:/mnt/c/WINDOWS/System32/WindowsPowerShell/v1.0:/mnt/c/WINDOWS/System32/OpenSSH:/mnt/c/Program Files/PuTTY:/mnt/c/Program Files/Git/cmd:/mnt/c/Users/eccct/anaconda3/Scripts:/mnt/c/Program Files/OpenSSH-Win64:/mnt/c/Program Files/nodejs:/mnt/c/Program Files/Microsoft SQL Server/130/Tools/Binn:/mnt/c/Program Files/Microsoft SQL Server/Client SDK/ODBC/170/Tools/Binn:/mnt/c/Program 
Files/Docker/Docker/resources/bin:/mnt/c/Users/eccct/AppData/Local/Microsoft/WindowsApps:/mnt/d/A:/mnt/c/Users/eccct/Anaconda3:/mnt/c/Users/eccct/Anaconda3/Library/mingw-w64/bin:/mnt/c/Users/eccct/Anaconda3/Library/usr/bin:/mnt/c/Users/eccct/Anaconda3/Library/bin:/mnt/c/Users/eccct/Anaconda3/Scripts:/mnt/c/Users/eccct/AppData/Roaming/Aria2/maria2c.exe:/mnt/d/Microsoft VS Code/bin:/mnt/c/Users/eccct/AppData/Roaming/npm:/mnt/c/Users/eccct/.dotnet/tools:/mnt/c/Users/eccct/AppData/Local/Programs/Azure Dev CLI:/mnt/c/Users/eccct/.cache/lm-studio/bin:/snap/bin" CC="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-cc" CFLAGS="-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /root/miniconda3/envs/colo01/include" CXXFILT="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-c++filt" DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/0/bus" BUILD="x86_64-conda-linux-gnu" HOSTTYPE="x86_64" CONDA_PREFIX_1="/root/miniconda3" PULSE_SERVER="unix:/mnt/wslg/PulseServer" RANLIB="/root/miniconda3/envs/colo01/bin/x86_64-conda-linux-gnu-ranlib" CONDA_BUILD_SYSROOT="/root/miniconda3/envs/colo01/x86_64-conda-linux-gnu/sysroot" OLDPWD="/root/ColossalAI/examples/language/llama/scripts/benchmark_7B" _="/root/miniconda3/envs/colo01/bin/colossalai" CUDA_DEVICE_MAX_CONNECTIONS="1" && torchrun --nproc_per_node=1 --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 --master_port=29500 benchmark.py -g -x -b 16 -c 1b -l 64' Exit code: 1
flybird11111 commented 28 minutes ago

[rank0]: RuntimeError: CUDA error: out of memory

It seems that the issue is due to insufficient memory.
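For a rough sense of scale, a back-of-the-envelope estimate of the model-state memory may help. The sketch below assumes that `-c 1b` selects a model of roughly 1 billion parameters and uses the common mixed-precision Adam accounting (2 bytes of fp16/bf16 weights + 2 bytes of gradients + 4 bytes each for fp32 master weights and the two Adam moments, i.e. 16 bytes per parameter); activations, the CUDA context and allocator fragmentation come on top. Note also that the failing call in `chunk.py` allocates `self.cpu_shard` with `pin_memory=self.pin_memory`, which, when pinning is enabled, requests page-locked host memory, so under WSL2 the shortage may involve host RAM as well as GPU memory. The helper name `model_state_gib` is hypothetical and not part of ColossalAI.

```python
# Hypothetical helper (not a ColossalAI API): rough model-state memory
# estimate for mixed-precision Adam training. Activations, the CUDA
# context and allocator fragmentation are NOT included.

BYTES_PER_PARAM = {
    "half_weights": 2,  # fp16/bf16 copy of the weights
    "half_grads": 2,    # fp16/bf16 gradients
    "fp32_master": 4,   # fp32 master weights kept by the optimizer
    "adam_m": 4,        # Adam first moment (fp32)
    "adam_v": 4,        # Adam second moment (fp32)
}


def model_state_gib(num_params: int) -> dict:
    """Return per-component and total model-state memory in GiB."""
    gib = 1024 ** 3
    breakdown = {
        name: num_params * nbytes / gib
        for name, nbytes in BYTES_PER_PARAM.items()
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


if __name__ == "__main__":
    # Assumption: "-c 1b" means roughly 1e9 parameters.
    estimate = model_state_gib(1_000_000_000)
    for name, value in estimate.items():
        print(f"{name:>12}: {value:6.2f} GiB")
```

Under these assumptions the total comes to about 14.9 GiB (16 GB) of model states alone. Gemini spreads these chunks between GPU memory and CPU memory, so both the GPU and the memory that WSL2 is allowed to use matter here; lowering the batch size (`-b 16` in the command above) would reduce activation memory but not this model-state figure.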