Open affromero opened 4 months ago
Same here: on a single 3090 GPU, I hit the OOM error with bf16, and AttributeError: 'LlamaAttention' object has no attribute 'rope_theta' with fp16.
Edit: After monitoring the RAM usage, I found that this model may need more than 25GB of GPU memory. Note that I did not load any images or run a forward pass. Therefore, we may need a GPU with more than 40GB of memory.
I ran this model (with bf16) successfully on an L20 GPU (48GB), but it is really at risk of running out of memory (OOM).
For fp16, it used about 26GB, which is why we got OOM with a 3090 or any GPU with 24GB.
For RAM: the model will use about 26GB, so I think >= 32GB RAM should be a good choice.
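The ~26GB figure is consistent with simple back-of-envelope arithmetic: weight storage alone is parameter count times bytes per parameter. A minimal sketch, assuming a ~13B-parameter model (an assumption; the thread never states the exact count) and ignoring activations and the KV cache, which only add to this lower bound:

```python
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Rough lower bound: parameter storage only, no activations or KV cache."""
    return n_params * bytes_per_param / 1e9  # decimal GB

# bf16 and fp16 both use 2 bytes per parameter, so their weight
# footprints are the same; fp32 doubles it.
print(weight_memory_gb(13e9, 2))  # 26.0 GB -> OOM on a 24GB 3090
print(weight_memory_gb(13e9, 4))  # 52.0 GB in fp32
```

This also explains why switching bf16 to fp16 does not by itself reduce memory: both are 2-byte formats.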
Hi, would you mind sharing how you fixed the 'rope_theta' issue? I'm using a 32GB V100, which isn't enough for bf16, so I tried fp16 (though I don't know why fp16 would reduce memory cost significantly, according to your post), but hit the error above. It looks related to deepspeed, which I'm using: deepspeed==0.14.4, transformers==4.31.0.
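A likely cause of the AttributeError (an assumption, not confirmed in this thread) is a version mismatch: newer DeepSpeed releases expect LlamaAttention to expose a rope_theta attribute, which transformers 4.31.0 does not define. Besides pinning an older deepspeed, one hypothetical workaround is to backfill the attribute before DeepSpeed's injection reads it. The sketch below uses a stub class so it is self-contained (the real module lives in transformers and needs torch); 10000.0 is the original Llama RoPE base frequency:

```python
class StubAttention:
    """Stand-in for transformers' LlamaAttention, for illustration only."""
    pass

def backfill_rope_theta(modules, default=10000.0):
    """Set rope_theta on any attention module that predates the field.

    Hypothetical workaround: in real code you would iterate over
    model.modules() and filter for the attention class.
    """
    for m in modules:
        if not hasattr(m, "rope_theta"):
            m.rope_theta = default
    return modules

layers = backfill_rope_theta([StubAttention(), StubAttention()])
print(layers[0].rope_theta)  # 10000.0
```

The reply below takes the simpler route of pinning an older deepspeed, which avoids the mismatch entirely.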
Hi, the deepspeed version I am using is 0.6.5. Below is my conda env list output for your convenience:
# packages in environment at /root/miniconda3/envs/chatpose:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
_openmp_mutex 5.1 1_gnu https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
accelerate 0.31.0 pypi_0 pypi
aiofiles 23.2.1 pypi_0 pypi
aiohttp 3.9.5 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
altair 5.3.0 pypi_0 pypi
annotated-types 0.7.0 pypi_0 pypi
anyio 4.4.0 pypi_0 pypi
async-timeout 4.0.3 pypi_0 pypi
attrs 23.2.0 pypi_0 pypi
bitsandbytes 0.41.1 pypi_0 pypi
ca-certificates 2024.3.11 h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
certifi 2024.6.2 pypi_0 pypi
charset-normalizer 3.3.2 pypi_0 pypi
click 8.1.7 pypi_0 pypi
contourpy 1.2.1 pypi_0 pypi
cycler 0.12.1 pypi_0 pypi
deepspeed 0.6.5 pypi_0 pypi
einops 0.4.1 pypi_0 pypi
exceptiongroup 1.2.1 pypi_0 pypi
fastapi 0.100.1 pypi_0 pypi
ffmpy 0.3.2 pypi_0 pypi
filelock 3.15.4 pypi_0 pypi
flash-attn 2.5.9.post1 pypi_0 pypi
fonttools 4.53.0 pypi_0 pypi
frozenlist 1.4.1 pypi_0 pypi
fsspec 2024.6.0 pypi_0 pypi
gradio 3.39.0 pypi_0 pypi
gradio-client 1.0.1 pypi_0 pypi
grpcio 1.64.1 pypi_0 pypi
h11 0.14.0 pypi_0 pypi
hjson 3.1.0 pypi_0 pypi
httpcore 1.0.5 pypi_0 pypi
httpx 0.27.0 pypi_0 pypi
huggingface-hub 0.23.4 pypi_0 pypi
idna 3.7 pypi_0 pypi
imageio 2.34.2 pypi_0 pypi
importlib-resources 6.4.0 pypi_0 pypi
jinja2 3.1.4 pypi_0 pypi
jsonschema 4.22.0 pypi_0 pypi
jsonschema-specifications 2023.12.1 pypi_0 pypi
kiwisolver 1.4.5 pypi_0 pypi
lazy-loader 0.4 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libffi 3.4.4 h6a678d5_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgcc-ng 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgomp 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libstdcxx-ng 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
linkify-it-py 2.0.3 pypi_0 pypi
markdown-it-py 2.2.0 pypi_0 pypi
markdown2 2.4.10 pypi_0 pypi
markupsafe 2.1.5 pypi_0 pypi
matplotlib 3.9.0 pypi_0 pypi
mdit-py-plugins 0.3.3 pypi_0 pypi
mdurl 0.1.2 pypi_0 pypi
msgpack 1.0.8 pypi_0 pypi
multidict 6.0.5 pypi_0 pypi
ncurses 6.4 h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
networkx 3.2.1 pypi_0 pypi
ninja 1.11.1.1 pypi_0 pypi
numpy 1.24.2 pypi_0 pypi
openai 0.27.8 pypi_0 pypi
opencv-python 4.8.0.74 pypi_0 pypi
openssl 3.0.14 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
orjson 3.10.5 pypi_0 pypi
packaging 24.1 pypi_0 pypi
pandas 2.2.2 pypi_0 pypi
peft 0.4.0 pypi_0 pypi
pillow 9.4.0 pypi_0 pypi
pip 24.0 py39h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
protobuf 5.27.1 pypi_0 pypi
psutil 6.0.0 pypi_0 pypi
py-cpuinfo 9.0.0 pypi_0 pypi
pydantic 2.7.4 pypi_0 pypi
pydantic-core 2.18.4 pypi_0 pypi
pydub 0.25.1 pypi_0 pypi
pyparsing 3.1.2 pypi_0 pypi
python 3.9.19 h955ad1f_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python-dateutil 2.9.0.post0 pypi_0 pypi
python-multipart 0.0.9 pypi_0 pypi
pytz 2024.1 pypi_0 pypi
pyyaml 6.0.1 pypi_0 pypi
ray 2.6.1 pypi_0 pypi
readline 8.2 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
referencing 0.35.1 pypi_0 pypi
regex 2024.5.15 pypi_0 pypi
requests 2.31.0 pypi_0 pypi
rpds-py 0.18.1 pypi_0 pypi
safetensors 0.4.3 pypi_0 pypi
scikit-image 0.24.0 pypi_0 pypi
scipy 1.11.2 pypi_0 pypi
semantic-version 2.10.0 pypi_0 pypi
sentencepiece 0.2.0 pypi_0 pypi
setuptools 69.5.1 py39h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
shortuuid 1.0.11 pypi_0 pypi
six 1.16.0 pypi_0 pypi
sniffio 1.3.1 pypi_0 pypi
sqlite 3.45.3 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
starlette 0.27.0 pypi_0 pypi
tifffile 2024.6.18 pypi_0 pypi
tk 8.6.14 h39e8969_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tokenizers 0.13.3 pypi_0 pypi
toolz 0.12.1 pypi_0 pypi
torch 1.13.1+cu117 pypi_0 pypi
torchvision 0.14.1+cu117 pypi_0 pypi
tqdm 4.64.1 pypi_0 pypi
transformers 4.31.0 pypi_0 pypi
typing-extensions 4.12.2 pypi_0 pypi
tzdata 2024.1 pypi_0 pypi
uc-micro-py 1.0.3 pypi_0 pypi
urllib3 2.2.2 pypi_0 pypi
uvicorn 0.23.2 pypi_0 pypi
websockets 11.0.3 pypi_0 pypi
wheel 0.43.0 py39h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
xz 5.4.6 h5eee18b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
yacs 0.1.8 pypi_0 pypi
yarl 1.9.4 pypi_0 pypi
zipp 3.19.2 pypi_0 pypi
zlib 1.2.13 h5eee18b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
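For quick reference, the key pins from that working environment, extracted as a requirements-style excerpt (the torch builds are CUDA 11.7 wheels; this is not an official requirements file for the project):

```
deepspeed==0.6.5
transformers==4.31.0
tokenizers==0.13.3
accelerate==0.31.0
peft==0.4.0
torch==1.13.1+cu117
torchvision==0.14.1+cu117
flash-attn==2.5.9.post1
```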
Hi, may I ask what the final run output looks like?
Regarding the OOM on a single 3090: can I train on two 3090s in parallel? It looks like the source code does not provide parallel training.
I deployed the bf16 model successfully (it showed 26GB of VRAM in use), but after entering the user interaction input I ran out of VRAM. I am using a 40GB GPU. What is the problem?
I had the same problem; again, the GPU wasn't big enough. I switched to an A800 (80GB) and ran it successfully, getting both conversation and pose generation!
I added device_map="auto" at line 178 in main_chat.py and set precision=fp16, but it still OOMs. Did I do something wrong?
Hello, when running main_chat.py with bf16 precision I get an OOM error. I am using a 24GB GPU. Is this expected? I can't find any info about the minimal GPU requirement. With fp16 I get AttributeError: 'LlamaAttention' object has no attribute 'rope_theta'. I think this is related to deepspeed, which was not listed in the requirements, so should I install a specific version? Thanks!