wutong16 / HyperDreamer

(SIGGRAPH Asia 2023) Official code of "HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image"
https://ys-imtech.github.io/HyperDreamer/
Apache License 2.0
206 stars · 7 forks

CUDA out of memory #4

Open jly0810 opened 3 months ago

jly0810 commented 3 months ago

When I was using an RTX 3090 (24 GB of VRAM) for the second stage of training, I ran into an out-of-memory error. I remember you said before that 24 GB of VRAM is sufficient. Is there any way for me to run your work? I am very interested in it!

Lizb6626 commented 3 months ago

To me, it appears that another process is using the GPU. From the terminal snapshot, your GPU has a capacity of 23.68 GiB and our process is using 22.19 GiB, so allocating 20 MiB should succeed, yet it failed. Are you running the code on your local machine? It's possible that a local process is using GPU resources, such as the Windows graphics stack.

jly0810 commented 3 months ago

I am running locally on Ubuntu, and I have checked that no other non-essential applications are occupying GPU memory. I tried lowering --mcubes_resolution and --tet_grid_size to 128, but the same issue still occurs. Strangely, as the error message shows, 59.19 MB is free, so why can't 56 MB be allocated? And why is 365.63 MB reserved but not allocated? I tried setting max_split_size_mb to 32/64/128/256, but it didn't help. Do you have any solution?
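For context, the "reserved but not allocated" memory is held by PyTorch's caching allocator but is too fragmented to serve the 56 MB request as one contiguous block, which is exactly the case max_split_size_mb targets. A minimal sketch of applying it (the setting must be in place before the first CUDA allocation, and 128 is just an example value):

```python
import os

# Must be set before the first CUDA tensor is created, or it is ignored.
# max_split_size_mb caps how large a cached block the allocator may split,
# which can reduce the fragmentation behind "reserved but unallocated" memory.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

x = torch.randn(1024, 1024, device="cuda")  # first allocation picks up the config
# Prints allocated vs. reserved memory and fragmentation statistics,
# useful right after catching an OOM.
print(torch.cuda.memory_summary())
```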

Lizb6626 commented 3 months ago

Try nvidia-smi to check how much GPU memory other programs are occupying.
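The same device-wide numbers are also visible from Python: torch.cuda.mem_get_info reports free/total memory across all processes on the device, so a large gap between the device-wide usage and this process's own reserved memory points at another consumer.

```python
import torch

free, total = torch.cuda.mem_get_info()  # device-wide, includes other processes
mine = torch.cuda.memory_reserved()      # memory cached by this process only
print(f"free {free / 2**30:.2f} GiB of {total / 2**30:.2f} GiB total; "
      f"this process has {mine / 2**30:.2f} GiB reserved")
```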

Whalesong-zrs commented 4 weeks ago

Have you solved this problem? I also hit the OOM error at stage 2.

Whalesong-zrs commented 4 weeks ago

> To me, it appears that another process is using the GPU. From the terminal snapshot, your GPU has a capacity of 23.68 GiB and our process is using 22.19 GiB, so allocating 20 MiB should succeed, yet it failed. Are you running the code on your local machine? It's possible that a local process is using GPU resources, such as the Windows graphics stack.

I also hit the OOM error at stage 2. With a 1024×1024 input image it OOMs at epoch 16 of stage 2, and with a 512×512 image it OOMs at epoch 40.
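An OOM that only appears after 16 or 40 epochs suggests memory use grows during stage 2 rather than being too high from the start. A small sketch for confirming that from the training loop; the workload line is a stand-in, not HyperDreamer's actual epoch body:

```python
import torch

for epoch in range(50):
    torch.cuda.reset_peak_memory_stats()
    # ... run one stage-2 epoch here; a stand-in workload for illustration:
    _ = torch.randn(2048, 2048, device="cuda") @ torch.randn(2048, 2048, device="cuda")
    peak = torch.cuda.max_memory_allocated() / 2**30
    print(f"epoch {epoch}: peak allocated {peak:.2f} GiB")
```

If the per-epoch peak climbs steadily, something (e.g. cached tensors or a growing data structure) is accumulating across epochs.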

Whalesong-zrs commented 3 weeks ago

> When I was using an RTX 3090 (24 GB of VRAM) for the second stage of training, I ran into an out-of-memory error. I remember you said before that 24 GB of VRAM is sufficient. Is there any way for me to run your work? I am very interested in it!

Have you solved this problem?

Lizb6626 commented 3 weeks ago

Hi @Whalesong-zrs

Can you provide information about the environment in which you run the code?

Whalesong-zrs commented 3 weeks ago

> Hi @Whalesong-zrs
>
> Can you provide information about the environment in which you run the code?

I found that torch 1.12.0 doesn't have a matching xformers release, so I am using torch 1.13.x. Additionally, I tried using fp16 during the second stage of training and encountered errors.

```
absl-py==2.1.0 accelerate==0.33.0 addict==2.4.0 aiofiles==23.2.1 aiohappyeyeballs==2.3.5 aiohttp==3.10.2 aiosignal==1.3.1 albumentations==1.3.0 altair==5.3.0 annotated-types==0.7.0 antlr4-python3-runtime==4.9.3 anyio==4.4.0 asttokens==2.4.1 async-timeout==4.0.3 attrs==24.2.0 basicsr==1.4.2 beautifulsoup4==4.12.3 blinker==1.8.2 braceexpand==0.1.7 cachetools==5.4.0 carvekit_colab==4.1.2 certifi==2024.7.4 charset-normalizer==3.3.2 click==8.1.7 clip @ git+https://github.com/openai/CLIP.git@dcba3cb2e2827b402d2701e7e1c7d9fed8a20ef1 coloredlogs==15.0.1 comm==0.2.2 contourpy==1.2.1 cycler==0.12.1 dearpygui==1.11.1 decorator==5.1.1 diffusers==0.25.0 einops==0.8.0 entrypoints==0.4 exceptiongroup==1.2.2 executing==2.0.1 fastapi==0.112.0 fastcore==1.6.4 ffmpy==0.4.0 filelock==3.15.4 fire==0.6.0 Flask==3.0.3 flatbuffers==24.3.25 fonttools==4.53.1 freqencoder @ file:///home/zrs/HyperDreamer/freqencoder frozenlist==1.4.1 fsspec==2024.6.1 ftfy==6.2.3 future==1.0.0 gdown==5.2.0 gitdb==4.0.11 GitPython==3.1.43 gradio==4.41.0 gradio_client==1.3.0 gridencoder @ file:///home/zrs/HyperDreamer/gridencoder grpcio==1.65.4 h11==0.14.0 httpcore==1.0.5 httpx==0.27.0 huggingface-hub==0.24.5 humanfriendly==10.0 idna==3.7 imageio==2.34.2 imageio-ffmpeg==0.5.1 importlib_metadata==8.2.0 importlib_resources==6.4.0 invisible-watermark==0.1.5 ipdb==0.13.13 ipycanvas==0.13.2 ipyevents==2.0.2 ipython==8.26.0 ipywidgets==8.1.3 itsdangerous==2.2.0 jedi==0.19.1 Jinja2==3.1.4 joblib==1.4.2 jsonschema==4.23.0 jsonschema-specifications==2023.12.1 jupyter_client==7.4.9 jupyter_core==5.7.2 jupyterlab_widgets==3.0.11 kaolin==0.14.0 kiwisolver==1.4.5 kornia==0.6.0 lazy_loader==0.4 lightning-utilities==0.11.6 llvmlite==0.43.0 lmdb==1.5.1 loguru==0.7.2 lovely-numpy==0.2.13 lovely-tensors==0.1.16 Markdown==3.6 markdown-it-py==3.0.0 MarkupSafe==2.1.5 matplotlib==3.9.1.post1 matplotlib-inline==0.1.7 mdurl==0.1.2 mpmath==1.3.0 multidict==6.0.5 mypy-extensions==1.0.0 nest-asyncio==1.6.0 networkx==3.3 ninja==1.11.1.1 numba==0.60.0 numpy==1.26.4 nvdiffrast @ git+https://github.com/NVlabs/nvdiffrast/@c5caf7bdb8a2448acc491a9faa47753972edd380 nvidia-cublas-cu11==11.10.3.66 nvidia-cublas-cu12==12.1.3.1 nvidia-cuda-cupti-cu12==12.1.105 nvidia-cuda-nvrtc-cu11==11.7.99 nvidia-cuda-nvrtc-cu12==12.1.105 nvidia-cuda-runtime-cu11==11.7.99 nvidia-cuda-runtime-cu12==12.1.105 nvidia-cudnn-cu11==8.5.0.96 nvidia-cudnn-cu12==9.1.0.70 nvidia-cufft-cu12==11.0.2.54 nvidia-curand-cu12==10.3.2.106 nvidia-cusolver-cu12==11.4.5.107 nvidia-cusparse-cu12==12.1.0.106 nvidia-nccl-cu12==2.20.5 nvidia-nvjitlink-cu12==12.6.20 nvidia-nvtx-cu12==12.1.105 omegaconf==2.3.0 onnx==1.16.2 onnxruntime==1.18.1 onnxruntime-gpu==1.18.1 open_clip_torch==2.26.1 opencv-contrib-python==4.10.0.84 opencv-python==4.10.0.84 opencv-python-headless==4.10.0.84 OpenEXR==3.2.4 orjson==3.10.7 packaging==24.1 pandas==2.2.2 parso==0.8.4 pexpect==4.9.0 pillow==10.4.0 platformdirs==4.2.2 plotly==5.23.0 pooch==1.8.2 prettytable==3.6.0 prompt_toolkit==3.0.47 protobuf==4.25.4 psutil==6.0.0 ptyprocess==0.7.0 pure_eval==0.2.3 py-cpuinfo==9.0.0 pyarrow==17.0.0 pybind11==2.13.1 pycocotools==2.0.8 pydantic==2.8.2 pydantic_core==2.20.1 pydeck==0.9.1 pydub==0.25.1 Pygments==2.18.0 PyMatting==1.1.12 PyMCubes==0.1.6 pymeshlab==2023.12.post1 pyparsing==3.1.2 pyrallis==0.3.1 pyre-extensions==0.0.23 PySocks==1.7.1 python-dateutil==2.9.0.post0 python-multipart==0.0.9 pytorch-lightning==2.0.2 pytorch-msssim==1.0.0 pytz==2024.1 PyWavelets==1.6.0 PyYAML==6.0.2 pyzmq==24.0.1 qudida==0.0.4 raymarching @ file:///home/zrs/HyperDreamer/raymarching referencing==0.35.1 regex==2024.7.24 rembg==2.0.58 requests==2.32.3 rich==13.7.1 rpds-py==0.20.0 ruff==0.5.7 safetensors==0.4.4 scikit-image==0.24.0 scikit-learn==1.5.1 scipy==1.14.0 seaborn==0.13.2 segment_anything @ git+https://github.com/facebookresearch/segment-anything.git@6fdee8f2727f4506cfbbe553e23b895e27956588 semantic-version==2.10.0 sentencepiece==0.2.0 shellingham==1.5.4 shencoder @ file:///home/zrs/HyperDreamer/shencoder six==1.16.0 smmap==5.0.1 sniffio==1.3.1 soupsieve==2.5 stack-data==0.6.3 starlette==0.37.2 streamlit==1.37.1 streamlit-drawable-canvas==0.8.0 sympy==1.13.1 taming-transformers-rom1504==0.0.6 tb-nightly==2.18.0a20240809 tenacity==8.5.0 tensorboard==2.17.0 tensorboard-data-server==0.7.2 tensorboardX==2.6.2.2 termcolor==2.4.0 test_tube==0.7.5 threadpoolctl==3.5.0 tifffile==2024.7.24 timm==1.0.8 tokenizers==0.19.1 toml==0.10.2 tomli==2.0.1 tomlkit==0.12.0 toolz==0.12.1 torch==1.13.1 torch-ema==0.3 torchmetrics==1.4.1 torchvision==0.14.1 tornado==6.4.1 tqdm==4.66.5 traitlets==5.14.3 transformers==4.44.0 trimesh==4.4.4 triton==3.0.0 typer==0.12.3 typing-inspect==0.9.0 typing_extensions==4.12.2 tzdata==2024.1 ultralytics==8.2.75 ultralytics-thop==2.0.0 urllib3==2.2.2 usd-core==23.5 uvicorn==0.30.5 vtk==9.3.1 watchdog==4.0.1 wcwidth==0.2.13 webdataset==0.2.86 websockets==12.0 Werkzeug==3.0.3 widgetsnbextension==4.0.11 xatlas==0.0.9 xformers==0.0.16 yapf==0.32.0 yarl==1.9.4 zipp==3.19.2
```
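On the fp16 errors mentioned above: the standard mixed-precision pattern in torch 1.13 is autocast plus GradScaler, sketched here on a toy model. Note that the repo's custom CUDA extensions (raymarching, gridencoder, etc.) may still expect fp32 inputs, which is a common source of fp16 failures:

```python
import torch

# Toy stand-ins for the stage-2 model and optimizer.
model = torch.nn.Linear(64, 64).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for step in range(10):
    x = torch.randn(8, 64, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # runs eligible ops in fp16
        loss = model(x).square().mean()
    scaler.scale(loss).backward()     # loss scaling avoids fp16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
```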

Lizb6626 commented 3 weeks ago

Hi @Whalesong-zrs

We carried out our experiments on an A100 GPU, where the method requires less than 24 GB of memory; results may differ on an RTX 3090. We are currently testing our method on an RTX 4090 (which also has 24 GB of memory) and will inform you as soon as we have results.

Whalesong-zrs commented 3 weeks ago

> Hi @Whalesong-zrs
>
> We carried out our experiments on an A100 GPU, where the method requires less than 24 GB of memory; results may differ on an RTX 3090. We are currently testing our method on an RTX 4090 (which also has 24 GB of memory) and will inform you as soon as we have results.

Thanks a lot!

Whalesong-zrs commented 2 weeks ago

> Hi @Whalesong-zrs
>
> We carried out our experiments on an A100 GPU, where the method requires less than 24 GB of memory; results may differ on an RTX 3090. We are currently testing our method on an RTX 4090 (which also has 24 GB of memory) and will inform you as soon as we have results.

I'd like to know whether there has been any follow-up on this question.

Lizb6626 commented 1 week ago

Hi @Whalesong-zrs

One simple solution is to disable the SR (super-resolution) module, since SR model inference takes 3-4 GB of memory. Another is to decrease the resolution used in the SR module to 512 (the original uses 768).
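For the first option, a sketch of releasing the SR network's memory once it is no longer needed; sr_model here is a hypothetical stand-in, not the actual handle in the HyperDreamer code:

```python
import torch

sr_model = torch.nn.Conv2d(3, 3, 3).cuda()  # stand-in for the real SR network
# ... skip the SR pass, or run it and then release the model ...
del sr_model
torch.cuda.empty_cache()  # return the freed blocks to the CUDA driver
print(f"reserved after release: {torch.cuda.memory_reserved() / 2**30:.2f} GiB")
```

For the second option, SR activation memory scales with pixel count, so dropping from 768 to 512 cuts it by roughly (512/768)² ≈ 0.44×.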

Whalesong-zrs commented 1 week ago

> Hi @Whalesong-zrs
>
> One simple solution is to disable the SR (super-resolution) module, since SR model inference takes 3-4 GB of memory. Another is to decrease the resolution used in the SR module to 512 (the original uses 768).

Thanks a lot. We had not changed the resolution in the SR module before.