Can't start magic3d-fine stage - just no error and killed.

Pashtetickus commented 1 year ago

Hello. I've set up my environment with Docker like this:

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04

ENV TORCH_CUDA_ARCH_LIST="8.0"
ENV TCNN_CUDA_ARCHITECTURES=80
ENV CUDA_HOME=/usr/local/cuda
ENV PATH=${CUDA_HOME}/bin:/home/${USER_NAME}/.local/bin:${PATH}
ENV LD_LIBRARY_PATH=${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}
ENV LIBRARY_PATH=${CUDA_HOME}/lib64/stubs:${LIBRARY_PATH}

# apt install by root user
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    pkg-config \
    build-essential \
    curl \
    git \
    libegl1-mesa-dev \
    libgl1-mesa-dev \
    libgles2-mesa-dev \
    libglib2.0-0 \
    libsm6 \
    libxext6 \
    libxrender1 \
    libglvnd0 \
    libgl1 \
    libglx0 \
    libegl1 \
    libgles2 \
    libglvnd-dev \
    cmake \
    python-is-python3 \
    python3.8-dev \
    python3-pip \
    wget \
    && rm -rf /var/lib/apt/lists/*

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
# for GLEW
ENV LD_LIBRARY_PATH /usr/lib64:$LD_LIBRARY_PATH
# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility,graphics
# Default pyopengl to EGL for good headless rendering support
ENV PYOPENGL_PLATFORM egl

RUN pip install --upgrade pip setuptools ninja imageio-ffmpeg
RUN pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --index-url https://download.pytorch.org/whl/cu118
RUN pip install git+https://github.com/KAIR-BAIR/nerfacc.git@v0.5.2
RUN pip install git+https://github.com/NVlabs/tiny-cuda-nn.git#subdirectory=bindings/torch
RUN pip install git+https://github.com/NVlabs/nvdiffrast.git

COPY . /workspace/app
WORKDIR /workspace/app

I also added these lines to requirements.txt, based on recent issues and my own needs:

transformers==4.28.1
bitsandbytes==0.38.1
pymeshlab

I also copied the EGL vendor file from the nvdiffrast repo: cp docker/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json

nvcc --version: [output missing]

I run the command like this: python launch.py --config configs/magic3d-refine-sd.yaml --train --gpu 0 system.renderer.context_type=cuda system.prompt_processor.prompt={text} system.prompt_processor.use_perp_neg=true system.geometry_convert_from=outputs/magic3d-coarse-if/magic3d/ckpts/last.ckpt trainer.max_steps=50 tag=magic3d_fine use_timestamp=False seed=42

The magic3d-coarse stage works fine, but the fine stage does nothing: [screenshot: Screenshot_14]

It just hangs. If I delete this lock file, the process is simply killed.
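For anyone hitting the same hang: the screenshot doesn't say which lock file this is, so the following is only a sketch under one assumption, namely that it is the build lock PyTorch's C++ extension loader creates (a file named `lock` inside the extension build directory, by default under `~/.cache/torch_extensions`, or under `TORCH_EXTENSIONS_DIR` if that variable is set). The helper name `find_lock_files` is made up for illustration. It lists such files with their age so a stale one left by an interrupted build can be spotted.

```python
import os
import time
from pathlib import Path
from typing import List, Optional, Tuple


def find_lock_files(root: Optional[str] = None) -> List[Tuple[Path, float]]:
    """Return (path, age_in_seconds) for every file named 'lock' under root.

    Assumption: root defaults to the PyTorch extension build cache.
    """
    if root is None:
        root = os.environ.get(
            "TORCH_EXTENSIONS_DIR",
            str(Path.home() / ".cache" / "torch_extensions"),
        )
    base = Path(root)
    if not base.is_dir():
        # Nothing has been built yet, or the cache lives somewhere else.
        return []
    now = time.time()
    return sorted(
        (path, now - path.stat().st_mtime)
        for path in base.rglob("lock")
        if path.is_file()
    )
```

Running `for path, age in find_lock_files(): print(path, round(age))` inside the container shows whether an old lock is present. A lock that is many minutes old while no compiler process is running is probably stale; if removing it only changes the hang into "Killed", the build or the run itself is likely failing for another reason.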

Do you have any suggestions where the problem might be?

P.S. It worked before, but then I either updated packages from requirements.txt or pulled from the repo, and it stopped working. I tried different Docker images, and none of them worked better than this setup.

bennyguo commented 1 year ago

You probably ran out of memory. Could you please try running with system.geometry.isosurface_resolution=32 and see if it works?
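A bare "Killed" with no Python traceback is what Linux prints when a process receives SIGKILL, which is typically the kernel's out-of-memory killer reacting to host RAM rather than GPU memory. As a rough sketch (the function names are illustrative, not part of threestudio), the following reads `/proc/meminfo` to show how much host memory is available inside the container, and also reports GPU memory if `torch` happens to be importable.

```python
from pathlib import Path
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; sizes are in kB."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def report(meminfo_path: str = "/proc/meminfo") -> None:
    """Print available host RAM and, when possible, free GPU memory."""
    info = parse_meminfo(Path(meminfo_path).read_text())
    print(f"host RAM available: {info['MemAvailable'] / 1024 ** 2:.1f} GiB")
    try:
        import torch  # optional; only needed for the GPU figure
    except ImportError:
        print("torch not importable; skipping GPU memory")
        return
    if torch.cuda.is_available():
        free, total = torch.cuda.mem_get_info()  # both values are in bytes
        print(f"GPU memory free: {free / 1024 ** 3:.1f} of {total / 1024 ** 3:.1f} GiB")
```

Calling `report()` right before launching the fine stage gives a baseline. If available host RAM is small, or the container has a memory limit, the OOM killer is a plausible explanation even on a 40 GB card.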

Pashtetickus commented 1 year ago

It looks like that isn't the problem: my card is an A100 40GB. I also tried it in Colab, and it worked with the same requirements.txt, but I can't figure out what is different in my environment.

Maybe you can suggest a way to check whether the Docker setup is correct, or how to properly reinstall the NVIDIA drivers?
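One cheap first check, sketched here with standard-library calls only, is whether the packages and tools the Dockerfile above installs are actually visible inside the running container. The module names (`torch`, `nerfacc`, `tinycudann`, `nvdiffrast`) are the import names of those packages; the function name `check_environment` and the `module:`/`tool:` key format are made up for this example. It only confirms presence, not that the CUDA builds match the driver.

```python
import importlib.util
import shutil
from typing import Dict, Iterable

# Import names of the packages installed by the Dockerfile above.
REQUIRED_MODULES = ("torch", "nerfacc", "tinycudann", "nvdiffrast")
REQUIRED_TOOLS = ("nvcc", "nvidia-smi")


def check_environment(
    modules: Iterable[str] = REQUIRED_MODULES,
    tools: Iterable[str] = REQUIRED_TOOLS,
) -> Dict[str, bool]:
    """Report which Python modules can be found and which CLI tools are on PATH."""
    status = {}
    for name in modules:
        # find_spec locates a top-level module without importing it.
        status[f"module:{name}"] = importlib.util.find_spec(name) is not None
    for tool in tools:
        status[f"tool:{tool}"] = shutil.which(tool) is not None
    return status
```

Printing the result of `check_environment()` in both the working Colab session and the failing container, then comparing the two outputs, narrows down what differs. Anything reported as `False` in the container is a candidate for reinstalling.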

Pashtetickus commented 1 year ago

The Docker image from the threestudio repo also didn't help. Maybe I should try previous commits?

bennyguo commented 1 year ago

Could you run the Fantasia3D system and see if it works? If not, also try system.renderer.context_type=cuda.

Pashtetickus commented 1 year ago

I reinstalled the NVIDIA driver and CUDA, installed from a new requirements.txt, and now everything works. I think the problem was somewhere in the dependencies. Thanks for the help!

threestudio-project/threestudio, issue #271