Closed mattgara closed 2 months ago
Can you share your docker file with me?
On Sat, Jun 8, 2024 at 04:17 Matt Gara @.***> wrote:
I've been able to run the example (teddy) image up until the outpainting step, but repeatedly come across the following error:
[INFO] Start outpainting.
100%|██████████| 50/50 [00:02<00:00, 16.79it/s]
Pipelines loaded with dtype=torch.float16 cannot run with cpu device. It is not recommended to move them to cpu as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for float16 operations on this device in PyTorch. Please, remove the torch_dtype=torch.float16 argument, or use another device for inference.
Pipelines loaded with dtype=torch.float16 cannot run with cpu device. It is not recommended to move them to cpu as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for float16 operations on this device in PyTorch. Please, remove the torch_dtype=torch.float16 argument, or use another device for inference.
Pipelines loaded with dtype=torch.float16 cannot run with cpu device. It is not recommended to move them to cpu as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for float16 operations on this device in PyTorch. Please, remove the torch_dtype=torch.float16 argument, or use another device for inference.
Pipelines loaded with dtype=torch.float16 cannot run with cpu device. It is not recommended to move them to cpu as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for float16 operations on this device in PyTorch. Please, remove the torch_dtype=torch.float16 argument, or use another device for inference.
[INFO] Number of points at merging:68449
Traceback (most recent call last):
  File "/home/dreamer/threestudio/launch.py", line 301, in <module>
    main(args, extras)
  File "/home/dreamer/threestudio/launch.py", line 244, in main
    trainer.fit(system, datamodule=dm, ckpt_path=cfg.resume)
  File "/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 987, in _run
    results = self._run_stage()
  File "/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1033, in _run_stage
    self.fit_loop.run()
  File "/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 205, in run
    self.advance()
  File "/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 363, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 140, in run
    self.advance(data_fetcher)
  File "/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 252, in advance
    batch_output = self.manual_optimization.run(kwargs)
  File "/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/manual.py", line 94, in run
    self.advance(kwargs)
  File "/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/manual.py", line 114, in advance
    training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
  File "/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 309, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 391, in training_step
    return self.lightning_module.training_step(*args, **kwargs)
  File "/home/dreamer/threestudio/custom/threestudio-3dgs/system/scene_lang.py", line 124, in training_step
    self.outpaint()
  File "/home/dreamer/threestudio/custom/threestudio-3dgs/system/scene_lang.py", line 325, in outpaint
    output = self(sample)
  File "/home/dreamer/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/dreamer/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/dreamer/threestudio/custom/threestudio-3dgs/system/scene_lang.py", line 107, in forward
    outputs = self.renderer.batch_forward(batch)
  File "/home/dreamer/threestudio/custom/threestudio-3dgs/renderer/gaussian_batch_renderer.py", line 38, in batch_forward
    render_pkg = self.forward(
  File "/home/dreamer/threestudio/custom/threestudio-3dgs/renderer/diff_gaussian_rasterizer.py", line 126, in forward
    result_list = rasterizer(
  File "/home/dreamer/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/dreamer/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/dreamer/.local/lib/python3.10/site-packages/diff_gaussian_rasterization/__init__.py", line 222, in forward
    return rasterize_gaussians(
  File "/home/dreamer/.local/lib/python3.10/site-packages/diff_gaussian_rasterization/__init__.py", line 33, in rasterize_gaussians
    return _RasterizeGaussians.apply(
  File "/home/dreamer/.local/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/dreamer/.local/lib/python3.10/site-packages/diff_gaussian_rasterization/__init__.py", line 97, in forward
    num_rendered, color, language_feature, radii, geomBuffer, binningBuffer, imgBuffer = _C.rasterize_gaussians(*args)
RuntimeError: numel: integer multiplication overflow
Any help would be appreciated.
Note, I'm running in a docker container.
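For readers unfamiliar with the final error: "numel: integer multiplication overflow" is raised when the product of a tensor's dimensions does not fit in a signed 64-bit integer. With only 68449 points in the scene, that may suggest a corrupted or garbage size reached the rasterizer rather than a genuinely huge tensor. The sketch below imitates that kind of check in pure Python; `checked_numel` is a hypothetical helper for illustration, not PyTorch's actual implementation.

```python
INT64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold


def checked_numel(shape):
    """Multiply the dimensions of `shape`, failing if the product leaves the int64 range."""
    total = 1
    for dim in shape:
        if dim < 0:
            raise ValueError(f"negative dimension {dim} in shape {tuple(shape)}")
        total *= dim
        if total > INT64_MAX:
            raise OverflowError(f"numel overflow for shape {tuple(shape)}")
    return total


# A plausible buffer for the 68449 points reported in the log (3 values per point).
print(checked_numel((68449, 3)))  # 205347

# A garbage size, as might be read from uninitialized memory, trips the check.
try:
    checked_numel((2**40, 2**40))
except OverflowError as err:
    print("overflow:", err)
```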
This is the docker file https://huggingface.co/spaces/qihang/3Dit-Scene/blob/main/Dockerfile
Hi, can you tell me:
Here is the output from nvidia-smi:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro RTX 8000                Off | 00000000:1A:00.0 Off |                  Off |
+---------------------------------------------------------------------------------------+
and the host is
cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
Hi, I apologize for the inconvenience caused, but it is hard for me to reproduce the error. Maybe you can check the following:
I notice that your CUDA version is 12.2, but the Dockerfile specifies CUDA 11.8. I'm not sure if this mismatch is causing the problem. Could you try modifying the Dockerfile, or installing the environment in a Python virtual environment?
Running out of memory can cause this error: https://github.com/graphdeco-inria/gaussian-splatting/issues/24. Is any other program running simultaneously when you run the example?
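One note on the version check: the "CUDA Version" in the nvidia-smi banner is the highest CUDA runtime the installed driver supports, not the toolkit inside the container, so a CUDA 11.8 image on a 12.2-capable driver would normally be expected to work. A small sketch of that comparison, using the banner line posted above (`parse_smi_header` and `driver_supports` are hypothetical helper names):

```python
import re


def parse_smi_header(text):
    """Return (driver_version, cuda_version) from nvidia-smi's banner, or None if absent."""
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text
    )
    return match.groups() if match else None


def version_tuple(version):
    """Turn '12.2' into (12, 2) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def driver_supports(driver_cuda, container_cuda):
    """True if the driver's maximum CUDA version covers the container's toolkit."""
    return version_tuple(driver_cuda) >= version_tuple(container_cuda)


banner = "| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |"
driver, max_cuda = parse_smi_header(banner)
print(driver, max_cuda, driver_supports(max_cuda, "11.8"))
```

If this prints True for the host, the driver side is probably fine and the memory question is the more likely lead.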
Okay, thanks for looking into this.
If I have time, I'll attempt to get this working in the docker again.
FWIW, I've been able to get the Dockerfile for threestudio to work after several rounds of debugging dependency issues, and AFAICT it looks like the Dockerfile above is based off that Dockerfile, so I can probably apply the same fixes.
The main issue in the threestudio Dockerfile is that not all dependencies are pinned to version numbers, so certain dependencies, when installed, cause the base version of torch and other core libraries to be overridden (and newer versions installed), and this causes downstream errors.
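A quick way to spot the unpinned entries is to scan the requirements list for anything that lacks an exact `==` pin. This is only a sketch: `unpinned` is a hypothetical helper, the package names and versions in the example are made up for illustration, and the pattern covers simple `name==version` lines rather than the full requirements syntax.

```python
import re

# Matches "name==version" with optional extras, e.g. "pkg[extra]==1.2.3".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^\s;]+")


def unpinned(requirement_lines):
    """Return the requirement lines that are not pinned to an exact version."""
    result = []
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            result.append(line)
    return result


# Hypothetical requirements, not taken from either Dockerfile.
example = [
    "torch==2.0.1",
    "lightning",
    "diffusers>=0.19",
    "# a comment",
    "xformers==0.0.20  # pinned",
]
print(unpinned(example))  # ['lightning', 'diffusers>=0.19']
```

Anything the scan reports is a candidate for pulling in a newer torch as a transitive dependency.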
Yes, I encountered the same issue with threestudio's Dockerfile. That's why I specified the versions for several packages. You'll need to determine the exact versions compatible with your hardware.
Overall, it seems promising now that you've resolved the issues with threestudio's Dockerfile. If you encounter any further problems, feel free to reach out for a discussion.