threestudio-project / threestudio

A unified framework for 3D content generation.
Apache License 2.0
6.19k stars · 474 forks

Very slow on 4090 #371

Open cseander opened 9 months ago

cseander commented 9 months ago

I'm not sure what I'm doing wrong but I'm getting 0.01it/s on my 4090 with 100% usage (usually about 23.5 / 24GB) for stable zero123. I've tried to set data.batch_size=1 but it did not change anything.

Here is everything from the terminal:

```
(venv) aiuser@aicomputer:~/threestudio$ python launch.py --config configs/stable-zero123.yaml --train --gpu 0 data.image_path=./load/images/hamburger_rgba.png
/home/aiuser/threestudio/venv/lib/python3.10/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_5m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_5m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
[... same UserWarning repeated for tiny_vit_11m_224, tiny_vit_21m_224, tiny_vit_21m_384 and tiny_vit_21m_512 ...]
Seed set to 0
[INFO] GPU available: True (cuda), used: True
[INFO] TPU available: False, using: 0 TPU cores
[INFO] IPU available: False, using: 0 IPUs
[INFO] HPU available: False, using: 0 HPUs
[INFO] You are using a CUDA device ('NVIDIA GeForce RTX 4090') that has Tensor Cores. To properly utilize them, you should set torch.set_float32_matmul_precision('medium' | 'high') which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
[INFO] single image dataset: load image ./load/images/hamburger_rgba.png torch.Size([1, 128, 128, 3])
[INFO] single image dataset: load image ./load/images/hamburger_rgba.png torch.Size([1, 128, 128, 3])
[INFO] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[INFO]
  | Name       | Type                          | Params
-------------------------------------------------------
0 | geometry   | ImplicitVolume                | 12.6 M
1 | material   | DiffuseWithPointLightMaterial | 0
2 | background | SolidColorBackground          | 0
3 | renderer   | NeRFVolumeRenderer            | 0
-------------------------------------------------------
12.6 M    Trainable params
0         Non-trainable params
12.6 M    Total params
50.450    Total estimated model params size (MB)
[INFO] Validation results will be saved to outputs/zero123-sai/[64, 128, 256]_hamburger_rgba.png@20231218-132149/save
[INFO] Loading Stable Zero123 ...
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.53 M params.
Keeping EMAs of 688.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
[INFO] Loaded Stable Zero123!
/home/aiuser/threestudio/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument to num_workers=31 in the DataLoader to improve performance.
/home/aiuser/threestudio/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument to num_workers=31 in the DataLoader to improve performance.
Epoch 0: |          | 1/? [02:49<00:00, 0.01it/s, train/loss=38.80]
```
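For scale: at 0.01 it/s each optimizer step takes about 100 seconds, which makes any full run impractical. A back-of-the-envelope sketch (the 600-step schedule is purely illustrative, not read from the config):

```python
# Rough wall-clock ETA at a given training throughput.
# The step count below is an illustrative assumption.
def eta_hours(steps: int, it_per_s: float) -> float:
    """Total hours to complete `steps` iterations at `it_per_s` iterations/second."""
    return steps / it_per_s / 3600

slow = eta_hours(600, 0.01)  # at the reported 0.01 it/s: ~16.7 hours
fast = eta_hours(600, 1.0)   # at a healthier 1 it/s: ~10 minutes

print(f"0.01 it/s -> {slow:.1f} h; 1.0 it/s -> {fast:.2f} h")
```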

DSaurus commented 9 months ago

Hi @cseander ,

I think it is a CPU bottleneck. You can first test the Stable Zero123 diffusion model on its own with threestudio-mvimg-gen.

cseander commented 9 months ago

That's odd, I have a i9-13900KS and 128GB RAM. Could it be because I'm running it in WSL?

When I try threestudio-mvimg-gen I get the following error:

```
Import times for custom modules:
   0.0 seconds: custom/threestudio-mvimg-gen

Seed set to 0
Traceback (most recent call last):
  File "/home/aiuser/threestudio/launch.py", line 301, in <module>
    main(args, extras)
  File "/home/aiuser/threestudio/launch.py", line 168, in main
    dm = threestudio.find(cfg.data_type)(cfg.data)
  File "/home/aiuser/threestudio/threestudio/data/image.py", line 312, in __init__
    self.cfg = parse_structured(SingleImageDataModuleConfig, cfg)
  File "/home/aiuser/threestudio/threestudio/utils/config.py", line 127, in parse_structured
    scfg = OmegaConf.structured(fields(**cfg))
TypeError: SingleImageDataModuleConfig.__init__() got an unexpected keyword argument 'num_workers'
```

DSaurus commented 9 months ago

> That's odd, I have a i9-13900KS and 128GB RAM. Could it be because I'm running it in WSL?
>
> When I try threestudio-mvimg-gen I get the following error:
>
> `TypeError: SingleImageDataModuleConfig.__init__() got an unexpected keyword argument 'num_workers'`

Sorry for including num_workers in the config. I have removed it; please try again. I have heard that the 4090 may have some performance issues when running diffusion models on Windows, but I'm not very sure.
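For context, the error occurs because threestudio parses the `data` block into a structured (dataclass-based) config, and a dataclass constructor rejects any key it doesn't declare. A minimal sketch with hypothetical fields, not the real `SingleImageDataModuleConfig`:

```python
from dataclasses import dataclass

# Hypothetical stand-in for a structured data-module config;
# the real class declares many more fields.
@dataclass
class SingleImageConfig:
    image_path: str = ""
    batch_size: int = 1

# A config dict carrying a key the dataclass does not declare.
cfg = {"image_path": "./load/images/hamburger_rgba.png", "num_workers": 16}

try:
    SingleImageConfig(**cfg)  # unknown key -> TypeError, as in the traceback above
except TypeError as e:
    print(e)
```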

yangmuye-c commented 9 months ago

I had the same problem you had. Did you solve it?

cseander commented 9 months ago

> I had the same problem you had. Did you solve it?

I changed batch_size = [12, 8, 4] to [8, 4, 2] in https://github.com/threestudio-project/threestudio/blob/main/configs/stable-zero123.yaml and it worked, but the quality is not as good. I suppose that's the cause, since one of the tips for increasing quality is to increase the batch size.

Some images still did not work, but ones with a fairly small object (~50% of the image) seemed to always work.
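For anyone trying the same workaround, the change is the per-stage batch sizes in configs/stable-zero123.yaml; an excerpt of roughly what the edited block looks like (only batch_size is changed, other fields left as shipped):

```yaml
# configs/stable-zero123.yaml (excerpt)
data:
  # stock value is [12, 8, 4]; halving each stage fits more comfortably in 24 GB
  batch_size: [8, 4, 2]
```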

yangmuye-c commented 9 months ago

> I had the same problem you had. Did you solve it?
>
> I changed batch_size = [12, 8, 4] to [8, 4, 2] in https://github.com/threestudio-project/threestudio/blob/main/configs/stable-zero123.yaml and it worked, but the quality is not as good. I suppose that's the cause, since one of the tips for increasing quality is to increase the batch size.
>
> Some images still did not work, but ones with a fairly small object (~50% of the image) seemed to always work.

Thank you. I'll try it.

feilen commented 9 months ago

I'm experiencing a similar issue, but Magic123 works as expected. I'm also using WSL.

Halving the batch size on all dimensions brought me from 0.01 it/s to 0.11 it/s... not great, but at least much better.

Jordain commented 9 months ago

I am also experiencing similar issues. Using WSL. I wonder if it's slow because of this warning:

```
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.1.2+cu121 with CUDA 1201 (you have 2.1.2+cu118)
    Python  3.10.13 (you have 3.10.12)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
```

xformers won't load. I know that using xformers in auto1111 and ComfyUI significantly increases the speed of image generation.

I'll investigate and see if I get faster speeds. I am also getting around 0.01 to 0.02 it/s with stable-zero123.

One issue is that nerfacc says it's only compatible with up to cu118, so I am not sure how to get it to run with cu121 instead.

mxiao commented 8 months ago

I have the same problem on a 4090: 0.02 it/s.

Jordain commented 8 months ago

I find the quality doesn't improve much at [16, 8, 4]. The best batch size to use is [4, 2, 1]; it takes about 5 minutes to generate.

Here is a video tutorial I made: https://www.youtube.com/watch?v=jaRr5W80N8E