kylesargent / ZeroNVS

Apache License 2.0
How long is inference on A100-40GB expected to take? #27

Open rishabhkabra opened 1 month ago

rishabhkabra commented 1 month ago

I assume launch_inference.sh is meant to run inference on the motorcycle image. But so far it's been going for over 30 mins with no end in sight. I also noticed it calls launch.py in --train mode. Is this intended?

Here's the log:

(base) ...:~/zeronvs$ sh launch_inference.sh 
/opt/conda/lib/python3.10/site-packages/controlnet_aux/mediapipe_face/mediapipe_face_common.py:7: UserWarning: The module 'mediapipe' is not installed. The package will have limited functionality. Please install it using the command: pip install 'mediapipe'
  warnings.warn(
Seed set to 0
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
/opt/conda/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/opt/conda/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG16_Weights.IMAGENET1K_V1`. You can also use `weights=VGG16_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /home/rkabra_google_com/.cache/torch/hub/checkpoints/vgg16-397923af.pth
100%|███████████████████████████████████████████████████████████████████████████████████| 528M/528M [00:01<00:00, 291MB/s]
Loading model from: /opt/conda/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
[INFO] Using 16bit Automatic Mixed Precision (AMP)
[INFO] GPU available: True (cuda), used: True
[INFO] TPU available: False, using: 0 TPU cores
[INFO] HPU available: False, using: 0 HPUs
[INFO] You are using a CUDA device ('NVIDIA A100-SXM4-40GB') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
[INFO] single image dataset: load image motorcycle.png torch.Size([1, 128, 128, 3])
[INFO] single image dataset: load image motorcycle.png torch.Size([1, 128, 128, 3])
[INFO] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[INFO] 
  | Name       | Type                          | Params | Mode 
---------------------------------------------------------------------
0 | geometry   | ImplicitVolume                | 12.6 M | train
1 | material   | DiffuseWithPointLightMaterial | 0      | train
2 | background | SolidColorBackground          | 0      | train
3 | renderer   | NeRFVolumeRenderer            | 767 K  | train
4 | lpips_fn   | LPIPS                         | 14.7 M | eval 
---------------------------------------------------------------------
13.4 M    Trainable params
14.7 M    Non-trainable params
28.1 M    Total params
112.350   Total estimated model params size (MB)
[INFO] Validation results will be saved to outputs/zero123/[128, 256]_motorcycle.png_prog1000@20241019-124707/save
[INFO] Loading Zero123 ...
SDS distillation only, disabling some functionality...
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.53 M params.
Keeping EMAs of 688.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
100%|███████████████████████████████████████| 890M/890M [00:18<00:00, 51.5MiB/s]
[INFO] Loaded Zero123!
/opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
/opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
Epoch 0: |                                                             | 1356/? [31:53<00:00,  0.71it/s, train/loss=77.20]

kylesargent commented 1 month ago

Yes, inference may take an hour or so depending on your hardware. It is running an optimization called "score distillation sampling" (SDS). You can find details about it in the original paper: https://arxiv.org/abs/2209.14988
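
For readers wondering why "inference" looks like a training loop in the log above: SDS optimizes a fresh set of scene parameters for each input, using a frozen diffusion model only as a source of gradients. The following is a minimal toy sketch of that idea, not the ZeroNVS code. It assumes an identity "renderer" (the parameters are the image), and `toy_eps_pred` is a hypothetical stand-in for the diffusion model that is exact only when the data distribution is a single known `target`. All function names and hyperparameters here are illustrative assumptions.

```python
import numpy as np


def toy_eps_pred(x_t, alpha, sigma, target):
    """Hypothetical stand-in for a frozen diffusion model.

    If the data distribution is a point mass at `target`, the ideal noise
    prediction for x_t = alpha * x + sigma * eps is (x_t - alpha * target) / sigma.
    """
    return (x_t - alpha * target) / sigma


def sds_step(params, target, rng, lr=0.1):
    """One SDS update on `params` (toy setting: rendered image == params)."""
    t = rng.uniform(0.02, 0.98)              # random noise level
    alpha, sigma = np.sqrt(1.0 - t), np.sqrt(t)
    eps = rng.standard_normal(params.shape)  # noise injected into the render
    x_t = alpha * params + sigma * eps       # noised render
    eps_hat = toy_eps_pred(x_t, alpha, sigma, target)
    w = sigma ** 2                           # assumed weighting w(t)
    # SDS gradient: w(t) * (eps_hat - eps); the denoiser Jacobian is skipped.
    grad = w * (eps_hat - eps)
    return params - lr * grad


def run_sds(steps=500, seed=0):
    """Optimize randomly initialized params toward what the toy model prefers."""
    rng = np.random.default_rng(seed)
    target = np.full((4, 4), 0.5)
    params = rng.standard_normal((4, 4))
    for _ in range(steps):
        params = sds_step(params, target, rng)
    return params, target


if __name__ == "__main__":
    params, target = run_sds()
    print("max abs error:", np.abs(params - target).max())
```

In the real pipeline the parameters would be those of the NeRF (the `ImplicitVolume` and `NeRFVolumeRenderer` modules in the log), each step renders a view and queries a large diffusion model, and that per-step cost is plausibly why the run takes on the order of an hour.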