zju3dv / EasyVolcap

[SIGGRAPH Asia 2023 (Technical Communications)] EasyVolcap: Accelerating Neural Volumetric Video Research

easyvolcap.models.samplers.point_planes_sampler very slow #2

Closed adkAurora closed 6 months ago

adkAurora commented 6 months ago

It takes nearly 2 hours on a V100 to finish this part. Is this different on architecture 75+ GPUs?

dendenxu commented 6 months ago

Hi @adkAurora, sorry for the late reply. Are you referring to the process of extracting visual-hulls? Could you please paste a screenshot of the terminal output?

adkAurora commented 6 months ago

Sorry for the late reply! When I'm training 3DGS-T, the point_planes_sampler first generates files in the surfs/ folder. This process is very slow, about 12 minutes per frame. Is this different on architecture 75+ GPUs?

```
tiny-cuda-nn warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
tiny-cuda-nn warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
2023-12-21 13:58:26.649892 easyvolcap.models.samplers.point_planes_sampler  Expanding pcds   1% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/102  0:14:26 < -:--:--   ?  it/s p…
                           -> expand_points:
```
dendenxu commented 6 months ago

The pre-processing in expand_points controls the number of points by performing marching cubes and then regularizing the points to lie on the surface. This step should be optional for the 3DGS+T model and can also be sped up if your input point count is already small enough.
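The point-count-capping idea can be sketched with a simple voxel downsample. This is a hypothetical stand-in to illustrate the concept, not the actual expand_points implementation:

```python
import numpy as np

def voxel_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Keep at most one point per voxel to cap the total point count."""
    # Quantize each point to the integer index of the voxel containing it
    keys = np.floor(points / voxel_size).astype(np.int64)
    # np.unique over rows keeps the first point seen in each occupied voxel
    _, idx = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(idx)]

# A dense 800k-point cloud in a unit cube collapses to far fewer points
pts = np.random.rand(800_000, 3)
small = voxel_downsample(pts, voxel_size=0.02)  # 2cm voxels -> at most 50^3 points
```

Larger voxels give fewer output points, which is the same knob the pre-processing step effectively turns: fewer input points means faster marching cubes and surface regularization.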

On a 4090, this pre-processing should take around 30s for an 800k-point point cloud.

  1. To skip this step, simply rename vhulls to surfs, or add model_cfg.sampler_cfg.points_dir=vhulls to your command (or YAML).
  2. To speed up the pre-processing, control the number of input points by controlling the scale of the volume_fusion process: either reduce the render ratio with val_dataloader_cfg.dataset_cfg.ratio, or reduce the number of depth images to render with val_dataloader_cfg.dataset_cfg.view_sample.
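As a rough back-of-envelope check (with hypothetical image sizes, not values from the repo), the fused point count scales with the square of the render ratio and linearly with the number of depth views, so halving the ratio cuts the input to expand_points by roughly 4x:

```python
# Hypothetical capture: 1024x1024 depth maps rendered from 4 views
H, W, n_views = 1024, 1024, 4

def fused_point_budget(ratio: float) -> int:
    # Each rendered depth pixel can contribute at most one fused point
    return int(H * ratio) * int(W * ratio) * n_views

full = fused_point_budget(1.0)  # 4_194_304 points
half = fused_point_budget(0.5)  # 1_048_576 points, a 4x reduction
```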