Closed: Torment123 closed this issue 1 year ago
Hi, based on my understanding, at each iteration the deformation MLP needs to run inference on all the Gaussian kernels (typically ~100k) to get their offsets. I wonder how fast this process is, and how this part is implemented. Thanks!
Thank you for your interest in this work. Your understanding is correct; the deformation MLP requires inference for each Gaussian kernel. I have attempted to separate the static and dynamic parts to reduce the number of inferences, but this significantly impacts the rendering quality.
In terms of speed, incorporating the MLP into the Gaussian rasterization pipeline inevitably reduces the rendering speed. Even so, real-time rendering can still be achieved on the D-NeRF dataset with approximately 100k points: 73 FPS on the standup scene, compared to 343 FPS for vanilla 3D-GS. On the HyperNeRF dataset, the number of points varies significantly between scenes; for the lemon scene (approximately 400k points), the FPS is 17.
It's worth noting that:

- Current tests are conducted on a Tesla V100 with 32 GB VRAM.
- The deformation field is currently a pure Python implementation and has not yet been integrated into the C++ SIBR_viewers of the official 3D Gaussian Splatting.
- All experiments on the D-NeRF dataset were conducted at full resolution (800x800). The resolution for HyperNeRF is consistent with the data preprocessing described in the original implementation.
The sample code I tested for D-NeRF is shown below:
```python
import time

import numpy as np
import torch
from tqdm import tqdm

# pose_spherical, view, gaussians, pipeline, background, timer and render
# come from the repo's rendering script.
frame = 520
render_poses = torch.stack([pose_spherical(angle, -30.0, 4.0)
                            for angle in np.linspace(-180, 180, frame + 1)[:-1]], 0)

t0 = time.time()
for i, pose in enumerate(tqdm(render_poses, desc="Rendering progress")):
    fid = torch.Tensor([i / (frame - 1)]).cuda()
    # Convert the camera-to-world pose into the R, T extrinsics the viewer expects.
    matrix = np.linalg.inv(np.array(pose))
    R = -np.transpose(matrix[:3, :3])
    R[:, 0] = -R[:, 0]
    T = -matrix[:3, 3]
    view.reset_extrinsic(R, T)
    xyz = gaussians.get_xyz
    # One timestamp per Gaussian: broadcast the frame id to all ~100k points.
    time_input = fid.unsqueeze(0).expand(xyz.shape[0], -1)
    # Query the deformation MLP for per-Gaussian offsets at this timestamp.
    d_xyz, d_rotation, d_scaling = timer.step(xyz.detach(), time_input)
    results = render(view, gaussians, pipeline, background, d_xyz, d_rotation, d_scaling)
    rendering = results["render"]

fps = frame / (time.time() - t0)
print("FPS = {}".format(fps))
```
The average inference times of the deformation field and GS rasterization on the standup scene are 0.0118s and 0.00283s, respectively, measured with the following test code:
```python
import time

import numpy as np
import torch
from tqdm import tqdm

time_deform = []
time_gs = []
for i, pose in enumerate(tqdm(render_poses, desc="Rendering progress")):
    ...  # same camera / time_input setup as in the FPS script above
    torch.cuda.synchronize()  # ensure timings measure GPU work, not just kernel launches
    t0 = time.time()
    d_xyz, d_rotation, d_scaling = timer.step(xyz.detach(), time_input)
    torch.cuda.synchronize()
    t1 = time.time()
    results = render(view, gaussians, pipeline, background, d_xyz, d_rotation, d_scaling)
    torch.cuda.synchronize()
    t2 = time.time()
    time_deform.append(t1 - t0)
    time_gs.append(t2 - t1)
    rendering = results["render"]

print("Time Deformation = {}; Time GS = {}".format(np.mean(time_deform), np.mean(time_gs)))
```
I have a follow-up question on the network design: do you apply any activation layers to the predicted rotation, scaling, or position? How do you constrain them from growing larger and larger?
@sunshineatnoon This is an interesting question.
I have previously attempted to impose certain constraints on d_xyz, d_rot, and d_scale, such as clamping points with minor delta values and assigning weights to the output of the deformation field based on the gradient of GS xyz. However, these modifications tended to degrade the rendering quality. Consequently, in the current version, I have not applied any activation functions to d_xyz, d_rot, or d_scale.
Their values remain relatively stable across the datasets I have observed. For instance, the maximum value of d_xyz does not exceed 0.5, and the maximum values for d_rot and d_scale do not surpass 0.1. They do not progressively increase during training.
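For illustration only, here is a minimal sketch of the kind of constraint described above, using the empirical bounds just mentioned (0.5 for d_xyz, 0.1 for d_rot and d_scale) as soft limits. The function name and thresholds are hypothetical, and the released version applies no such activation:

```python
import torch

def constrain_deltas(d_xyz, d_rotation, d_scaling,
                     min_delta=1e-4, max_xyz=0.5, max_other=0.1):
    # Hypothetical sketch, not the released code (which applies no activation).
    # Zero out near-static points ("clamping points with minor delta values").
    moving = d_xyz.norm(dim=-1, keepdim=True) > min_delta
    d_xyz = d_xyz * moving
    # Softly bound each delta with a scaled tanh rather than a hard clamp,
    # so gradients still flow near the boundary.
    d_xyz = max_xyz * torch.tanh(d_xyz / max_xyz)
    d_rotation = max_other * torch.tanh(d_rotation / max_other)
    d_scaling = max_other * torch.tanh(d_scaling / max_other)
    return d_xyz, d_rotation, d_scaling
```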
However, I have to highlight an issue: 1-2 datasets from HyperNeRF occasionally encounter out-of-memory (OOM) errors. Upon investigating, I found that the OOM errors occur within the Gaussian rasterization pipeline. Yet during the iterations when OOM arises, the output of the deformation field remains stable, with no anomalous values.
Thanks for your detailed response. Also, I think the complete implementation is not yet fully released in this repo, right? I'm looking forward to playing with the code when it's fully available.
I see. Thanks for the insight. Did you also adjust the learning rate from the original Gaussian Splatting settings? I found the scaling of the Gaussians easily explodes.
@Torment123 @sunshineatnoon I haven't made any changes to the learning rate of 3D-GS. The deformation field uses an annealed learning rate that decays exponentially from 3k to 40k iterations; see Section 4.1, Implementation Details, for more information. The code is currently going through ByteDance's open-source process. If you need it, I can provide the source code for academic purposes via email.
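For reference, a minimal sketch of such an annealed schedule, in the spirit of the exponential-decay LR helper in the official 3D-GS codebase; the lr_init and lr_final values below are placeholder assumptions, not the paper's hyperparameters:

```python
import math

def annealed_lr(step, lr_init=8e-4, lr_final=1.6e-6,
                decay_start=3000, decay_end=40000):
    # Hedged sketch: exponential decay of the deformation-field LR
    # between iterations 3k and 40k; lr_init/lr_final are placeholders.
    if step < decay_start:
        return lr_init
    t = min((step - decay_start) / (decay_end - decay_start), 1.0)
    # Log-linear interpolation in lr is equivalent to exponential decay.
    return math.exp((1.0 - t) * math.log(lr_init) + t * math.log(lr_final))
```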
My e-mail is redemptyourself@outlook.com. Thanks!
I have dropped you an email. Thanks so much!
Hi, my email is jshen27@ncsu.edu. Thanks!
Hey @ingra14m, may I also get a copy of the source code for research purposes? Thank you! My email is lynl7130@gmail.com.
Hi @ingra14m, may I get it for academic purposes? Many thanks! My email is zhichenglu@mail.nwpu.edu.cn
Sorry for commenting on a closed issue. Could you also provide me with the source code? My email is dlsrbgg33@gmail.com. Thank you for your great work and contribution.
@dlsrbgg33 Done! Sorry for the late reply; I missed this comment.