sherwinbahmani / 4dfy

4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling
https://sherwinbahmani.github.io/4dfy/
Apache License 2.0

Extension of threestudio #3

Closed: DSaurus closed this issue 6 months ago

DSaurus commented 6 months ago

Hi @sherwinbahmani,

I'm working on implementing 4D-fy as an extension of threestudio in this repository and have invited you as a collaborator. You're welcome to join the project if interested.

DSaurus commented 6 months ago

I noticed something interesting. The spatio-temporal encoding takes a 4-dimensional input directly and encodes it. However, I remember that TCNN inputs should be either 2- or 3-dimensional (see tcnn_encoding).

sherwinbahmani commented 6 months ago

Hi,

https://github.com/NVlabs/tiny-cuda-nn/blob/212104156403bd87616c1a4f73a1c5f2c2e172a9/include/tiny-cuda-nn/encodings/grid.h#L1177

It does support inputs of up to 4 dimensions. Of course, there are more efficient ways to encode position and time.

Did you get an error? Which tcnn version are you using?
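As a rough illustration of how a hash grid can accept a 4D (x, y, z, t) input, here is a minimal NumPy sketch in the spirit of Instant-NGP's multiresolution hash encoding. It is not tiny-cuda-nn's actual code. The first three hashing primes are the ones given in the Instant-NGP paper; the fourth constant, the base resolution, the growth factor, and the function names are assumptions for illustration.

```python
import itertools
import numpy as np

# Spatial-hash constants, one per input dimension. The first three follow the
# Instant-NGP paper; the fourth is an illustrative assumption for the t axis.
PRIMES = np.array([1, 2654435761, 805459861, 3674653429], dtype=np.uint32)

def hash_coords(coords, table_size):
    """Hash integer grid coords of shape (N, 4) into [0, table_size)."""
    c = coords.astype(np.uint32)
    h = np.zeros(c.shape[0], dtype=np.uint32)
    for d in range(c.shape[1]):
        h ^= c[:, d] * PRIMES[d]  # uint32 products wrap modulo 2^32
    return h % np.uint32(table_size)

# The 2^4 = 16 corners of a 4D grid cell (8 in 3D).
CORNERS = np.array(list(itertools.product([0, 1], repeat=4)), dtype=np.int64)

def encode_4d(x, tables, base_res=16, growth=1.5):
    """x: (N, 4) points in [0, 1]^4; tables: list of (T, F) arrays, one per level.

    Hypothetical sketch: returns (N, L * F) concatenated per-level features.
    """
    feats = []
    for level, table in enumerate(tables):
        res = int(np.floor(base_res * growth ** level))
        pos = x * res
        base = np.floor(pos).astype(np.int64)  # lower corner of the cell
        frac = pos - base                      # position inside the cell
        level_feat = np.zeros((x.shape[0], table.shape[1]))
        for corner in CORNERS:
            idx = hash_coords(base + corner, table.shape[0])
            # Quadrilinear weight: product over the 4 axes.
            w = np.prod(np.where(corner == 1, frac, 1.0 - frac), axis=-1)
            level_feat += w[:, None] * table[idx]
        feats.append(level_feat)
    return np.concatenate(feats, axis=-1)
```

Each level stores at most T feature vectors regardless of its resolution, so memory does not grow as n^4; hash collisions absorb the extra dimension. What does grow is the per-query cost, since 16 corners are interpolated per level instead of 8. Instant-NGP also indexes coarse levels densely when they fit in the table; that detail is omitted here.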

DSaurus commented 6 months ago

No, I didn't get an error. But does TCNN actually implement a huge O(n^4) hash grid to represent NeRF-T? That's amazing.

sherwinbahmani commented 6 months ago

Yeah, I've seen some recent reconstruction methods that use 4D hash grids as well, like this NeurIPS work: https://arxiv.org/pdf/2310.17527.pdf

It is computationally heavy, though. A better representation might make sense here, e.g., 4D planes or a deformation-based model. The best option would be NeRFPlayer, as it can represent both deformations and newly appearing content (like fire, or the water from the fire hydrant), which deformations alone cannot handle.
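One way to read "4D planes" is a K-Planes-style factorization: the 4D field is split into six 2D feature planes, one per axis pair (xy, xz, xt, yz, yt, zt), and a point's feature is the elementwise product of its bilinear lookups. The sketch below is a hypothetical NumPy illustration of that idea, not code from 4D-fy, threestudio, or K-Planes; the function names, resolutions, and initialization values are assumptions.

```python
import itertools
import numpy as np

# All 2D axis pairs of (x, y, z, t) -> 6 planes; index 3 is time.
PAIRS = list(itertools.combinations(range(4), 2))

def bilinear(plane, uv):
    """Bilinearly sample plane (R, R, F) at uv (N, 2) in [0, 1]."""
    R = plane.shape[0]
    p = uv * (R - 1)
    i0 = np.clip(np.floor(p).astype(np.int64), 0, R - 2)
    f = p - i0
    u0, v0 = i0[:, 0], i0[:, 1]
    fu, fv = f[:, :1], f[:, 1:]
    return ((1 - fu) * (1 - fv) * plane[u0, v0]
            + fu * (1 - fv) * plane[u0 + 1, v0]
            + (1 - fu) * fv * plane[u0, v0 + 1]
            + fu * fv * plane[u0 + 1, v0 + 1])

def init_planes(res, feat_dim, rng):
    """Illustrative init: time planes are all ones, so the field starts time-invariant."""
    planes = []
    for a, b in PAIRS:
        if 3 in (a, b):
            planes.append(np.ones((res, res, feat_dim)))
        else:
            planes.append(rng.uniform(0.1, 0.5, (res, res, feat_dim)))
    return planes

def encode_planes(x, planes):
    """x: (N, 4) in [0, 1]^4 -> (N, F) via the Hadamard product over the 6 planes."""
    feat = np.ones((x.shape[0], planes[0].shape[-1]))
    for (a, b), plane in zip(PAIRS, planes):
        feat *= bilinear(plane, x[:, [a, b]])
    return feat
```

Storage is 6·R²·F values instead of R⁴·F for a dense 4D grid. At R = 128 that is 98,304 versus 268,435,456 vertices per feature channel. The trade-off is a low-rank assumption on the 4D signal. A deformation-based model would instead query a static canonical field at a time-dependent offset x + D(x, t), which, as noted above, cannot create new content.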