Hi, really nice work!
I wanted to ask how you computed the query speed reported in Tables 3 and 4 of the paper.
I've observed that the current implementation stores each level of language features on a different set of Gaussians, so obtaining the 2D language feature maps for a given viewpoint requires 3 rasterization passes. Are you taking this rasterization and the subsequent decoding time into account, or are you only measuring the query matching against the CLIP features?
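To make the question concrete, here is a rough sketch of the two timing scopes I have in mind (the function names `rasterize_language_features` and `decoder`, and the per-level Gaussian sets, are placeholders for illustration, not your actual API):

```python
import time
import torch

def time_full_query(gaussians_per_level, camera, decoder, clip_text_feat):
    """Scope A: 3 rasterization passes + decoding + query matching."""
    torch.cuda.synchronize()
    start = time.time()
    relevancy_maps = []
    for level_gaussians in gaussians_per_level:  # one set of Gaussians per feature level
        feat_map = rasterize_language_features(level_gaussians, camera)  # placeholder
        clip_map = decoder(feat_map)             # decode low-dim features back to CLIP space
        relevancy_maps.append(clip_map @ clip_text_feat.T)
    torch.cuda.synchronize()
    return time.time() - start, relevancy_maps

def time_matching_only(precomputed_clip_maps, clip_text_feat):
    """Scope B: only the matching against precomputed CLIP feature maps."""
    torch.cuda.synchronize()
    start = time.time()
    relevancy_maps = [m @ clip_text_feat.T for m in precomputed_clip_maps]
    torch.cuda.synchronize()
    return time.time() - start, relevancy_maps
```

I'm essentially asking whether the numbers in Tables 3 and 4 correspond to scope A or scope B above.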
Thank you!