Can I use the model for voxel super resolution?

juhha1 commented 3 months ago

Thanks for sharing interesting research!

I am wondering if I can use this model solely for voxel super resolution. For example, if I have a low-res voxel with [32,32,32] in spatial dimension, can I use this method to upsample the spatial resolution of [128,128,128]?

xrenaa commented 3 months ago

Sure. The upsampler stage is for super resolution. You could train your own upsampler on 32x32x32 resolution without any condition.

juhha1 commented 3 months ago

Thanks for your quick reply! I think I am slowly getting what you mean.

Can you verify if I am understanding it correctly?

The upsampler module in the model performs the super-resolution task in some sense.
The upsampler is essentially the decoder of the unet-VAE. Here, the input to this decoder is the latent feature. If the spatial dimension of this latent feature is [32x32x32], the decoder can generate a spatially upsampled voxel grid like [128x128x128].
To use the upsampler as an off-the-shelf model, I would follow the steps below:
1. Generate the latent representation of the input voxel grid, meaning convert the occupancy voxels [b,32,32,32,1] to a latent variable [b,32,32,32,f], where f is the feature dimension. (This step involves a dense voxel to fvdb tensor conversion.)
2. Feed the [b,32,32,32,f] latent variable to the upsampler modules.
Here, step 1 can be done by the "random_sample_latents" function in the network.
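
The dense-to-sparse conversion mentioned in step 1 can be sketched as follows. This is only an illustration in plain NumPy, not the XCube or fvdb API: the helper name `dense_to_sparse` and the 0.5 threshold are assumptions, and the resulting integer coordinates would still need to be wrapped in whatever sparse grid type the model expects.

```python
import numpy as np


def dense_to_sparse(occupancy, threshold=0.5):
    """Return one (N, 3) array of occupied voxel indices per batch sample.

    `occupancy` is a dense grid shaped [b, D, H, W, 1].
    Hypothetical helper for illustration; not part of XCube or fvdb.
    """
    occ = np.asarray(occupancy)
    if occ.ndim != 5 or occ.shape[-1] != 1:
        raise ValueError("expected a grid shaped [b, D, H, W, 1]")
    coords = []
    for sample in occ[..., 0]:
        # argwhere lists the (i, j, k) index of every occupied voxel
        coords.append(np.argwhere(sample > threshold).astype(np.int32))
    return coords


# Toy batch of two 32x32x32 grids
grid = np.zeros((2, 32, 32, 32, 1), dtype=np.float32)
grid[0, 4:8, 4:8, 4:8, 0] = 1.0  # a 4x4x4 block, i.e. 64 occupied voxels
grid[1, 0, 0, 0, 0] = 1.0        # a single occupied voxel
coords = dense_to_sparse(grid)
print([c.shape for c in coords])
```

Only the occupied indices are kept, which is the information a sparse voxel structure is built from.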

I hope I am getting things right and am not confusing anyone here. Thanks!

xrenaa commented 3 months ago

The upsampler module in the model performs the super-resolution task in some sense. A: Yes
The upsampler is essentially the decoder of the unet-VAE. Here, the input to this decoder is the latent feature. If the spatial dimension of this latent feature is [32x32x32], the decoder can generate spatially upsampled voxel like [128x128x128]. A: Basically yes. In the default setting, the VAE decoder upsamples the output by 4x, from 128x128x128 to 512x512x512. We may have some assertions that require the voxel size to match. You could use the pretrained model for this task, but I think training your own would be better.
To use the upsampler as off-the-shelf model, it follows steps below ... A: I think you are right. However, our pretrained checkpoint is also conditioned on normals, so if you want to use it, you need to prepare the corresponding normals.
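
To make the 4x factor and the voxel-size remark above concrete, here is a small arithmetic sketch. It assumes the scene keeps the same physical extent, so the voxel size shrinks by the same factor the resolution grows. The function name `upsampled_grid` and the unit-cube extent are hypothetical and not taken from the XCube code.

```python
def upsampled_grid(resolution, voxel_size, factor=4):
    """Return (fine_resolution, fine_voxel_size) after upsampling by `factor`.

    Assumption: the physical extent (resolution * voxel_size) is unchanged.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return resolution * factor, voxel_size / factor


# Default setting from the thread (128 -> 512) and the case asked about (32 -> 128)
for res in (128, 32):
    voxel_size = 1.0 / res  # assumption: the scene spans a unit cube
    fine_res, fine_voxel = upsampled_grid(res, voxel_size)
    # The physical extent should be identical before and after upsampling
    assert abs(fine_res * fine_voxel - res * voxel_size) < 1e-12
    print(f"{res}^3 @ {voxel_size} -> {fine_res}^3 @ {fine_voxel}")
```

If a checkpoint trained on 128x128x128 inputs is reused on 32x32x32 inputs, the input voxel size differs from the one used in training, which is presumably what a voxel-size assertion would flag.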

nv-tlabs / XCube

Can I use the model for voxel super resolution? #24