juhha1 closed this issue 3 months ago.
Sure. The upsampler stage is for super-resolution. You could train your own upsampler on 32x32x32 resolution without any conditioning.
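Training your own unconditional upsampler needs paired low- and high-resolution voxel grids. A minimal sketch of building such pairs is below, assuming binary occupancy voxels stored as nested lists indexed `grid[x][y][z]` and max-pooling as the downsampling step (a low-res voxel is occupied if any voxel in its block is occupied). The repository's actual data format and degradation are not described in this thread, and `downsample_max` / `make_training_pair` are hypothetical helpers.

```python
from typing import List, Tuple

# Hypothetical format: cubic binary occupancy grid indexed as grid[x][y][z].
Grid = List[List[List[int]]]


def downsample_max(grid: Grid, factor: int) -> Grid:
    """Max-pool a cubic occupancy grid by `factor` along every axis.

    Assumption: a low-res voxel is occupied if any high-res voxel in its
    factor x factor x factor block is occupied.
    """
    n = len(grid)
    if n % factor != 0:
        raise ValueError("grid size must be divisible by factor")
    m = n // factor
    out = [[[0] * m for _ in range(m)] for _ in range(m)]
    for x in range(n):
        for y in range(n):
            for z in range(n):
                if grid[x][y][z]:
                    out[x // factor][y // factor][z // factor] = 1
    return out


def make_training_pair(high_res: Grid, factor: int = 4) -> Tuple[Grid, Grid]:
    """Return (input, target); with factor 4, a 128^3 target gives a 32^3 input."""
    return downsample_max(high_res, factor), high_res
```

With `factor=4`, a 128x128x128 target yields a 32x32x32 input, matching the 4x setting discussed in this thread.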
Thanks for your quick reply! I think I am slowly getting what you mean.
Can you verify if I am understanding it correctly?
I hope I am getting things right and not confusing anyone here. Thanks!
The upsampler module in the model performs, in some sense, the super-resolution task. A: Yes.
The upsampler is essentially the decoder of the unet-VAE. Here, the input to this decoder is the latent feature. If the spatial dimension of this latent feature is [32x32x32], the decoder can generate a spatially upsampled voxel grid such as [128x128x128]. A: Basically, yes. In the default setting, the VAE decoder upsamples the output by 4x, from 128x128x128 to 512x512x512. There may be some assertions in the code that check that the voxel sizes match. You may still use the pretrained model for this task, but I think training your own would be better.
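The thread does not describe the decoder's layers, so the sketch below only illustrates the shape arithmetic under an assumption: each upsampling stage is a stride-2 transposed 3D convolution with kernel 4 and padding 1, which exactly doubles each spatial dimension. Under that assumption, a 4x decoder has two stages, and the same two stages take 32^3 to 128^3 or 128^3 to 512^3. The function names are hypothetical.

```python
def conv_transpose_out(size: int, kernel: int, stride: int,
                       padding: int = 0, output_padding: int = 0) -> int:
    # Standard output-size formula for a transposed convolution (dilation 1).
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def stages_for_factor(factor: int) -> int:
    # Assumption: every stage upsamples by 2x, so the factor must be a power of two.
    if factor < 1 or factor & (factor - 1):
        raise ValueError("factor must be a power of two")
    return factor.bit_length() - 1


def decoder_output_size(size: int, factor: int) -> int:
    # Apply the assumed 2x stage (kernel 4, stride 2, padding 1) repeatedly.
    for _ in range(stages_for_factor(factor)):
        size = conv_transpose_out(size, kernel=4, stride=2, padding=1)
    return size


print(decoder_output_size(32, 4))   # → 128
print(decoder_output_size(128, 4))  # → 512
```

This only checks that the spatial sizes line up; whether the pretrained weights generalize from 128^3 inputs to 32^3 inputs is a separate question, which is why training your own upsampler is suggested above.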
To use the upsampler as an off-the-shelf model, the steps below are followed ... A: I think you are right. However, our pretrained checkpoint is also conditioned on normals, so if you want to use it, you need to prepare the corresponding normals.
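The thread does not say how the checkpoint expects normals to be represented or computed. As one generic option only, per-voxel normals can be estimated from an occupancy grid as the negated, normalized central-difference gradient, so they point from occupied toward empty space. The sketch below assumes a cubic grid indexed `grid[x][y][z]` with 1.0 for occupied voxels; `voxel_normal` is a hypothetical helper, and its output would still need to be converted to whatever format the pretrained model actually uses.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical format: cubic occupancy grid indexed as grid[x][y][z], 1.0 = occupied.
Grid = List[List[List[float]]]


def voxel_normal(grid: Grid, x: int, y: int, z: int) -> Optional[Tuple[float, float, float]]:
    """Estimate an outward normal at (x, y, z) from the occupancy gradient.

    Returns None where the gradient is zero (e.g. deep inside or far outside the shape).
    """
    n = len(grid)

    def at(i: int, j: int, k: int) -> float:
        # Clamp indices to the grid so border voxels still get an estimate.
        i = min(max(i, 0), n - 1)
        j = min(max(j, 0), n - 1)
        k = min(max(k, 0), n - 1)
        return grid[i][j][k]

    # Central differences of occupancy along each axis.
    gx = (at(x + 1, y, z) - at(x - 1, y, z)) / 2.0
    gy = (at(x, y + 1, z) - at(x, y - 1, z)) / 2.0
    gz = (at(x, y, z + 1) - at(x, y, z - 1)) / 2.0
    norm = math.sqrt(gx * gx + gy * gy + gz * gz)
    if norm == 0.0:
        return None
    # Occupancy increases toward the inside, so negate to point outward.
    return (-gx / norm, -gy / norm, -gz / norm)
```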
Thanks for sharing interesting research!
I am wondering if I can use this model solely for voxel super-resolution. For example, if I have a low-res voxel grid with a spatial dimension of [32,32,32], can I use this method to upsample it to a spatial resolution of [128,128,128]?