Questions about query sample strategy of AutoEncoder.

zoomin-lee / SemCity

[CVPR 2024] The official implementation for "SemCity: Semantic Scene Generation with Triplane Diffusion"


I want to express my gratitude to the authors for sharing such outstanding work. @zoomin-lee @nautilus-a I'd like to ask about the query sampling strategy used for decoding when training the tri-plane autoencoder. It is primarily implemented in the get_query() function, which includes all voxels with semantic labels and also computes background labels near the surface using truncated fields, which is understandable. However, after the diffusion model outputs a tri-plane, the decoding query sampling is implemented in make_query(), which samples voxels uniformly in space. Could this lead to a significant gap between training and testing? Also, based on your experience, what is a usable mIoU range for the final autoencoder for the subsequent stages? (I understand that your code evaluates mIoU using queries obtained with get_query().)

Thank you for your interest in our work. As you mentioned, we train and evaluate the AE using get_query() during the first stage (AE training), and we sample a scene through make_query() in the second stage (diffusion). As far as we know, there is no significant gap between using get_query() and make_query() in the AE training stage, because our AE is based on an implicit representation. In addition, the get_query() code also randomly samples queries within air regions. As for your other question, an AE mIoU of around 80-90 might be usable for the subsequent stages. Note that get_query() is used to monitor the training process; in the paper, we evaluate our diffusion model using make_query(). Thank you again for your interest.
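
For readers following the thread, here is a rough NumPy sketch of the two query strategies being compared. It is not the SemCity code: the function names, the `EMPTY = 0` label, the one-voxel truncation band, and the choice to count the empty class in mIoU are all assumptions made for illustration. The first function mimics the training-style sampling described above (all labeled voxels, empty voxels near the surface, plus random air samples), and the second mimics dense uniform querying over the whole grid.

```python
import numpy as np

EMPTY = 0  # assumed label for free space (hypothetical convention)


def dilate(mask):
    """Grow a boolean 3D mask by one voxel along each axis (6-neighborhood)."""
    out = mask.copy()
    out[1:, :, :] |= mask[:-1, :, :]
    out[:-1, :, :] |= mask[1:, :, :]
    out[:, 1:, :] |= mask[:, :-1, :]
    out[:, :-1, :] |= mask[:, 1:, :]
    out[:, :, 1:] |= mask[:, :, :-1]
    out[:, :, :-1] |= mask[:, :, 1:]
    return out


def surface_biased_queries(voxels, n_air, trunc=1, rng=None):
    """Training-style sketch: labeled voxels + near-surface empties + random air."""
    rng = np.random.default_rng() if rng is None else rng
    occupied = voxels != EMPTY
    grown = occupied
    for _ in range(trunc):
        grown = dilate(grown)
    near_surface = grown & ~occupied   # empty voxels inside the truncation band
    air = ~grown                       # empty voxels far from any surface
    air_idx = np.argwhere(air)
    pick = rng.choice(len(air_idx), size=min(n_air, len(air_idx)), replace=False)
    coords = np.concatenate(
        [np.argwhere(occupied), np.argwhere(near_surface), air_idx[pick]], axis=0
    )
    labels = voxels[tuple(coords.T)]
    return coords, labels


def dense_queries(shape):
    """Inference-style sketch: one query per voxel, covering the grid uniformly."""
    axes = [np.arange(s) for s in shape]
    return np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)


def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in either the prediction or the target."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (target == c))
        union = np.sum((pred == c) | (target == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```

One way to check for a train/test gap with a trained autoencoder would be to decode the same tri-plane at both query sets and compare `mean_iou` on each; because the decoder is an implicit function of the coordinates, the two numbers are expected to be close, as the reply above suggests.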


Questions about query sample strategy of AutoEncoder. #7