Hi,
thanks for your interest in our work. In general, generating cities/landscapes from text is certainly an interesting direction for future work. I agree that a similar idea could be applied to those types of scenes; however, there are some things that are different:
Our depth-inpainting model is trained on indoor scenes, so generalization to outdoor environments will most likely be difficult for the model. We could replace it with an alternative depth model that still supports the depth-inpainting feature.
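If one wanted to experiment with this, a minimal sketch could wrap a general-purpose monocular depth model such as MiDaS and only keep its prediction inside the inpainting mask, aligned in scale/shift to the depth that is already known. This is not our repo's actual interface; the function name `inpaint_depth_outdoor` and the alignment strategy are assumptions for illustration.

```python
import torch
import numpy as np

# Hypothetical drop-in replacement for the indoor depth-inpainting step:
# predict relative depth with MiDaS, align it to the known depth outside
# the mask, and fill only the masked (missing) pixels with it.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas_transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform
midas.eval()

def inpaint_depth_outdoor(rgb, known_depth, mask):
    """rgb: HxWx3 uint8 RGB image, known_depth: HxW float (meters), mask: HxW bool (True = missing)."""
    with torch.no_grad():
        pred = midas(midas_transform(rgb)).squeeze()
        pred = torch.nn.functional.interpolate(
            pred[None, None], size=known_depth.shape, mode="bicubic", align_corners=False
        ).squeeze().cpu().numpy()
    # MiDaS outputs relative inverse depth; fit scale/shift against the known pixels.
    valid = ~mask
    A = np.stack([pred[valid], np.ones(valid.sum())], axis=1)
    b = 1.0 / np.clip(known_depth[valid], 1e-6, None)
    scale, shift = np.linalg.lstsq(A, b, rcond=None)[0]
    aligned = 1.0 / np.clip(scale * pred + shift, 1e-6, None)
    out = known_depth.copy()
    out[mask] = aligned[mask]
    return out
```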
We focus on creating closed rooms with walls, a ceiling, and a floor. Outdoor scenes typically do not have these features, so they will require a different pose sampling / completion strategy. We could, however, replace our trajectory.json files with pose sampling schemes tailored to outdoor scenes.
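As a rough illustration of what such a scheme could look like, one could sample a forward-walking trajectory with a slow pan instead of room-centric poses. The JSON schema below is a placeholder, not the actual trajectory.json format used in the repo.

```python
import json
import numpy as np

def outdoor_walk_poses(n_frames=120, step=0.15, yaw_amplitude_deg=20.0):
    """Hypothetical pose sampler for open scenes: walk forward along +z while
    slowly panning left/right, instead of looking around inside a closed room.
    Returns a list of 4x4 camera-to-world matrices."""
    poses = []
    for i in range(n_frames):
        yaw = np.deg2rad(yaw_amplitude_deg) * np.sin(2 * np.pi * i / n_frames)
        c, s = np.cos(yaw), np.sin(yaw)
        R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])  # rotation about the y-axis
        pose = np.eye(4)
        pose[:3, :3] = R
        pose[:3, 3] = np.array([0.0, 0.0, step * i])      # keep moving into the scene
        poses.append(pose)
    return poses

# Serialize into a trajectory file (field names are placeholders, not the repo schema).
with open("outdoor_trajectory.json", "w") as f:
    json.dump({"poses": [p.tolist() for p in outdoor_walk_poses()]}, f)
```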
Creating outdoor scenes might require a different representation altogether. In general, depth discontinuities are much larger (e.g. think about the difference in depth between the sky and a building), so a mesh representation no longer seems sufficient in those cases. Pairing it with an environment map could be an interesting idea.
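To make the environment-map idea concrete, here is a rough sketch of how per-frame pixels could be routed either into the mesh or into a shared equirectangular environment map based on depth. All names, the threshold, and the routing rule are assumptions for illustration, not part of our codebase.

```python
import numpy as np

def route_pixels(depth, rgb, rays_world, env_map, sky_threshold=80.0):
    """depth: HxW, rgb: HxWx3, rays_world: HxWx3 unit view directions (world space),
    env_map: HexWex3 equirectangular map updated in place.
    Pixels closer than `sky_threshold` would be fused into the mesh as usual;
    everything farther (sky, distant skyline) is splatted into the environment map,
    so extreme depth discontinuities never produce stretched triangles."""
    He, We, _ = env_map.shape
    far = depth > sky_threshold
    d = rays_world[far]
    # Direction -> lat/long coordinates in the environment map.
    theta = np.arccos(np.clip(d[:, 1], -1.0, 1.0))   # polar angle from +y
    phi = np.arctan2(d[:, 2], d[:, 0]) + np.pi       # azimuth in [0, 2*pi]
    v = np.clip((theta / np.pi * He).astype(int), 0, He - 1)
    u = np.clip((phi / (2 * np.pi) * We).astype(int), 0, We - 1)
    env_map[v, u] = rgb[far]
    return ~far  # mask of pixels that would still be fused into the mesh
```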
Hi,
Thanks for sharing this great work.
From the method description, it seems like the method should be generalisable to arbitrary 3D scenes, e.g. cities or landscapes. Why did you choose to restrict it to indoor room environments? Did you try it on arbitrary scenes? What problems did you face?