Hi,
thanks for your interest in our work. In general, generating cities/landscapes from text is certainly an interesting direction for future work. I agree that a similar idea could be applied to those types of scenes; however, there are some things that are different:
Our depth-inpainting model is trained on indoor scenes, so generalization to outdoor environments will most likely be difficult for the model. We could replace it with an alternative depth model that still supports the depth-inpainting feature.
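If one wanted to experiment with this, a minimal sketch could wrap a general-purpose monocular depth model such as MiDaS and only keep its prediction inside the inpainting mask, aligned in scale/shift to the depth that is already known. This is not our repo's actual interface; the function name `inpaint_depth_outdoor` and the alignment strategy are assumptions for illustration.

```python
import torch
import numpy as np

# Hypothetical drop-in replacement for the indoor depth-inpainting step:
# predict relative depth with MiDaS, align it to the known depth outside
# the mask, and fill only the masked (missing) pixels with it.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas_transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform
midas.eval()

def inpaint_depth_outdoor(rgb, known_depth, mask):
    """rgb: HxWx3 uint8 RGB image, known_depth: HxW float (meters), mask: HxW bool (True = missing)."""
    with torch.no_grad():
        pred = midas(midas_transform(rgb)).squeeze()
        pred = torch.nn.functional.interpolate(
            pred[None, None], size=known_depth.shape, mode="bicubic", align_corners=False
        ).squeeze().cpu().numpy()
    # MiDaS outputs relative inverse depth; fit scale/shift against the known pixels.
    valid = ~mask
    A = np.stack([pred[valid], np.ones(valid.sum())], axis=1)
    b = 1.0 / np.clip(known_depth[valid], 1e-6, None)
    scale, shift = np.linalg.lstsq(A, b, rcond=None)[0]
    aligned = 1.0 / np.clip(scale * pred + shift, 1e-6, None)
    out = known_depth.copy()
    out[mask] = aligned[mask]
    return out
```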
We focus on creating closed rooms with walls, a ceiling, and a floor. Outdoor scenes typically do not have these features, so they will require a different pose sampling / completion strategy. We could, however, replace our trajectory.json files with pose sampling schemes tailored to outdoor scenes.
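As a rough illustration of what such a scheme could look like, one could sample a forward-walking trajectory with a slow pan instead of room-centric poses. The JSON schema below is a placeholder, not the actual trajectory.json format used in the repo.

```python
import json
import numpy as np

def outdoor_walk_poses(n_frames=120, step=0.15, yaw_amplitude_deg=20.0):
    """Hypothetical pose sampler for open scenes: walk forward along +z while
    slowly panning left/right, instead of looking around inside a closed room.
    Returns a list of 4x4 camera-to-world matrices."""
    poses = []
    for i in range(n_frames):
        yaw = np.deg2rad(yaw_amplitude_deg) * np.sin(2 * np.pi * i / n_frames)
        c, s = np.cos(yaw), np.sin(yaw)
        R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])  # rotation about the y-axis
        pose = np.eye(4)
        pose[:3, :3] = R
        pose[:3, 3] = np.array([0.0, 0.0, step * i])      # keep moving into the scene
        poses.append(pose)
    return poses

# Serialize into a trajectory file (field names are placeholders, not the repo schema).
with open("outdoor_trajectory.json", "w") as f:
    json.dump({"poses": [p.tolist() for p in outdoor_walk_poses()]}, f)
```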
Creating outdoor scenes might require a different representation altogether. In general, depth discontinuities are much larger (e.g. think about the difference in depth between the sky and a building), so a mesh representation no longer seems sufficient in those cases. Pairing it with an environment map could be an interesting idea.
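To make the environment-map idea concrete, here is a rough sketch of how per-frame pixels could be routed either into the mesh or into a shared equirectangular environment map based on depth. All names, the threshold, and the routing rule are assumptions for illustration, not part of our codebase.

```python
import numpy as np

def route_pixels(depth, rgb, rays_world, env_map, sky_threshold=80.0):
    """depth: HxW, rgb: HxWx3, rays_world: HxWx3 unit view directions (world space),
    env_map: HexWex3 equirectangular map updated in place.
    Pixels closer than `sky_threshold` would be fused into the mesh as usual;
    everything farther (sky, distant skyline) is splatted into the environment map,
    so extreme depth discontinuities never produce stretched triangles."""
    He, We, _ = env_map.shape
    far = depth > sky_threshold
    d = rays_world[far]
    # Direction -> lat/long coordinates in the environment map.
    theta = np.arccos(np.clip(d[:, 1], -1.0, 1.0))   # polar angle from +y
    phi = np.arctan2(d[:, 2], d[:, 0]) + np.pi       # azimuth in [0, 2*pi]
    v = np.clip((theta / np.pi * He).astype(int), 0, He - 1)
    u = np.clip((phi / (2 * np.pi) * We).astype(int), 0, We - 1)
    env_map[v, u] = rgb[far]
    return ~far  # mask of pixels that would still be fused into the mesh
```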
Hi,
Thanks for sharing this great work.
From the method description, it seems like the method should be generalisable to arbitrary 3D scenes, e.g. cities or landscapes. Why did you choose to restrict it to indoor room environments? Did you try it on arbitrary scenes? What problems did you face?