lukasHoel / text2room

Text2Room generates textured 3D meshes from a given text prompt using 2D text-to-image models (ICCV2023).
https://lukashoel.github.io/text-to-room/
MIT License

Some questions about generating results #12

Closed SakuBorder closed 1 year ago

SakuBorder commented 1 year ago

Hi, this is impressive work! But I found during runtime that the generated scene is discontinuous and jumpy. How can I achieve continuous results like the video on your project homepage?

lukasHoel commented 1 year ago

During runtime it is expected that the scene is still a bit "discontinuous" (i.e., still has holes). However, these should be a lot better after the completion stage and the Poisson reconstruction. The results on the website show a novel, custom trajectory along the final mesh after these two steps. We also visualize the intermediate meshes during runtime under the "generation stage" section on our website. Note that these are not perfectly closed yet.
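
For readers unfamiliar with that step: Poisson reconstruction fits a watertight surface to oriented points, which is what closes most remaining holes. As a minimal illustrative sketch of the general technique (not the repo's actual pipeline; Open3D and the file paths here are stand-ins):

```python
# Generic Poisson surface reconstruction with Open3D -- illustrative only,
# not the exact code Text2Room runs. Paths are placeholders.
import open3d as o3d

pcd = o3d.io.read_point_cloud("partial_scene.ply")  # points sampled from the scene
pcd.estimate_normals()  # Poisson needs oriented normals

# Fit a watertight triangle mesh; higher depth = more detail, more memory.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9
)
o3d.io.write_triangle_mesh("poisson_mesh.ply", mesh)
```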

SakuBorder commented 1 year ago

> During runtime it is expected that the scene is still a bit "discontinuous" (i.e., still has holes). However, these should be a lot better after the completion stage and the Poisson reconstruction. The results on the website show a novel, custom trajectory along the final mesh after these two steps. We also visualize the intermediate meshes during runtime under the "generation stage" section on our website. Note that these are not perfectly closed yet.

What I mean is not that there are holes in the scene. Rather, in the final result for, e.g., "bedroom", the output is not a single bedroom but a scene where several bedrooms are fused together, whereas your homepage shows one complete bedroom.

lukasHoel commented 1 year ago

Repeating elements such as multiple beds can occur pretty easily if we use the same prompt for every camera pose. Essentially it means that every new piece of the scene should have some part of a bed visible to satisfy the prompt.

I suggest you try prompt mixing (modify the trajectory.json to use different prompts in the first two trajectories; see, e.g., the mix_* examples) or a larger prompt, which can also result in a more diverse set of furniture.
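
A minimal sketch of what such an edit could look like programmatically (the key names and file layout below are assumptions; consult the mix_* examples shipped with the repo for the actual trajectory.json schema):

```python
# Hypothetical sketch: assign different prompts to the first two trajectories
# ("prompt mixing"). Key names are assumptions -- check the mix_* examples.
import json

with open("model/trajectories/examples/bedroom.json") as f:
    trajectories = json.load(f)  # assumed: a list of trajectory dicts

# Assumed: each trajectory dict carries its own "prompt" entry.
trajectories[0]["prompt"] = "a cozy bedroom with a large bed and a wardrobe"
trajectories[1]["prompt"] = "a bedroom corner with a window, a rug and a desk"

with open("model/trajectories/examples/bedroom_mixed.json", "w") as f:
    json.dump(trajectories, f, indent=2)
```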

Also note that the room layout is not deterministically defined, i.e., it will not always be a single square room, but can actually fuse together multiple corners into a "weirder" layout. So if you are unsatisfied with the layout, I suggest running another generation. Maybe you can also tweak the trajectories to force a simpler room layout to be generated.

SakuBorder commented 1 year ago

> Repeating elements such as multiple beds can occur pretty easily if we use the same prompt for every camera pose. Essentially it means that every new piece of the scene should have some part of a bed visible to satisfy the prompt.
>
> I suggest you try prompt mixing (modify the trajectory.json to use different prompts in the first two trajectories; see, e.g., the mix_* examples) or a larger prompt, which can also result in a more diverse set of furniture.
>
> Also note that the room layout is not deterministically defined, i.e., it will not always be a single square room, but can actually fuse together multiple corners into a "weirder" layout. So if you are unsatisfied with the layout, I suggest running another generation. Maybe you can also tweak the trajectories to force a simpler room layout to be generated.

Does that mean that if I want to obtain a continuous and complete room layout like your homepage, I only need to adjust the prompts in trajectory.json, without constraining any other part of the network?

SakuBorder commented 1 year ago

> Repeating elements such as multiple beds can occur pretty easily if we use the same prompt for every camera pose. Essentially it means that every new piece of the scene should have some part of a bed visible to satisfy the prompt.
>
> I suggest you try prompt mixing (modify the trajectory.json to use different prompts in the first two trajectories; see, e.g., the mix_* examples) or a larger prompt, which can also result in a more diverse set of furniture.
>
> Also note that the room layout is not deterministically defined, i.e., it will not always be a single square room, but can actually fuse together multiple corners into a "weirder" layout. So if you are unsatisfied with the layout, I suggest running another generation. Maybe you can also tweak the trajectories to force a simpler room layout to be generated.

I am currently using model/trajectories/examples/bedroom.json and obtain a result of several rooms mixed together, rather than one complete, continuous room.

lukasHoel commented 1 year ago

Yes, playing around with the trajectory file should be the only thing you need to do. Maybe also generate a bunch of scenes with the same setup to get a feel for how the layout is generated over time and how it can vary.

SakuBorder commented 1 year ago

> Does that mean that if I want to obtain a continuous and complete room layout like your homepage, I only need to adjust the prompts in trajectory.json, without constraining any other part of the network?

Thanks, I will try it.

SakuBorder commented 1 year ago

> Yes, playing around with the trajectory file should be the only thing you need to do. Maybe also generate a bunch of scenes with the same setup to get a feel for how the layout is generated over time and how it can vary.

Which trajectory file are you using? Or could I email you the problematic experimental results I have obtained? I'm afraid my description is not clear enough.

lukasHoel commented 1 year ago

Sure, go ahead

SakuBorder commented 1 year ago

> Sure, go ahead

I have sent you an email with the results I generated. Thanks!

lukasHoel commented 1 year ago

For context: in the e-mail, they were referring to the generated animation.gif file being "jumpy and flashy". I can now see the point: this gif simply concatenates all rendered images from the trajectories used for generation, one after another. So it will look flashy between trajectories and for trajectories with larger viewpoint changes (such as rotations).

If you want to have smooth trajectories, you can create them by rendering custom trajectories through the generated mesh:

  • Open the mesh in Blender
  • Use their camera animation tools to render smooth paths through the mesh
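
A minimal Blender Python sketch of that second bullet (the file paths, orbit radius, and camera setup below are illustrative assumptions, not part of the repo):

```python
# Blender Python sketch: render a smooth camera orbit through a generated mesh.
# All file paths and keyframe values are illustrative placeholders.
import math
import bpy

# Import the generated mesh (operator name varies with Blender version;
# e.g. bpy.ops.wm.ply_import in Blender 4.x).
bpy.ops.import_mesh.ply(filepath="/path/to/generated_mesh.ply")

scene = bpy.context.scene
cam = scene.camera  # assumes the default scene camera exists

# Keyframe a simple circular orbit inside the room, facing the center.
num_frames = 120
radius = 1.5  # stay inside the room geometry
height = 1.6  # roughly eye level
for frame in range(1, num_frames + 1):
    angle = 2.0 * math.pi * (frame - 1) / num_frames
    cam.location = (radius * math.cos(angle), radius * math.sin(angle), height)
    cam.rotation_euler = (math.radians(80), 0.0, angle + math.pi / 2)
    cam.keyframe_insert(data_path="location", frame=frame)
    cam.keyframe_insert(data_path="rotation_euler", frame=frame)

# Render the animation to individual frames (set the output path first).
scene.frame_start, scene.frame_end = 1, num_frames
scene.render.filepath = "/tmp/room_flythrough/"
bpy.ops.render.render(animation=True)
```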

SakuBorder commented 1 year ago

> For context: in the e-mail, they were referring to the generated animation.gif file being "jumpy and flashy". I can now see the point: this gif simply concatenates all rendered images from the trajectories used for generation, one after another. So it will look flashy between trajectories and for trajectories with larger viewpoint changes (such as rotations).
>
> If you want to have smooth trajectories, you can create them by rendering custom trajectories through the generated mesh:
>
> • Open the mesh in Blender
> • Use their camera animation tools to render smooth paths through the mesh

Oh, I understand. So if I want a smooth camera trajectory, I can't achieve it with Text2Room alone; I also need to do follow-up work in Blender, right?

lukasHoel commented 1 year ago

Yes, Text2Room is a method to generate a 3D mesh from text input. You can then use the mesh however you want afterwards, for example to render smooth trajectories through it.

SakuBorder commented 1 year ago

> Yes, Text2Room is a method to generate a 3D mesh from text input. You can then use the mesh however you want afterwards, for example to render smooth trajectories through it.

I understand now, thank you very much! I hope we can stay in touch!