xxlong0 / Wonder3D

Single Image to 3D using Cross-Domain Diffusion for 3D Generation
https://www.xxlong.site/Wonder3D/
GNU Affero General Public License v3.0

Texture quality and meaningful parameters #2

Open jclarkk opened 1 year ago

jclarkk commented 1 year ago

First of all, amazing work on this project!

Would it be possible to increase texture quality? I've tried increasing the img_wh under validation_dataset in the config to [512, 512] but I get: RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 32 but got size 64 for tensor number 1 in the list.

As for the second phase, which parameters in the 'neuralangelo-ortho-wmask.yaml' config would allow modifying mesh quality?

And finally, less related to the above: I've tried running on multi-GPU, and the second phase (instant-nsr-pl) works fine when reducing the iterations, but when I try to run the diffusion phase with a config similar to 8gpu.yaml but with 4 GPUs instead (running on 4x NVIDIA L4) I get OOM. Any tips you can share on this?

Thanks a lot!

flamehaze1115 commented 1 year ago

Hello, thanks for your interest in our work. Currently our diffusion model only supports 256x256 resolution, so 512 does not work. For the OOM problem, you should use 1gpu.yaml; the diffusion won't cost too much memory. The current inference code uses float32; I will change it to float16 later to save memory. For stage 2, I will write a document about the parameters that favor better reconstruction results.
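
A minimal sketch of the float16 idea mentioned above, assuming a diffusers-style pipeline object (the attribute names are assumptions, not Wonder3D's exact inference code):

```python
import torch

def cast_pipeline_fp16(pipe, device="cuda"):
    """Cast the heavy submodules of a diffusers-style pipeline to float16.

    The attribute names below are typical for diffusers pipelines and are an
    assumption here; adjust them to the actual Wonder3D pipeline.
    """
    for name in ("unet", "vae", "text_encoder", "image_encoder"):
        module = getattr(pipe, name, None)
        if module is not None:
            module.to(device=device, dtype=torch.float16)
    return pipe

# Usage (hypothetical): pipe = cast_pipeline_fp16(pipe) before running inference,
# which roughly halves the memory footprint compared to float32.
```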

Jiaozrr commented 1 year ago

Hello! I met this problem, too. I found that img_wh corresponds to the size of the latents passed between the UNet and the VAE, so changing the sample_size of the UNet in mvdiffusion-joint-ortho-6views.yaml to 64 makes the sizes match and fixes the error. Setting crop_size to 384 also gives a better image proportion. However, I can only get inconsistent results, nothing as good as the 256x256 output. I hope the authors can provide a higher-resolution pretrained UNet, or even the training code!
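
For reference, a small sketch of the size relationship described above: an SD-style VAE downsamples by a factor of 8, so img_wh = 512 implies a UNet sample_size of 64. The exact key paths below are assumptions based on this thread, not the verified YAML layout:

```python
import yaml

VAE_SCALE = 8  # SD-style VAEs downsample images by a factor of 8

def patch_resolution(cfg_path, img_wh=512, crop_size=384):
    """Keep img_wh and the UNet sample_size consistent (key paths are assumptions)."""
    with open(cfg_path) as f:
        cfg = yaml.safe_load(f)
    cfg["validation_dataset"]["img_wh"] = [img_wh, img_wh]
    cfg["validation_dataset"]["crop_size"] = crop_size
    # The UNet operates on latents, so its sample_size is img_wh / 8 (512 -> 64).
    cfg["unet"]["sample_size"] = img_wh // VAE_SCALE
    with open(cfg_path, "w") as f:
        yaml.safe_dump(cfg, f)
```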

fefespn commented 1 year ago

Hello @flamehaze1115, it would be very appreciated if you could share a reconstruction doc! @jclarkk, I'll try to answer with what I did; I'm still experimenting with it, so it may get better in the future:

  1. The reconstruction stage starts by resizing the 256x256 images to 1024x1024. First, make sure the resize doesn't degrade quality too much (see the sketch after this list). Second, make sure the object is scaled/aligned to fill the 1024 frame (not sitting small in the middle with lots of white margins). Third, you can manually replace the front image with your original input image (make sure it's aligned with the _frontnormals image; you can ensure this by setting crop=-1 in the first diffusion stage).
  2. Resizing to 2048x2048 works even better (at the cost of longer reconstruction time and more memory).
  3. The config has a weight for each view; give the front view the highest weight.
  4. (I am not sure about this) You can do manual post-processing in Blender or another tool to map the texture+normals to the front view; depending on the geometry, this helps ensure your texture looks good.
  5. (I will try this in the future) Play with the optimizer parameters.
  6. (I will try this in the future, and I'm not sure I understand it right; feedback is appreciated if you know the theory) Because they use NeuS, the point colors are embedded in an MLP (like NeRF?), which gives slightly blurry results for fine details when there isn't enough input data (only 6 views). By removing this MLP and mapping the colors directly, the color would be sharper, though that wouldn't work for reflections and mirrors.
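
A minimal sketch of steps 1 and 2 above: high-quality upscaling plus a rough check that the object fills the frame. The file handling and the alpha-mask assumption are illustrative, not Wonder3D's exact pipeline:

```python
from PIL import Image
import numpy as np

def upscale_view(path, out_path, size=1024):
    """Resize a 256x256 view to size x size with a high-quality filter."""
    img = Image.open(path).convert("RGBA")
    img = img.resize((size, size), Image.LANCZOS)  # avoids the blur of bilinear upscaling
    img.save(out_path)

def object_coverage(path):
    """Fraction of the frame covered by the object's bounding box (alpha > 0)."""
    alpha = np.array(Image.open(path).convert("RGBA"))[..., 3]
    ys, xs = np.nonzero(alpha)
    if xs.size == 0:
        return 0.0
    bbox_area = (xs.max() - xs.min()) * (ys.max() - ys.min())
    return bbox_area / float(alpha.shape[0] * alpha.shape[1])
```
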
jclarkk commented 1 year ago

@fefespn Thanks a lot for your notes, I'll try to play around with this as well.

adeerAI commented 11 months ago

@fefespn Hi, have you tried out your suggestions yet? And do you have any further suggestions? Thanks

fefespn commented 11 months ago

So yeah, what I'm doing now is this: for a basic 3D object we need geometry + texture. From Wonder3D I get the geometry, and it's good enough for my needs. Then I use the second stage of the DreamGaussian repository/paper, which has a differentiable renderer for texture refinement. You need to make sure the 3D object from Wonder3D is aligned with that renderer. Then I optimize the object's texture using my multi-view images and that renderer.
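
Schematically, that refinement amounts to optimizing the texture against the multi-view images through a differentiable renderer. The sketch below only illustrates the idea: `render_views` is a hypothetical placeholder for the actual differentiable renderer, and the cameras must match Wonder3D's six orthographic views (the alignment issue mentioned above):

```python
import torch
import torch.nn.functional as F

def refine_texture(texture, mesh, cameras, target_images, render_views,
                   steps=500, lr=1e-2):
    """Optimize a texture map so rendered views match the multi-view images.

    `render_views(mesh, texture, cameras)` is a hypothetical stand-in for a
    differentiable renderer; it must be differentiable w.r.t. `texture`.
    """
    texture = texture.clone().requires_grad_(True)      # e.g. (H, W, 3) in [0, 1]
    optimizer = torch.optim.Adam([texture], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        renders = render_views(mesh, texture, cameras)  # (V, H, W, 3) rendered views
        loss = F.l1_loss(renders, target_images)
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            texture.clamp_(0.0, 1.0)                    # keep colors in a valid range
    return texture.detach()
```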

LordLiang commented 9 months ago

@fefespn Hello! Can you share more about your modified second stage of dg? The different camera settings of these two methods are a little confusing.