xxlong0 / Wonder3D

Single Image to 3D using Cross-Domain Diffusion for 3D Generation
https://www.xxlong.site/Wonder3D/
GNU Affero General Public License v3.0

Cannot reproduce the result. #146

Open · recordmp3 opened this issue 4 months ago

recordmp3 commented 4 months ago

I appreciate your wonderful work! The demo is excellent! However, when I try to reproduce the result using your default settings (just running accelerate launch --config_file 8gpu.yaml train_mvdiffusion_image.py --config configs/train/stage1-mix-6views-lvis.yaml, without changing any yaml except the data path), the network fails to produce a reasonable result. After finishing the first-stage training (30k iters), the network does not learn when to generate RGB or normal maps (it generates them randomly, regardless of the domain switch), does not learn to generate a white background (it is often gray), and does not learn to generate the correct view (the front view sometimes generates a right image, or the right view sometimes generates a left or back image).

The ground truth and inference outputs are attached below. Do you have any idea how to solve this? Thank you in advance!

(Attachments: 30000-validation_train-gt, 30000-validation_train-sample_cfg1.0)

xxlong0 commented 4 months ago

Our model is trained on 8 GPUs with a total batch size of 256 after gradient accumulation. Training with only one GPU will be infeasible.
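
To illustrate the arithmetic (the per-GPU batch size and accumulation steps below are only an example; the real values are set in 8gpu.yaml and the training yaml):

```python
# Illustrative decomposition of the total batch size; the actual per-GPU
# batch and gradient-accumulation steps are defined in the config files.
num_gpus = 8
per_gpu_batch = 8        # assumed value, for illustration only
grad_accum_steps = 4     # assumed value, for illustration only
effective_batch = num_gpus * per_gpu_batch * grad_accum_steps
assert effective_batch == 256
```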

recordmp3 commented 4 months ago

Oh, sorry, I did try 8 GPUs with bs=256 (wrongly written as 1 GPU in my command above, but yes, I'm using 8 GPUs :) And that does not work.

recordmp3 commented 4 months ago

I also tried overfitting to only one scene, outputting only normals, and it still does not learn to generate the correct view (the front view sometimes generates a right image, or the right view sometimes generates a left or back image) after 1k iters with batch size 256, without changing any of your code.

flamehaze1115 commented 4 months ago

Hello, in our experiments we also find that overfitting on one scene is not feasible. How much data did you use for training in this experiment?

recordmp3 commented 4 months ago

Hi @flamehaze1115, I tried 23k Objaverse scenes, 32 scenes, and 1 scene respectively. They all failed. Could you please double-check that configs/train/stage1-mix-6views-lvis.yaml is the config you actually used for training?

I have also found another problem: if I set the hyperparameter train_dataset:bg_color to "three_choices", the same as your config, the background during inference is often gray even when the single input image has a white background. (Attachment: 30000-3-validation-sample_cfg1.0)

Whereas if I set that hyperparameter to "white", the wrong gray background disappears, which may indicate that configs/train/stage1-mix-6views-lvis.yaml differs slightly from the setting you actually used.
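
For context, my understanding of what bg_color: "three_choices" does in the dataset code is roughly the following (a guess at the implementation, not a verified excerpt from the repo):

```python
import random

# Hypothetical sketch of the bg_color modes discussed above; the repo's
# actual dataset code may differ.
def get_bg_color(mode: str):
    if mode == "white":
        return (1.0, 1.0, 1.0)
    if mode == "three_choices":
        # presumably: pick white, black, or gray at random per sample
        return random.choice([(1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (0.5, 0.5, 0.5)])
    raise ValueError(f"unknown bg_color mode: {mode}")
```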

recordmp3 commented 4 months ago

I find that the stage-1 training yaml has zero_init_camera_projection: true, which causes different camera embeddings to become identical after the 2-layer MLP projection. May I know whether this is intentional?
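
Here is a minimal sketch (not the actual module from this repo; dimensions are made up) of why zero-initializing the projection collapses all camera embeddings:

```python
import torch
import torch.nn as nn

# Stand-in for the 2-layer camera-projection MLP discussed above.
proj = nn.Sequential(nn.Linear(16, 32), nn.SiLU(), nn.Linear(32, 32))

# zero_init_camera_projection: true amounts to zeroing all parameters:
for p in proj.parameters():
    nn.init.zeros_(p)

front_cam = torch.randn(1, 16)
back_cam = torch.randn(1, 16)

# Every input now maps to the all-zero vector, so the UNet receives an
# identical, uninformative camera/domain signal for all views.
print(torch.equal(proj(front_cam), proj(back_cam)))  # True
```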

GliAmanti commented 3 months ago

I also have the same problem after stage-1 training. Did you fix it? Also, I'm not sure which stage-1 checkpoint should be put into the stage-2 training yaml. Did you train stage 2 successfully?

chenming-wu commented 3 months ago

Same problem here. Is there anything wrong in the released code?

xxlong0 commented 2 months ago

The gray background is expected in this situation: it indicates that your training hasn't converged. When the model converges well, the predicted images will have a white background.

xxlong0 commented 2 months ago

Fixed a severe training bug: zero_init_camera_projection in configs/train/stage1-mix-6views-lvis.yaml should be false. Otherwise, the domain control and pose control are invalid during training.
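
If you trained with an older clone, you can check your local config with something like this (a sketch assuming PyYAML and that the flag sits at the top level of the yaml):

```python
import yaml

path = "configs/train/stage1-mix-6views-lvis.yaml"
with open(path) as f:
    cfg = yaml.safe_load(f)

# Per the fix above, this must be false for domain/pose control to work.
if cfg.get("zero_init_camera_projection", False):
    print(f"{path}: set zero_init_camera_projection to false before training.")
```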

aquarter147 commented 1 month ago

I have the same problem when loading the checkpoint for stage 2: the missing weights are related to the 'joint' blocks.
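
One possible workaround (a sketch only; unet and the checkpoint path are placeholders for your stage-2 model and stage-1 weights) is to load the state dict non-strictly, so the new joint blocks keep their fresh initialization:

```python
import torch

# Placeholders: `unet` is your instantiated stage-2 model; adjust the path
# to wherever your stage-1 checkpoint lives.
state = torch.load("stage1_ckpt/unet/diffusion_pytorch_model.bin", map_location="cpu")
missing, unexpected = unet.load_state_dict(state, strict=False)

# The missing keys should be exactly the new 'joint' blocks, which then
# start from fresh initialization in stage-2 training.
print("missing keys:", missing)
print("unexpected keys:", unexpected)
```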

834810269 commented 1 month ago

Have you resolved the issue? Did you change the bg_color to white in the config?

834810269 commented 1 month ago

Did you finally change the bg_color to white in the config?

bbbbubble commented 3 days ago

@aquarter147 @GliAmanti @xxlong0 Which checkpoint should be loaded in stage 2, and how can the missing-weights problem be solved? May I ask whether you solved this? Has anyone successfully run stage 2? ...