kwea123 / nerf_pl

NeRF (Neural Radiance Fields) and NeRF in the Wild using pytorch-lightning
https://www.youtube.com/playlist?list=PLDV2CyUo4q-K02pNEyDr7DYpTQuka3mbV
MIT License

Struggling to create other than blurry spots for a 360 white background scan #94

Closed cduguet closed 3 years ago

cduguet commented 3 years ago

I am using 24 pictures of a boot, captured over 360° in a studio with a white background. I am training it as a 360° scene at a resolution of 300x300, but after 30 epochs all I am getting are black spots.

This is what the boot looks like:

And this is what I'm currently getting:

Epoch 0: epoch=0 ckpt

Epoch 10: epoch=10 ckpt

Epoch 20: epoch=20 ckpt

Epoch 30: epoch=30 ckpt

I have double-checked that the COLMAP output makes sense. I used the output of the macOS COLMAP GUI, since the command-line options were not registering all camera poses.

Screenshot 2021-08-17 at 23 28 13
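
One way to double-check that COLMAP registered every picture is to count the pose entries in the exported model. The sketch below assumes the model was exported in COLMAP's text format, where images.txt holds two lines per registered image (a pose line ending in the file name, then a line of 2D points that may be empty). The function name and the sample rows are made up for illustration.

```python
def registered_image_names(images_txt_lines):
    """Return the file names of images that have a pose in a COLMAP images.txt.

    Assumes COLMAP's text export: comment lines start with '#', followed by
    two lines per image:
      IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME
      POINTS2D[] as (X, Y, POINT3D_ID)   <- may be an empty line
    """
    # Drop only comment lines; empty lines must stay, because an image
    # without 2D points still occupies its second line.
    data_lines = [line for line in images_txt_lines if not line.startswith("#")]
    names = []
    for pose_line in data_lines[0::2]:
        fields = pose_line.split(None, 9)  # NAME is the 10th field
        if len(fields) == 10:
            names.append(fields[9].strip())
    return names


# Hypothetical sample rows, not taken from the actual boot dataset.
sample_lines = [
    "# Image list with two lines of data per image:",
    "#   IMAGE_ID, QW, QX, QY, QZ, TX, TY, TZ, CAMERA_ID, NAME",
    "#   POINTS2D[] as (X, Y, POINT3D_ID)",
    "1 1.0 0.0 0.0 0.0 0.0 0.0 4.0 1 boot_01.jpg",
    "10.5 20.5 -1 30.0 40.0 7",
    "2 0.7 0.0 0.7 0.0 0.1 0.0 4.0 1 boot_02.jpg",
    "",
]

names = registered_image_names(sample_lines)
print(len(names), names)
```

On a real export the lines would come from something like open(path).read().splitlines(). For this scan the count should be 24; a lower number would mean some views were left without a pose.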

I am using the master branch, as I just moved from Colab. (I just read that it is not supported, so the first thing I'll do is update this post with results from dev.) EDIT: the update with the dev branch is at the end of this post.

My training process looks like:

python train.py --dataset_name llff --root_dir /host/home/ubuntu/datasets/boot/results_macos_auto/ --N_importance 64 --img_wh 300 300 --spheric --use_disp --num_epochs 40 --batch_size 1024 --optimizer adam --lr 5e-4 --lr_scheduler cosine --exp_name boot_macos
INFO:lightning:GPU available: True, used: True
INFO:lightning:CUDA_VISIBLE_DEVICES: [0]
val image is /host/home/ubuntu/datasets/boot/results_macos_auto/images/a1511_keen_targhee_iii_mid_waterproof_mens_boots_01.jpg
Epoch 4:  62%|###################################################8                               | 1264/2023 [02:08<01:17,  9.81it/s, loss=0.003, train_psnr=29.1, v_num=1, val_loss=0.00557, val_psnr=25.6]
Epoch 38:  54%|############################################8                                      | 1093/2023 [02:00<01:42,  9.04it/s, loss=0.001, train_psnr=37.4, v_num=1, val_loss=0.0129, val_psnr=24.2]

UPDATE: This is the result of training with the dev branch:

python train.py    --dataset_name llff    --root_dir /host/home/ubuntu/datasets/boot/results_macos_auto/    --N_importance 64 --img_wh 300 300    --num_epochs 60 --batch_size 1024    --optimizer adam --lr 5e-4    --lr_scheduler steplr --decay_step 10 20 --decay_gamma 0.5    --exp_name boot_macos_dev
GPU available: True, used: True
TPU available: None, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
val image is /host/home/ubuntu/datasets/boot/results_macos_auto/images/a1511_keen_targhee_iii_mid_waterproof_mens_boots_01.jpg
Epoch 59: 100%|###########################################################################################################| 2023/2023 [03:08<00:00, 10.71it/s, loss=0.00097, val/psnr=9.18, train/psnr=32.3]

Profiler Report

Action                          |  Mean duration (s)    |Num calls              |  Total time (s)       |  Percentage %         |
-----------------------------------------------------------------------------------------------------------------------------
Total                           |  -                    |_                      |  1.1325e+04           |  100 %                |
-----------------------------------------------------------------------------------------------------------------------------
run_training_epoch              |  188.47               |60                     |  1.1308e+04           |  99.853               |
run_training_batch              |  0.088424             |121320                 |  1.0728e+04           |  94.727               |
optimizer_step_and_closure_0    |  0.081402             |121320                 |  9875.7               |  87.205               |
training_step_and_backward      |  0.074007             |121320                 |  8978.5               |  79.282               |
model_backward                  |  0.04765              |121320                 |  5780.9               |  51.046               |
model_forward                   |  0.025952             |121320                 |  3148.5               |  27.802               |
evaluation_step_and_end         |  2.8963               |61                     |  176.67               |  1.5601               |
on_train_batch_end              |  0.00075952           |121320                 |  92.145               |  0.81366              |
get_train_batch                 |  0.00055926           |121320                 |  67.849               |  0.59912              |
cache_result                    |  2.3312e-05           |607330                 |  14.158               |  0.12502              |
on_batch_start                  |  2.1771e-05           |121320                 |  2.6412               |  0.023322             |
on_after_backward               |  1.9828e-05           |121320                 |  2.4056               |  0.021242             |
on_batch_end                    |  1.9037e-05           |121320                 |  2.3096               |  0.020394             |
on_before_zero_grad             |  1.7894e-05           |121320                 |  2.1709               |  0.01917              |
on_train_batch_start            |  1.2868e-05           |121320                 |  1.5611               |  0.013785             |
training_step_end               |  1.2558e-05           |121320                 |  1.5236               |  0.013454             |
on_validation_end               |  0.016731             |61                     |  1.0206               |  0.0090118            |
on_validation_batch_end         |  0.011367             |61                     |  0.69336              |  0.0061225            |
on_validation_start             |  0.00077551           |61                     |  0.047306             |  0.00041772           |
on_epoch_start                  |  0.00059637           |60                     |  0.035782             |  0.00031596           |
on_validation_batch_start       |  0.00011321           |61                     |  0.0069059            |  6.098e-05            |
validation_step_end             |  3.0494e-05           |61                     |  0.0018601            |  1.6425e-05           |
on_epoch_end                    |  2.8852e-05           |60                     |  0.0017311            |  1.5286e-05           |
on_validation_epoch_end         |  2.7406e-05           |61                     |  0.0016717            |  1.4762e-05           |
on_train_epoch_start            |  1.8696e-05           |60                     |  0.0011218            |  9.9054e-06           |
on_validation_epoch_start       |  1.8263e-05           |61                     |  0.0011141            |  9.8373e-06           |
on_train_epoch_end              |  1.6983e-05           |60                     |  0.001019             |  8.9978e-06           |
on_train_start                  |  0.00044097           |1                      |  0.00044097           |  3.8939e-06           |
on_train_end                    |  0.00039886           |1                      |  0.00039886           |  3.522e-06            |
on_fit_start                    |  2.7572e-05           |1                      |  2.7572e-05           |  2.4347e-07           |
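
As a consistency check on the profiler numbers, the batch count can be reproduced from the dataset size. This is a rough sketch that assumes one of the 24 images is held out for validation (the log prints a single val image), that every pixel of the remaining images is one training ray, and that the last partial batch is kept. The helper name is made up for illustration.

```python
import math


def batches_per_epoch(num_images, width, height, batch_size, num_val_images=1):
    """Number of ray batches per epoch when every training pixel is one ray."""
    num_rays = (num_images - num_val_images) * width * height
    # The last, partially filled batch still counts as one step.
    return math.ceil(num_rays / batch_size)


steps = batches_per_epoch(num_images=24, width=300, height=300, batch_size=1024)
total_steps = steps * 60  # 60 epochs, as in the command above

print(steps)        # 2022
print(total_steps)  # 121320
```

Under these assumptions, 2022 training batches plus one validation batch agree with the 2023 steps in the progress bar, and 60 epochs agree with the 121320 calls to run_training_batch, so all 23 training images appear to be loaded.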

And this is the result:

Epoch 40: boot_macos_dev, epoch=40 ckpt

Epoch 46: boot_macos_dev, epoch=46 ckpt


UPDATE 2: I just noticed I was missing the --spheric flag. I have now trained with --spheric:

python train.py --dataset_name llff --root_dir /host/home/ubuntu/datasets/boot/colmap_ubuntu_auto/ --N_importance 64 --img_wh 300 300 --num_epochs 60 --batch_size 1024 --optimizer adam --lr 5e-4 --lr_scheduler steplr --decay_step 10 20 --decay_gamma 0.5 --exp_name boot_ubuntu_dev --spheric
Epoch 59: 100%|##########################################################################################################| 2023/2023 [03:08<00:00, 10.72it/s, loss=0.000545, val/psnr=21.9, train/psnr=38.2]
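
To put the PSNR values in these logs in perspective, they can be converted back to a mean squared error. The sketch below assumes the standard definition PSNR = -10 * log10(MSE) for pixel values in [0, 1]. The function names are illustrative, and the logged loss may combine several terms, so it need not equal this MSE.

```python
import math


def psnr_from_mse(mse):
    """PSNR in dB for images with pixel values in [0, 1]."""
    return -10.0 * math.log10(mse)


def mse_from_psnr(psnr):
    """Inverse of psnr_from_mse."""
    return 10.0 ** (-psnr / 10.0)


# PSNR values taken from the final progress-bar lines of the two dev runs.
logged = {
    "val, without --spheric": 9.18,
    "train, without --spheric": 32.3,
    "val, with --spheric": 21.9,
    "train, with --spheric": 38.2,
}

for label, psnr in logged.items():
    mse = mse_from_psnr(psnr)
    # RMS error is on the same [0, 1] scale as the pixel values.
    print(f"{label}: PSNR {psnr} dB, MSE {mse:.5f}, RMS error {math.sqrt(mse):.3f}")
```

Read this way, a validation PSNR of 9.18 corresponds to an RMS error of roughly a third of the full intensity range, while 21.9 is under a tenth. In both runs the training PSNR is far higher than the validation PSNR, which suggests the network fits the training rays much better than it renders the held-out view.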

Profiler Report

Action                          |  Mean duration (s)    |Num calls              |  Total time (s)       |  Percentage %         |
-----------------------------------------------------------------------------------------------------------------------------
Total                           |  -                    |_                      |  1.2238e+04           |  100 %                |
-----------------------------------------------------------------------------------------------------------------------------
run_training_epoch              |  203.69               |60                     |  1.2221e+04           |  99.862               |
run_training_batch              |  0.095825             |121320                 |  1.1626e+04           |  94.995               |
optimizer_step_and_closure_0    |  0.086711             |121320                 |  1.052e+04            |  85.96                |
training_step_and_backward      |  0.0799               |121320                 |  9693.4               |  79.207               |
model_backward                  |  0.052936             |121320                 |  6422.2               |  52.477               |
model_forward                   |  0.02655              |121320                 |  3221.1               |  26.32                |
evaluation_step_and_end         |  3.1301               |61                     |  190.94               |  1.5602               |
on_train_batch_end              |  0.00076753           |121320                 |  93.116               |  0.76088              |
get_train_batch                 |  0.00058078           |121320                 |  70.461               |  0.57575              |
cache_result                    |  2.3493e-05           |607330                 |  14.268               |  0.11659              |
on_validation_end               |  0.051202             |61                     |  3.1233               |  0.025522             |
on_batch_start                  |  2.2495e-05           |121320                 |  2.7291               |  0.0223               |
on_batch_end                    |  2.2128e-05           |121320                 |  2.6846               |  0.021936             |
on_after_backward               |  1.9726e-05           |121320                 |  2.3931               |  0.019555             |
on_before_zero_grad             |  1.9007e-05           |121320                 |  2.306                |  0.018843             |
training_step_end               |  1.1803e-05           |121320                 |  1.432                |  0.011701             |
on_train_batch_start            |  1.1438e-05           |121320                 |  1.3877               |  0.011339             |
on_validation_batch_end         |  0.011765             |61                     |  0.71764              |  0.005864             |
on_validation_start             |  0.00075868           |61                     |  0.046279             |  0.00037816           |
on_epoch_start                  |  0.00058394           |60                     |  0.035036             |  0.00028629           |
on_validation_batch_start       |  0.00010152           |61                     |  0.0061925            |  5.0601e-05           |
validation_step_end             |  3.073e-05            |61                     |  0.0018745            |  1.5317e-05           |
on_epoch_end                    |  2.9529e-05           |60                     |  0.0017718            |  1.4477e-05           |
on_validation_epoch_end         |  2.601e-05            |61                     |  0.0015866            |  1.2965e-05           |
on_validation_epoch_start       |  2.0288e-05           |61                     |  0.0012376            |  1.0113e-05           |
on_train_epoch_end              |  1.976e-05            |60                     |  0.0011856            |  9.6878e-06           |
on_train_epoch_start            |  1.9691e-05           |60                     |  0.0011814            |  9.6539e-06           |
on_train_end                    |  0.00046188           |1                      |  0.00046188           |  3.7741e-06           |
on_train_start                  |  0.00033951           |1                      |  0.00033951           |  2.7742e-06           |
on_fit_start                    |  2.7617e-05           |1                      |  2.7617e-05           |  2.2567e-07           |

and evaluated with:

python eval.py --root_dir /host/home/ubuntu/datasets/boot/colmap_ubuntu_auto/ --dataset_name llff --scene_name boot_ubuntu --img_wh 300 300 --N_importance 64 --ckpt_path ckpts/boot_ubuntu_dev/epoch\=59.ckpt --spheric_poses
100%|#####################################################################################################################################################################| 120/120 [05:31<00:00,  2.76s/it]

Only to get: boot_ubuntu-1

Should I keep training for more epochs, or what am I missing?

Thank you!

cduguet commented 3 years ago

This most probably didn't work because the spheric options were missing in training and testing, and maybe also because of a coordinate problem that is already being discussed in https://github.com/kwea123/nerf_pl/issues/95.