Customized Dataset for Stable View Synthesis & CUDA Error

matveymor commented 3 years ago

Thank you very much for publishing "Stable View Synthesis"; it seems to be a significant photorealistic approach to novel view synthesis! Could you please add detailed instructions to your GitHub page https://github.com/intel-isl/StableViewSynthesis on how to build a customized dataset?

Besides, I am interested in the following questions:

Can you please tell us how you calculated depth maps in your work?
When I run the training process on my own data, the following error is raised: "invalid configuration argument in /notebook/SVS/StableViewSynthesis/ext/mytorch/include/common_cuda.h at 171". What might be the reason for this?

Thank you in advance!

griegler commented 3 years ago

Thanks.

Can you please tell us how you calculated depth maps in your work?

I used pyrender to render the depthmaps given the 3D mesh.
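
For readers who want to try this on their own mesh, here is a minimal sketch of depth rendering with pyrender. It is not the author's script: the function names, the choice of trimesh for loading, and the assumption that the camera is given as a world-to-camera rotation R and translation t in the OpenCV/COLMAP axis convention are all illustrative. pyrender expects a camera-to-world pose in the OpenGL convention (camera looking down -Z, +Y up), so the pose is converted first.

```python
def opencv_to_opengl_pose(R, t):
    """Convert world-to-camera (R, t) with OpenCV axes into a 4x4
    camera-to-world pose with OpenGL axes, as nested lists."""
    # Invert the rigid transform: R_c2w = R^T, camera center = -R^T t.
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    center = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    # OpenGL cameras look down -Z with +Y up: negate the Y and Z camera axes.
    pose = [[Rt[i][0], -Rt[i][1], -Rt[i][2], center[i]] for i in range(3)]
    pose.append([0.0, 0.0, 0.0, 1.0])
    return pose


def render_depth(mesh_path, fx, fy, cx, cy, R, t, width, height):
    """Render one depth map (height x width, float) of the mesh at mesh_path.

    Hypothetical helper, assuming pinhole intrinsics in pixels.
    """
    # Imported lazily so the pose helper above works without a GL context.
    import numpy as np
    import trimesh
    import pyrender

    mesh = pyrender.Mesh.from_trimesh(trimesh.load(mesh_path, force="mesh"))
    scene = pyrender.Scene()
    scene.add(mesh)
    camera = pyrender.IntrinsicsCamera(fx=fx, fy=fy, cx=cx, cy=cy)
    scene.add(camera, pose=np.array(opencv_to_opengl_pose(R, t)))

    renderer = pyrender.OffscreenRenderer(viewport_width=width,
                                          viewport_height=height)
    try:
        # With DEPTH_ONLY, render() returns just the depth image.
        depth = renderer.render(scene, flags=pyrender.RenderFlags.DEPTH_ONLY)
    finally:
        renderer.delete()
    return depth
```

pyrender writes 0 for pixels that see no geometry, and the default near/far planes of IntrinsicsCamera may clip large scenes, so znear and zfar may need to be set explicitly for outdoor reconstructions.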

When I am running the training process on my own data, this error is raised: ...

What GPU are you using? The error is raised in https://github.com/intel-isl/StableViewSynthesis/blob/main/ext/mytorch/include/common_cuda.h#L169-L171, and it could be that the default kernel launch parameters are not supported by your GPU. If that is the problem, you could try changing the value at https://github.com/intel-isl/StableViewSynthesis/blob/main/ext/mytorch/include/common_cuda.h#L109 to a smaller number.
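
To illustrate what "invalid configuration argument" means in general, the sketch below mimics in plain Python the checks that CUDA applies to a one-dimensional kernel launch. The function name and the ceil-division scheme are assumptions for illustration and are not copied from common_cuda.h; the limits used (at most 1024 threads per block, at most 2^31 - 1 blocks in x) hold for current NVIDIA GPUs.

```python
def launch_config(n_elements, threads_per_block,
                  max_threads_per_block=1024, max_grid_x=2**31 - 1):
    """Return (blocks, threads) for a 1D launch over n_elements.

    Raises ValueError in the cases where a real CUDA launch would fail
    with "invalid configuration argument".
    """
    if not 1 <= threads_per_block <= max_threads_per_block:
        raise ValueError(
            f"block size {threads_per_block} outside 1..{max_threads_per_block}")
    # Ceil division: enough blocks to cover every element.
    blocks = (n_elements + threads_per_block - 1) // threads_per_block
    if not 1 <= blocks <= max_grid_x:
        raise ValueError(f"grid size {blocks} outside 1..{max_grid_x}")
    return blocks, threads_per_block
```

Besides a block size that is too large for the device, a grid size of zero is also rejected. With a custom dataset, an empty input tensor (n_elements == 0) would therefore be another possible cause worth checking, although the thread does not confirm that this is what happened here.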

KaLiMaLi555 commented 3 years ago

Hey @griegler, I tried setting CUDA_NUM_THREADS to a smaller value. I also tried changing the nvcc flags in setup.py. Neither helped. It would be great if you could suggest another fix for this issue.

griegler commented 3 years ago

@KaLiMaLi555 do you have more information, e.g., an error log? Can you also post the command that you executed?

KaLiMaLi555 commented 3 years ago

I ran the command provided in the README:

python exp.py --net resunet3.16_penone.dirs.avg.seq+9+1+unet+5+2+16.single+mlpdir+mean+3+64+16 --cmd eval --iter last --eval-dsets tat-subseq

Library versions:

torch==1.6.0
torch-geometric==1.7.1
torch-scatter==2.0.5
torch-sparse==0.6.8
torchvision==0.7.0

I wasn't able to run the code with some of the library versions given in the README. The versions above seemed to work for me.

Error log:

[2021-06-25/05:51/INFO/mytorch] Set seed to 42
[2021-06-25/05:51/INFO/mytorch] ================================================================================
[2021-06-25/05:51/INFO/mytorch] Start cmd "eval": tat-wo-val_bs1_nbs3_rpointdir_s0.25_resunet3.16_penone.dirs.avg.seq+9+1+unet+5+2+16.single+mlpdir+mean+3+64+16_vgg
[2021-06-25/05:51/INFO/mytorch] 2021-06-25 05:51:01
[2021-06-25/05:51/INFO/mytorch] host: ip-172-31-44-59
[2021-06-25/05:51/INFO/mytorch] --------------------------------------------------------------------------------
[2021-06-25/05:51/INFO/mytorch] worker env:
    experiments_root: experiments
    experiment_name: tat-wo-val_bs1_nbs3_rpointdir_s0.25_resunet3.16_penone.dirs.avg.seq+9+1+unet+5+2+16.single+mlpdir+mean+3+64+16_vgg
    n_train_iters: -65536
    seed: 42
    train_batch_size: 1
    train_batch_acc_steps: 1
    eval_batch_size: 1
    num_workers: 6
    save_frequency: <co.mytorch.Frequency object at 0x7fd6472f4a50>
    eval_frequency: <co.mytorch.Frequency object at 0x7fd64f6a8910>
    train_device: cuda:0
    eval_device: cuda:0
    clip_gradient_value: None
    clip_gradient_norm: None
    empty_cache_per_batch: False
    log_debug: []
    train_iter_messages: []
    stopwatch:
    train_dsets: ['tat-wo-val']
    eval_dsets: ['tat-subseq']
    train_n_nbs: 3
    train_src_mode: image
    train_nbs_mode: argmax
    train_scale: 0.25
    eval_scale: 0.5
    invalid_depth: 1000000000.0
    point_aux_data: ['dirs']
    point_edges_mode: penone
    eval_n_max_sources: 5
    train_rank_mode: pointdir
    eval_rank_mode: pointdir
    train_loss: VGGPerceptualLoss(
  (vgg): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (17): ReLU(inplace=True)
    (18): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (19): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (24): ReLU(inplace=True)
    (25): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (26): ReLU(inplace=True)
    (27): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (31): ReLU(inplace=True)
    (32): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (33): ReLU(inplace=True)
    (34): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (35): ReLU(inplace=True)
    (36): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
)
    eval_loss: L1Loss()
    exp_out_root: experiments/tat-wo-val_bs1_nbs3_rpointdir_s0.25_resunet3.16_penone.dirs.avg.seq+9+1+unet+5+2+16.single+mlpdir+mean+3+64+16_vgg
    db_path: experiments/tat-wo-val_bs1_nbs3_rpointdir_s0.25_resunet3.16_penone.dirs.avg.seq+9+1+unet+5+2+16.single+mlpdir+mean+3+64+16_vgg/exp.ip-172-31-44-59.db
    db_logger: <co.sqlite.Logger object at 0x7fd640347590>
[2021-06-25/05:51/INFO/mytorch] ================================================================================
[2021-06-25/05:51/INFO/exp] Create eval datasets
[2021-06-25/05:51/INFO/exp]   create dataset for tat_subseq_training_Truck
[2021-06-25/05:51/INFO/dataset]     #tgt_im_paths=25, #tgt_counts=(25, 226), tgt_im=(3, 576, 992), tgt_dm=(576, 992), train=False
[2021-06-25/05:51/INFO/exp]   create dataset for tat_subseq_intermediate_M60
[2021-06-25/05:51/INFO/dataset]     #tgt_im_paths=36, #tgt_counts=(36, 277), tgt_im=(3, 576, 1088), tgt_dm=(576, 1088), train=False
[2021-06-25/05:51/INFO/exp]   create dataset for tat_subseq_intermediate_Playground
[2021-06-25/05:51/INFO/dataset]     #tgt_im_paths=32, #tgt_counts=(32, 275), tgt_im=(3, 576, 1024), tgt_dm=(576, 1024), train=False
[2021-06-25/05:51/INFO/exp]   create dataset for tat_subseq_intermediate_Train
[2021-06-25/05:51/INFO/dataset]     #tgt_im_paths=43, #tgt_counts=(43, 258), tgt_im=(3, 576, 992), tgt_dm=(576, 992), train=False
[2021-06-25/05:51/INFO/modules] [NET][EncNet] resunet3.16
[2021-06-25/05:51/INFO/modules] [NET][RefNet] point_edges_mode=penone
[2021-06-25/05:51/INFO/modules] [NET][RefNet] point_aux_data=dirs
[2021-06-25/05:51/INFO/modules] [NET][RefNet] point_avg_mode=avg
[2021-06-25/05:51/INFO/modules] [NET][RefNet] Seq 9 nets, nets_residual=True
[2021-06-25/05:51/INFO/modules] [NET][RefNet]   Unet(in_channels=16, enc_channels=[16, 32, 64, 128, 128], dec_channels=[128, 64, 32, 16], n_conv=2)
[2021-06-25/05:51/INFO/modules] [NET][RefNet]   Unet(in_channels=16, enc_channels=[16, 32, 64, 128, 128], dec_channels=[128, 64, 32, 16], n_conv=2)
[2021-06-25/05:51/INFO/modules] [NET][RefNet]   Unet(in_channels=16, enc_channels=[16, 32, 64, 128, 128], dec_channels=[128, 64, 32, 16], n_conv=2)
[2021-06-25/05:51/INFO/modules] [NET][RefNet]   Unet(in_channels=16, enc_channels=[16, 32, 64, 128, 128], dec_channels=[128, 64, 32, 16], n_conv=2)
[2021-06-25/05:51/INFO/modules] [NET][RefNet]   Unet(in_channels=16, enc_channels=[16, 32, 64, 128, 128], dec_channels=[128, 64, 32, 16], n_conv=2)
[2021-06-25/05:51/INFO/modules] [NET][RefNet]   Unet(in_channels=16, enc_channels=[16, 32, 64, 128, 128], dec_channels=[128, 64, 32, 16], n_conv=2)
[2021-06-25/05:51/INFO/modules] [NET][RefNet]   Unet(in_channels=16, enc_channels=[16, 32, 64, 128, 128], dec_channels=[128, 64, 32, 16], n_conv=2)
[2021-06-25/05:51/INFO/modules] [NET][RefNet]   Unet(in_channels=16, enc_channels=[16, 32, 64, 128, 128], dec_channels=[128, 64, 32, 16], n_conv=2)
[2021-06-25/05:51/INFO/modules] [NET][RefNet]   Unet(in_channels=16, enc_channels=[16, 32, 64, 128, 128], dec_channels=[128, 64, 32, 16], n_conv=2)
[2021-06-25/05:51/INFO/modules] [NET][RefNet] Single gnn
[2021-06-25/05:51/INFO/modules] [NET][RefNet]   MLPDir(in_channels=16, hidden_channels=64, n_mods=3, out_channels=16, aggr=mean)
[2021-06-25/05:51/INFO/modules] [NET][RefNet] out_conv(16, 3)
[2021-06-25/05:51/INFO/mytorch] [EVAL] loading net for iter last: experiments/tat-wo-val_bs1_nbs3_rpointdir_s0.25_resunet3.16_penone.dirs.avg.seq+9+1+unet+5+2+16.single+mlpdir+mean+3+64+16_vgg/net_0000000000000000.params
[2021-06-25/05:51/INFO/mytorch]
[2021-06-25/05:51/INFO/mytorch] ================================================================================
[2021-06-25/05:51/INFO/mytorch] Evaluating set tat_subseq_training_Truck
[2021-06-25/05:51/INFO/exp] --------------------------------------------------------------------------------
[2021-06-25/05:51/INFO/mytorch] 2021-06-25 05:51:04
[2021-06-25/05:51/INFO/exp] Eval iter 0
[2021-06-25/05:51/INFO/exp]   preprocess all source images
[2021-06-25/05:51/INFO/exp]     feat tmp dir: experiments/tmp_srcfeat_tat-wo-val_bs1_nbs3_rpointdir_s0.25_resunet3.16_penone.dirs.avg.seq+9+1+unet+5+2+16.single+mlpdir+mean+3+64+16_vgg_tat_subseq_training_Truck
[2021-06-25/05:51/INFO/exp]   create target images
invalid device function in /home/ubuntu/PreImage/StableViewSynthesis/ext/mytorch/include/common_cuda.h at 171
[1]    31933 segmentation fault (core dumped)  python exp.py --net  --cmd eval --iter last --eval-dsets tat-subseq
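
Note that the message in this log is "invalid device function", not the "invalid configuration argument" from the original report. In CUDA, "invalid device function" usually means the extension was not compiled for the architecture of the GPU it runs on. As a hedged sketch, the helper below builds the nvcc -gencode flag for a given compute capability; the function names are hypothetical, and whether this repository's setup.py takes its architectures from such flags is an assumption to verify against the file itself.

```python
def gencode_flag(capability):
    """Build the nvcc flag for a (major, minor) compute capability."""
    major, minor = capability
    arch = f"{major}{minor}"
    return f"-gencode=arch=compute_{arch},code=sm_{arch}"


def local_gpu_flag(device_index=0):
    """Return the -gencode flag matching the local GPU, or None without CUDA."""
    # Imported lazily so gencode_flag() can be used without PyTorch installed.
    import torch

    if not torch.cuda.is_available():
        return None
    return gencode_flag(torch.cuda.get_device_capability(device_index))
```

Calling local_gpu_flag() on the affected machine gives the flag that should appear among the nvcc arguments used to build the extension. After changing the flags, the extension has to be rebuilt from a clean build directory for the change to take effect.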

alex04072000 commented 3 years ago

@MatveyMor Did you solve the customized dataset issue? I am facing the same problem. There is no script for generating delaunay_photometric.ply in create_data_own.py.
