zezeaaa / MVPGS

📑 [ECCV'2024] MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views
https://zezeaaa.github.io/projects/MVPGS/

RuntimeError: upsample_bilinear2d_nhwc only supports output tensors with less than INT_MAX elements #1

Open goometasoft opened 2 months ago

goometasoft commented 2 months ago
Windows 10, CUDA 12.1, PyTorch 2.2

python train.py --input_views 100 ^
-s E:\AI\test\MVPGS ^
-m E:\AI\test\MVPGS\output ^
--dataset LLFF --stage train --densify_until_iter 5000 --total_virtual_num 360  ^
-r 1 --mvs_config mvs_modules/configs/config_mvsformer.json

Optimizing E:\AI\test\MVPGS\output
Output folder: E:\AI\test\MVPGS\output [26/09 11:40:59]
Reading camera 100/100 [26/09 11:40:59]
Dataset:  LLFF [26/09 11:40:59]
Eval LLFF Dataset!!! [26/09 11:40:59]
Predicting Mono depth... [26/09 11:40:59]
Using cache found in C:\Users\Administrator/.cache\torch\hub\intel-isl_MiDaS_master
D:\conda\envs\cuda121\lib\site-packages\timm\models\_factory.py:117: UserWarning: Mapping deprecated model name vit_base_resnet50_384 to current vit_base_r50_s16_384.orig_in21k_ft_in1k.
  model = create_fn(
Using cache found in C:\Users\Administrator/.cache\torch\hub\intel-isl_MiDaS_master
D:\conda\envs\cuda121\lib\site-packages\timm\models\vision_transformer.py:91: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  x = F.scaled_dot_product_attention(
Traceback (most recent call last):

  File "E:\AI\A4A\240925\win_121\train.py", line 265, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)

  File "E:\AI\A4A\240925\win_121\train.py", line 38, in training
    scene = Scene(dataset, gaussians, stage=pipe.stage)

  File "E:\AI\A4A\240925\win_121\scene\__init__.py", line 56, in __init__
    scene_info = sceneLoadTypeCallbacks["Colmap"](args.source_path, args.images, args.eval,

  File "E:\AI\A4A\240925\win_121\scene\dataset_readers.py", line 239, in readColmapSceneInfo
    mono_depths = get_mono_depth(imgs)

  File "E:\AI\A4A\240925\win_121\scene\dataset_readers.py", line 386, in get_mono_depth
    prediction = midas(imgs)

  File "D:\conda\envs\cuda121\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)

  File "D:\conda\envs\cuda121\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)

  File "C:\Users\Administrator/.cache\torch\hub\intel-isl_MiDaS_master\midas\dpt_depth.py", line 166, in forward
    return super().forward(x).squeeze(dim=1)

  File "C:\Users\Administrator/.cache\torch\hub\intel-isl_MiDaS_master\midas\dpt_depth.py", line 137, in forward
    out = self.scratch.output_conv(path_1)

  File "D:\conda\envs\cuda121\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)

  File "D:\conda\envs\cuda121\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)

  File "D:\conda\envs\cuda121\lib\site-packages\torch\nn\modules\container.py", line 217, in forward
    input = module(input)

  File "D:\conda\envs\cuda121\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)

  File "D:\conda\envs\cuda121\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)

  File "C:\Users\Administrator/.cache\torch\hub\intel-isl_MiDaS_master\midas\blocks.py", line 236, in forward
    x = self.interp(

  File "D:\conda\envs\cuda121\lib\site-packages\torch\nn\functional.py", line 4065, in interpolate
    return torch._C._nn.upsample_bilinear2d(input, output_size, align_corners, scale_factors)

RuntimeError: upsample_bilinear2d_nhwc only supports output tensors with less than INT_MAX elements
zezeaaa commented 2 months ago

It seems the error is caused by the output tensor exceeding the INT_MAX element limit during the bilinear interpolation in monocular depth prediction: as the traceback shows, all input views are passed to MiDaS as a single batch (`prediction = midas(imgs)` in `get_mono_depth`), so with 100 full-resolution views the interpolated tensor becomes too large. Could you please try reducing the input image size or using fewer input views to see if that helps?
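
As a possible workaround, below is a minimal sketch (not the repository's actual code) of running the MiDaS prediction in smaller chunks instead of one large batch, so no single interpolation output approaches the INT_MAX limit. It assumes a loaded `midas` model and an (N, 3, H, W) image tensor `imgs` as in the traceback above; `chunk_size` and the function name are hypothetical, not options of the repo.

import torch

# Sketch only: split the image batch, run MiDaS per chunk, and
# concatenate the per-chunk depth predictions back into one tensor.
@torch.no_grad()
def predict_mono_depth_chunked(midas, imgs, chunk_size=8):
    depths = []
    for chunk in torch.split(imgs, chunk_size, dim=0):
        # MiDaS returns one depth map per image in the chunk
        depths.append(midas(chunk))
    return torch.cat(depths, dim=0)

Alternatively, if `-r` follows the usual 3DGS convention of a resolution downscale factor, rerunning the original command with `-r 2` instead of `-r 1`, or with a smaller `--input_views` value, should shrink the tensors fed to MiDaS in the same way.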