zhyever / PatchFusion

[CVPR 2024] An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation
https://zhyever.github.io/patchfusion/
MIT License

Error on Google Colab: File "./infer_user.py", line 1046, in <module> model = load_ckpt(model, args.ckp_path) #14

Closed herrartist closed 6 months ago

herrartist commented 9 months ago

Hi, I'm trying to run it on Google Colab but I get the error below. Any solutions?

Using cache found in /root/.cache/torch/hub/intel-isl_MiDaS_master
Traceback (most recent call last):
  File "/content/drive/MyDrive/PatchFusion-main/./infer_user.py", line 1046, in <module>
    model = load_ckpt(model, args.ckp_path)
  File "/content/drive/MyDrive/PatchFusion-main/./infer_user.py", line 82, in load_ckpt
    model = load_wts(model, checkpoint)
  File "/content/drive/MyDrive/PatchFusion-main/./infer_user.py", line 79, in load_wts
    return load_state_dict(model, ckpt)
  File "/content/drive/MyDrive/PatchFusion-main/./infer_user.py", line 71, in load_state_dict
    model.load_state_dict(state, strict=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PatchFusion:
        Unexpected key(s) in state_dict: "coarse_model.core.core.pretrained.model.blocks.0.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.1.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.2.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.3.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.4.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.5.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.6.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.7.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.8.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.9.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.10.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.11.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.12.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.13.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.14.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.15.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.16.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.17.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.18.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.19.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.20.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.21.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.22.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.23.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.0.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.1.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.2.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.3.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.4.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.5.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.6.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.7.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.8.attn.relative_position_index", 
"fine_model.core.core.pretrained.model.blocks.9.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.10.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.11.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.12.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.13.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.14.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.15.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.16.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.17.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.18.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.19.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.20.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.21.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.22.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.23.attn.relative_position_index".

zhyever commented 9 months ago

I was wondering if you successfully solved this issue. I didn't test our code on Colab.

andybak commented 8 months ago

@SoftologyPro reported the same error when trying to run this on Windows (in a message on Discord).

zhyever commented 8 months ago

I think it's related to the timm version. Could you please share your environment setup so that I can try to look for a solution?
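
In the meantime, a quick way to confirm whether it is a timm mismatch is to compare the checkpoint keys against the model you just built; the extra relative_position_index entries are buffers that newer timm releases no longer register. A rough diagnostic sketch (it assumes model is already constructed as in infer_user.py, uses whatever path you pass as --ckp_path, and guesses at the checkpoint layout):

import torch

ckpt = torch.load("nfs/patchfusion_u4k.pt", map_location="cpu")  # same path as --ckp_path
state = ckpt.get("model", ckpt)  # assumption: weights may be nested under a "model" entry

model_keys = set(model.state_dict().keys())
ckpt_keys = set(state.keys())

# Keys present in the checkpoint but not in the freshly built model, and vice versa
print("Unexpected keys:", sorted(ckpt_keys - model_keys))
print("Missing keys:", sorted(model_keys - ckpt_keys))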

SoftologyPro commented 8 months ago

For me, these are the pip commands I used to set up the environment.

python -m pip install --upgrade pip
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts wheel==0.41.2
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts accelerate==0.25.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts gradio==4.12.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts trimesh==3.20.2
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts json5==0.9.14
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts opencv-python==4.9.0.80
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts scipy==1.11.4
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts imageio==2.33.1
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts timm==0.9.12
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts einops==0.7.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts transformers==4.36.2
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts open_clip_torch==2.0.2
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts pytorch-lightning==1.5.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts omegaconf==2.1.1
pip uninstall -y torch
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts torch==2.1.1+cu118 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip uninstall -y charset-normalizer
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts charset-normalizer==3.3.2
pip uninstall -y typing_extensions
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts typing_extensions==4.8.0

and here is the full output

D:\Tests\PatchFusion>python ./ui_generative.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json
logging improved.
build pretrained condition model from None
img_size [384, 512]
Using cache found in C:\Users\Jason/.cache\torch\hub\intel-isl_MiDaS_master
D:\Tests\PatchFusion\voc_patchfusion\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ..\aten\src\ATen\native\TensorShape.cpp:3527.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
build pretrained condition model from None
img_size [384, 512]
Using cache found in C:\Users\Jason/.cache\torch\hub\intel-isl_MiDaS_master
Traceback (most recent call last):
  File "D:\Tests\PatchFusion\ui_generative.py", line 44, in <module>
    from ui_prediction import predict_depth
  File "D:\Tests\PatchFusion\ui_prediction.py", line 119, in <module>
    model = load_ckpt(model, args.ckp_path)
  File "D:\Tests\PatchFusion\ui_prediction.py", line 102, in load_ckpt
    model = load_wts(model, checkpoint)
  File "D:\Tests\PatchFusion\ui_prediction.py", line 99, in load_wts
    return load_state_dict(model, ckpt)
  File "D:\Tests\PatchFusion\ui_prediction.py", line 93, in load_state_dict
    model.load_state_dict(state, strict=True)
  File "D:\Tests\PatchFusion\voc_patchfusion\lib\site-packages\torch\nn\modules\module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PatchFusion:
        Unexpected key(s) in state_dict: "coarse_model.core.core.pretrained.model.blocks.0.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.1.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.2.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.3.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.4.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.5.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.6.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.7.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.8.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.9.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.10.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.11.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.12.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.13.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.14.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.15.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.16.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.17.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.18.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.19.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.20.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.21.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.22.attn.relative_position_index", "coarse_model.core.core.pretrained.model.blocks.23.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.0.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.1.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.2.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.3.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.4.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.5.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.6.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.7.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.8.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.9.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.10.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.11.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.12.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.13.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.14.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.15.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.16.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.17.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.18.attn.relative_position_index", 
"fine_model.core.core.pretrained.model.blocks.19.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.20.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.21.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.22.attn.relative_position_index", "fine_model.core.core.pretrained.model.blocks.23.attn.relative_position_index".

Rolling back timm to 0.6.13 got past that error and got the UI started. But when I select an image and click Submit, it shows an error. Here is the traceback.

Traceback (most recent call last):
  File "D:\Tests\PatchFusion\voc_patchfusion\lib\site-packages\gradio\queueing.py", line 489, in call_prediction
    output = await route_utils.call_process_api(
  File "D:\Tests\PatchFusion\voc_patchfusion\lib\site-packages\gradio\route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "D:\Tests\PatchFusion\voc_patchfusion\lib\site-packages\gradio\blocks.py", line 1561, in process_api
    result = await self.call_function(
  File "D:\Tests\PatchFusion\voc_patchfusion\lib\site-packages\gradio\blocks.py", line 1179, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\Tests\PatchFusion\voc_patchfusion\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "D:\Tests\PatchFusion\voc_patchfusion\lib\site-packages\anyio\_backends\_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "D:\Tests\PatchFusion\voc_patchfusion\lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "D:\Tests\PatchFusion\voc_patchfusion\lib\site-packages\gradio\utils.py", line 678, in wrapper
    response = f(*args, **kwargs)
  File "D:\Tests\PatchFusion\ui_prediction.py", line 297, in on_submit
    depth = predict_depth(model, image, mode, pn, reso, ps)
  File "D:\Tests\PatchFusion\ui_prediction.py", line 155, in predict_depth
    image_lr = preprocess(image)
  File "D:\Tests\PatchFusion\voc_patchfusion\lib\site-packages\torchvision\transforms\transforms.py", line 95, in __call__
    img = t(img)
  File "D:\Tests\PatchFusion\zoedepth\models\base_models\midas.py", line 173, in __call__
    return nn.functional.interpolate(x, (height, width), mode='bilinear', align_corners=True)
  File "D:\Tests\PatchFusion\voc_patchfusion\lib\site-packages\torch\nn\functional.py", line 3924, in interpolate
    raise TypeError(
TypeError: expected size to be one of int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int], but got size with types [<class 'numpy.int32'>, <class 'numpy.int32'>]

So, getting closer to working.

zhyever commented 8 months ago

Yeah, please refer to https://github.com/zhyever/PatchFusion/issues/13. First please downgrade the timm version.
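
If downgrading timm is awkward in your setup, an alternative workaround (just a sketch, not an official fix in the repo) is to drop those relative_position_index entries from the checkpoint before loading; they are deterministic buffers that newer timm versions simply no longer register:

import torch

ckpt = torch.load(args.ckp_path, map_location="cpu")
state = ckpt.get("model", ckpt)  # assumption about the checkpoint layout

# Drop the buffers that newer timm no longer registers; everything else still loads strictly.
state = {k: v for k, v in state.items()
         if not k.endswith("attn.relative_position_index")}
model.load_state_dict(state, strict=True)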

As for TypeError: expected size to be one of int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int], but got size with types [<class 'numpy.int32'>, <class 'numpy.int32'>].

Please change this line nn.functional.interpolate(x, (height, width), mode='bilinear', align_corners=True) to nn.functional.interpolate(x, (int(height), int(width)), mode='bilinear', align_corners=True). You might be using torch 2.0+.
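
For reference, the edited line in zoedepth/models/base_models/midas.py would look like this (only the int() casts are new):

# zoedepth/models/base_models/midas.py, in the resize transform's __call__:
# cast the numpy.int32 sizes to plain Python ints so torch 2.x's interpolate accepts them
return nn.functional.interpolate(x, (int(height), int(width)), mode='bilinear', align_corners=True)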

SoftologyPro commented 8 months ago

Rolling Torch back to 2.0.1 got it all working without any code changes. For Windows users, here are the pip install commands I used to set up the environment (CUDA 11.8).

python -m pip install --upgrade pip
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts wheel==0.41.2
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts accelerate==0.25.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts gradio==4.12.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts trimesh==3.20.2
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts json5==0.9.14
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts opencv-python==4.9.0.80
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts scipy==1.11.4
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts imageio==2.33.1
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts timm==0.6.13
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts einops==0.7.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts transformers==4.36.2
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts open_clip_torch==2.0.2
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts pytorch-lightning==1.5.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts omegaconf==2.1.1
pip uninstall -y torch
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts torch==2.0.1+cu118 torchvision==0.15.2+cu118 --index-url https://download.pytorch.org/whl/cu118
pip uninstall -y charset-normalizer
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts charset-normalizer==3.3.2
pip uninstall -y typing_extensions
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts typing_extensions==4.8.0

There are a few warnings when Submit is clicked, but it does work now.

D:\Tests\PatchFusion\infer_user.py:530: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  blur_mask = torch.tensor(blur_mask, device=whole_depth_pred.device)
D:\Tests\PatchFusion\infer_user.py:537: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  blur_mask = torch.tensor(blur_mask, device=whole_depth_pred.device)
D:\Tests\PatchFusion\ui_prediction.py:134: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed two minor releases later. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap(obj)`` instead.
  cmapper = matplotlib.cm.get_cmap(cmap)
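
Those warnings are harmless, but if you want to silence them, the non-deprecated equivalents would look roughly like this (a sketch against current PyTorch and Matplotlib 3.7+, not a change that ships with the repo):

# infer_user.py, around lines 530/537: blur_mask is already a tensor here (that is what the
# warning means), so move it to the target device instead of re-wrapping it with torch.tensor
blur_mask = blur_mask.clone().detach().to(whole_depth_pred.device)

# ui_prediction.py, line 134: use the Matplotlib 3.7+ colormap registry instead of cm.get_cmap
import matplotlib
cmapper = matplotlib.colormaps[cmap]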