Onnx build error in Demo ipynb

rachelzh9 commented 1 year ago

I'm getting this build error for onnx in the ipynb:

Building wheels for collected packages: viewformer, onnx
  Building wheel for viewformer (setup.py) ... done
  Created wheel for viewformer: filename=viewformer-0.0.1-py3-none-any.whl size=125182 sha256=0927054eae3adc599285f862a25d93f8f1628c7ace07150b682bd56870c1f9e7
  Stored in directory: /tmp/pip-ephem-wheel-cache-_u0h7mpm/wheels/1d/2a/36/002883d36cc65fdcbb1b83521fd65bf3759367c93f31db008f
  error: subprocess-exited-with-error

  × Building wheel for onnx (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  Building wheel for onnx (pyproject.toml) ... error
  ERROR: Failed building wheel for onnx
Successfully built viewformer
Failed to build onnx
ERROR: Could not build wheels for onnx, which is required to install pyproject.toml-based projects

Any idea how to fix it?

jkulhanek commented 1 year ago

Related issue in the onnx repo: https://github.com/onnx/onnx/issues/4376. I guess it has to do with Colab switching to a newer Python version.
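
To check which interpreter and package versions a notebook is actually running, a small diagnostic like the one below can help. It is only a sketch: the function name `describe_environment` and the default package list are assumptions made for illustration and are not part of viewformer.

```python
import sys
from importlib import metadata


def describe_environment(packages=("onnx", "onnx-tf", "tensorflow", "torch")):
    """Return the Python version and the installed version of each package.

    Packages that are not installed are reported as None.
    The default package list is an assumption, adjust it as needed.
    """
    report = {"python": ".".join(str(n) for n in sys.version_info[:3])}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in describe_environment().items():
        print(f"{name}: {version if version is not None else 'not installed'}")
```

If pip has to build onnx from source, one possible reason is that no prebuilt wheel matches the reported Python version, which would be consistent with the guess above.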

jkulhanek commented 1 year ago

It should work now; please let me know if you are still experiencing issues.

rachelzh9 commented 1 year ago

I am also getting an error when loading the model, on the line codebook = load_model(os.path.expanduser('~/.cache/viewformer/models/interiornet-codebook-th/model.ckpt')):

Loading model from: /usr/local/lib/python3.9/dist-packages/lpips/weights/v0.1/vgg.pth
/usr/local/lib/python3.9/dist-packages/tensorflow_addons/utils/ensure_tf_install.py:53: UserWarning: Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.9.0 and strictly below 2.12.0 (nightly versions are not supported). 
 The versions of TensorFlow you are currently using is 2.12.0 and is not supported. 
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version. 
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons
  warnings.warn(
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: /usr/local/lib/python3.9/dist-packages/lpips/weights/v0.1/vgg.pth
/usr/local/lib/python3.9/dist-packages/torch/onnx/utils.py:1109: UserWarning: Provided key 0 for dynamic axes is not a valid input/output name
  warnings.warn("Provided key {} for dynamic axes is not a valid input/output name".format(key))
/usr/local/lib/python3.9/dist-packages/torch/onnx/utils.py:1109: UserWarning: Provided key 1 for dynamic axes is not a valid input/output name
  warnings.warn("Provided key {} for dynamic axes is not a valid input/output name".format(key))
/usr/local/lib/python3.9/dist-packages/torch/onnx/utils.py:1109: UserWarning: Provided key output for dynamic axes is not a valid input/output name
  warnings.warn("Provided key {} for dynamic axes is not a valid input/output name".format(key))
WARNING:absl:`0` is not a valid tf.function parameter name. Sanitizing to `arg_0`.
WARNING:absl:`1` is not a valid tf.function parameter name. Sanitizing to `arg_1`.
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-2-6c3baa09413c> in <cell line: 22>()
     20 from viewformer.utils.tensorflow import load_model
     21 
---> 22 codebook = load_model(os.path.expanduser('~/.cache/viewformer/models/interiornet-codebook-th/model.ckpt'))
     23 transformer = load_model('interiornet-transformer-tf')

38 frames
/usr/local/lib/python3.9/dist-packages/onnx_tf/handlers/backend/sub.py in tf__args_check(cls, node, **kwargs)
      6         def tf__args_check(cls, node, **kwargs):
      7             with ag__.FunctionScope('args_check', 'fscope', ag__.STD) as fscope:
----> 8                 dtype = ag__.ld(kwargs)['tensor_dict'][ag__.ld(node).inputs[0]].dtype
      9 
     10                 def get_state():

KeyError: in user code:

    File "/usr/local/lib/python3.9/dist-packages/onnx_tf/backend_tf_module.py", line 99, in __call__  *
        output_ops = self.backend._onnx_node_to_tensorflow_op(onnx_node,
    File "/usr/local/lib/python3.9/dist-packages/onnx_tf/backend.py", line 347, in _onnx_node_to_tensorflow_op  *
        return handler.handle(node, tensor_dict=tensor_dict, strict=strict)
    File "/usr/local/lib/python3.9/dist-packages/onnx_tf/handlers/handler.py", line 58, in handle  *
        cls.args_check(node, **kwargs)
    File "/usr/local/lib/python3.9/dist-packages/onnx_tf/handlers/backend/sub.py", line 24, in args_check  *
        dtype = kwargs["tensor_dict"][node.inputs[0]].dtype

    KeyError: '0'
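
The TensorFlow Addons warning in the log above gives a supported range of 2.9.0 (inclusive) to 2.12.0 (exclusive), and the installed version is 2.12.0. The sketch below shows that range check in plain Python. The helper names are hypothetical, and the warning is not necessarily related to the KeyError that follows it.

```python
import re


def parse_version(text):
    """Turn a version string such as '2.12.0' into a tuple like (2, 12, 0).

    Only the first three numeric components are kept, so pre-release or
    nightly suffixes are ignored (the real warning treats nightlies as
    unsupported, which this sketch does not model).
    """
    numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
    while len(numbers) < 3:
        numbers.append(0)  # pad '2.12' to (2, 12, 0)
    return tuple(numbers)


def in_supported_range(tf_version, lower="2.9.0", upper="2.12.0"):
    """Check lower <= tf_version < upper, using the bounds quoted in the log."""
    return parse_version(lower) <= parse_version(tf_version) < parse_version(upper)


if __name__ == "__main__":
    # Version reported in the log above.
    print(in_supported_range("2.12.0"))
```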

jkulhanek commented 1 year ago

Oh, it looks like the onnx interface has changed. I will look into the issue.

jkulhanek commented 1 year ago

It's a highly annoying issue in ONNX. Locally it works fine. I added a workaround that prevents the faulty LPIPS model from being loaded. However, the same problem may still occur for other people when training. In that case, I guess I would have to replace the LPIPS implementation and publish my own converted pb LPIPS checkpoints.

jkulhanek commented 1 year ago

@rachelzh9 is it working now?

rachelzh9 commented 1 year ago

Yes, it is working now, thank you! On another note, I would like to run inference on a few InteriorNet scenes, but the entire dataset is too large. Can I just use the following command, pointing {dataset path} to a single unzipped folder from InteriorNet, or do I need to prepare the data in a more specific way?

viewformer-cli evaluate transformer \
    --codebook-model "{codebook model checkpoint}" \
    --transformer-model "{transformer model checkpoint}" \
    --loader-path "{dataset path}" \
    --loader dataset \
    --loader-split test \
    --batch-size 1 \
    --image-size 128 \
    --job-dir . \
    --num-eval-sequences 1000

jkulhanek commented 1 year ago

The easiest way would be to create the InteriorNet data structure, but keep just the 3D scene you need in the HD7 folder. I guess you also need to make empty HD1-6 folders, but I am not sure. After that, use the interiornet loader and point the path to the root directory you just created.
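
The layout described above could be set up with a short script like this one. It is a sketch under assumptions: the function name `build_minimal_interiornet` is made up for illustration, the folder names HD1 to HD7 are taken from the comment above, and whether the empty HD1-6 folders are actually required is not confirmed.

```python
import shutil
from pathlib import Path


def build_minimal_interiornet(root, scene_dir, populated="HD7", n_groups=7):
    """Create HD1..HD{n_groups} under root and copy one scene into `populated`.

    Assumption: the loader expects the HD1..HD7 folders directly under the
    dataset root; the other folders are created empty.
    """
    root = Path(root)
    scene_dir = Path(scene_dir)
    for i in range(1, n_groups + 1):
        (root / f"HD{i}").mkdir(parents=True, exist_ok=True)
    target = root / populated / scene_dir.name
    if not target.exists():
        # Copy the unzipped scene folder as-is, keeping its original name.
        shutil.copytree(scene_dir, target)
    return root
```

The returned root directory is the path that would then be passed to the interiornet loader.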
