sizes = "None" failing for Keras DynamicGraphConvolutionKerasLayer implementations

rlav440 commented 4 years ago

Running Environment

Ubuntu 19.10, TF 2.1, TF graphics built from repo ~14 days ago.

Issue

Documentation for the Keras Layer implementations suggests that the layers can be run with sizes = None, (which is the default). Instead, if sizes = None, or the default is passed, a value error is eventually raised: ValueError: The channel dimension of the inputs should be defined. Found 'None'. This occurs in the _edge_convolution_template function of convolution/graph_convolution ln. 266

Related This is similar in nature to the issue raised in issue 13, but differs in that:

If input sizes are passed, conversion to dense tensor is consuming unfeasible amounts of memory,
it appears that the layers would run with None input at the time of that issue, which is not the case here.

Minimal example reproducing the undesired behaviour

from tensorflow_graphics.nn.layer import graph_convolution as g_nn
import tensorflow as tf
from tensorflow import keras
def min_example():
    input = keras.Input(shape=(1000, 3))

    sparse = tf.sparse.reshape(tf.sparse.eye(1000), shape=[1, 1000, 1000])
    outputs = g_nn.DynamicGraphConvolutionKerasLayer(num_output_channels=64, reduction='weighted', name='EdgeConv0')(
        inputs=[input, sparse],
    )
    model = keras.Model(inputs=[input], outputs=[outputs])

    return model

if __name__ == "__main__":
    min_example()

avneesh-sud commented 4 years ago

hi rlav440@

Providing keras symbolic tensors as inputs to DGCNN layer is under investigation - there are other issues, ref: #322

rlav440 commented 4 years ago

Hi @avneesh-sud - the issue here isn't providing symbolic tensors to the layer: I was under the impression that this was actually a separate issue (that is fixed in the Github version but not available in the 1.0 package on pypi - I'll comment to that effect in the linked issue).

The issue here is that the layer doesn't run on it's default inputs, which according to the documentation it should do - not that it doesn't run at all. If this is run with the PYPI version of the repository won't reproduce the error correctly - you'll need to run this with a semi recent build from this repository.

avneesh-sud commented 4 years ago

Thanks @rlav440 - i tested by specifying a known batch_size to keras.Input. However that also has issues - compatibility of custom layers like DGCNN (with own weights) with symbolic keras tensors needs further investigation.

rlav440 commented 4 years ago

Yep - I'm currently running a working model using these graph layers so I can definitely attest that the layer works if you provide "size" input (and returns good results).

DGCNN as a custom layer is sub-classed from the appropriate Keras.layers.Layer: so I don't think that it having it's own weights is the issue here. There might be some issues that I'm not aware of: but my model does have primarily DGCNN layers in the keras functional API, and runs at different batch sizes: the symbolic tensor itself isn't the issue.

I'll run my minimum example code and provide the salient output here.

rlav440 commented 4 years ago

  File "/home/robin/Masters/Robin-Graph-NN-Pose-from-Model/Example Models/minimal_ex.py", line 11, in min_example
    inputs=[input, sparse],
  File "/home/robin/anaconda3/envs/standard/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/home/robin/anaconda3/envs/standard/lib/python3.7/site-packages/tensorflow_graphics/nn/layer/graph_convolution.py", line 432, in call
    edge_function_kwargs=kwargs)
  File "/home/robin/anaconda3/envs/standard/lib/python3.7/site-packages/tensorflow_graphics/geometry/convolution/graph_convolution.py", line 266, in edge_convolution_template
    **edge_function_kwargs)
  File "/home/robin/anaconda3/envs/standard/lib/python3.7/site-packages/tensorflow_graphics/nn/layer/graph_convolution.py", line 418, in _edge_convolution
    convolved_features = conv1d_layer(concat_features)
  File "/home/robin/anaconda3/envs/standard/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 748, in __call__
    self._maybe_build(inputs)
  File "/home/robin/anaconda3/envs/standard/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 2116, in _maybe_build
    self.build(input_shapes)
  File "/home/robin/anaconda3/envs/standard/lib/python3.7/site-packages/tensorflow_core/python/keras/layers/convolutional.py", line 148, in build
    input_channel = self._get_input_channel(input_shape)
  File "/home/robin/anaconda3/envs/standard/lib/python3.7/site-packages/tensorflow_core/python/keras/layers/convolutional.py", line 295, in _get_input_channel
    raise ValueError('The channel dimension of the inputs '
ValueError: The channel dimension of the inputs should be defined. Found `None`.

rlav440 commented 4 years ago

@avneesh-sud you are definitely right that the issue is related to the symbolic tensors - it's not a problem with the layer itself per se - but it's just the sizes = None default doesn't work on symbolic input

This code below shows that the layer runs fine with default inputs when we don't pass it undefined tensors. (and also doesn't cause sparse->dense conversions)

from tensorflow_graphics.nn.layer import graph_convolution as g_nn
import tensorflow as tf

test = tf.ones(shape=[1000, 3])
sparse = tf.sparse.reshape(tf.sparse.eye(1000), shape=[1000, 1000])
outputs = g_nn.DynamicGraphConvolutionKerasLayer(num_output_channels=64, reduction='weighted', name='EdgeConv0')(
    inputs=[test, sparse],
)

With this narrowed down, I might have a go at seeing if I can get this running on my end - no promises though

rlav440 commented 4 years ago

With a bit more debugging the issue appears to stem from the definition of x_flat in convolution/graph_convolution.py, using flatten_batch_to_2d

In this case (symbolic tensor, ndims>2, sizes = None) the data is unrolled into x_flat, with shape=(None,None) is the input batch dimension is unknown. I'll have a look at the utility and see if I can work out the fix.

rlav440 commented 4 years ago

So I have a temporary fix - definitely not worthy of a pull request, as it's just a bit too hacked together. Inserting the line:

x_flat = tf.reshape(x_flat, shape=[-1, data.shape[data_ndims-1]])

after calculating x_flat in convolution/graph_convolution.py (currently ln.255 for me) will provide the necessary shape information for the convolution to run without error. Difference here is relying on the known number of dimensions of the object, rather than the apparently dynamic shape call inside flatten_batch_to_2d.

So yeah, this does end up depending on static vs. dynamic shapes, even though all of the necessary information is known for the convolution to occur. I imagine the convolution layer is throwing the complaint because it's coded to handle dynamic batch sizes, rather than what appear as "dynamic" tensor lengths

tensorflow / graphics

sizes = "None" failing for Keras DynamicGraphConvolutionKerasLayer implementations #317

Minimal example reproducing the undesired behaviour