pronobis / libspn-keras

Library for learning and inference with Sum-product Networks utilizing TensorFlow 2.x and Keras

Input shape of [batch, max_input_seq_len, x, y, channels] for DGC-SPN #5

Closed anicolson closed 4 years ago

anicolson commented 4 years ago

Hi guys,

I am working with a five-dimensional input:

[batch, max_input_seq_len, x, y, channels],

and I am feeding this into a DGC-SPN using the functional API instead of the Sequential API.

The reason I am using the functional API is that I am using keras.backend.ctc_batch_cost. It requires four arguments: network_output, network_output_seq_len, labels, labels_seq_len. Therefore, my 'Model' will require multiple inputs. As of now, I am unaware of 'keras.Sequential' facilitating multiple inputs. This is not a problem, just justifying the use of the functional API.

So the problem that I am having is with the input to the network. I have a simple script of my problem at the end. Basically, I am unsure of how to get the DGC-SPN to work with the input size.

Here is the error it throws when I give the DGC-SPN an Input of: inp = Input(shape=(None, x, y, c)):

Traceback (most recent call last):
  File "dgc_spn.py", line 106, in <module>
    dgc_spn = DGCSPN(inp, (x, y, c), n_class)
  File "dgc_spn.py", line 25, in __init__
    )(inp)
  File "/home/aaron/venv/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 748, in __call__
    self._maybe_build(inputs)
  File "/home/aaron/venv/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 2116, in _maybe_build
    self.build(input_shapes)
  File "/home/aaron/venv/tf2/lib/python3.6/site-packages/libspn_keras/layers/base_leaf.py", line 22, in build
    if self.dimension_permutation == DimensionPermutation.AUTO \
  File "/home/aaron/venv/tf2/lib/python3.6/site-packages/libspn_keras/dimension_permutation.py", line 18, in infer_dimension_permutation
    "Cannot infer permutation as there are multiple dynamic dimension sizes, provide "
ValueError: Cannot infer permutation as there are multiple dynamic dimension sizes, provide permutation explicitly when instantiating layers.

None is used for the max_inp_seq_len as it is dynamic.

If I set inp = Input(shape=(x, y, c)), I am able to get the same model summary as shown in README.md.

I can see that 'dimension_permutation' for NormalLeaf can be explicitly defined. So, my question is, what should 'dimension_permutation' be set to?
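For context, Keras prepends an unknown batch dimension to whatever is passed as `Input(shape=...)`, so the 5-D input ends up with two unknown sizes. A minimal plain-Python sketch of that bookkeeping follows; the helper `count_dynamic_dims` is hypothetical and only mirrors what the error message describes, not the library's actual check:

```python
def count_dynamic_dims(shape):
    """Count the entries of a shape tuple that are unknown (None)."""
    return sum(dim is None for dim in shape)


x = y = 28
c = 1

# Keras adds a leading batch dimension of None to Input(shape=...).
shape_4d = (None,) + (x, y, c)        # Input(shape=(x, y, c))
shape_5d = (None,) + (None, x, y, c)  # Input(shape=(None, x, y, c))

print(count_dynamic_dims(shape_4d))  # 1
print(count_dynamic_dims(shape_5d))  # 2
```

With a single unknown size the batch axis is unambiguous; with two, the layer presumably cannot tell which one is the batch, which would explain the request for an explicit permutation.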

The network is identical to the Sequential API version given in README.md, just repurposed for the functional API.

"""
"""
from libspn_keras.layers import ConvProduct, NormalLeaf, ReshapeSpatialToDense, \
    SpatialLocalSum, DenseSum, RootSum
from tensorflow.keras.initializers import TruncatedNormal

class DGCSPN:
    """
    """
    def __init__(self, inp, inp_shape, n_class):
        """
        """
        sum_kwargs = dict(
            accumulator_initializer=TruncatedNormal(
            stddev=0.5, mean=1.0),
            logspace_accumulators=True
        )
        leaves = NormalLeaf(
            input_shape=inp_shape,
            num_components=16, 
            # dimension_permutation=?,
            location_trainable=True,
            location_initializer=TruncatedNormal(
                stddev=1.0, mean=0.0)
            )(inp)
        prod_1 = ConvProduct( # non-overlapping products.
            depthwise=True, 
            strides=[2, 2], 
            dilations=[1, 1], 
            kernel_size=[2, 2],
            padding='valid'
            )(leaves)
        sum_1 = SpatialLocalSum(num_sums=16, **sum_kwargs)(prod_1)
        prod_2 = ConvProduct( # non-overlapping products.
            depthwise=True, 
            strides=[2, 2], 
            dilations=[1, 1], 
            kernel_size=[2, 2],
            padding='valid'
            )(sum_1)
        sum_2 = SpatialLocalSum(num_sums=32, **sum_kwargs)(prod_2)
        prod_3 = ConvProduct( # overlapping products, starting at dilations [1, 1].
            depthwise=True, 
            strides=[1, 1], 
            dilations=[1, 1], 
            kernel_size=[2, 2],
            padding='full'
            )(sum_2)
        sum_3 = SpatialLocalSum(num_sums=32, **sum_kwargs)(prod_3)
        prod_4 = ConvProduct( # overlapping products, with dilations [2, 2] and full padding.
            depthwise=True, 
            strides=[1, 1], 
            dilations=[2, 2], 
            kernel_size=[2, 2],
            padding='full'
            )(sum_3)
        sum_4 = SpatialLocalSum(num_sums=64, **sum_kwargs)(prod_4)
        prod_5 = ConvProduct( # overlapping products, with dilations [4, 4] and full padding.
            depthwise=True, 
            strides=[1, 1], 
            dilations=[4, 4], 
            kernel_size=[2, 2],
            padding='full'
            )(sum_4)
        sum_6 = SpatialLocalSum(num_sums=64, **sum_kwargs)(prod_5)
        prod_6 = ConvProduct( # overlapping products, with dilations [8, 8] and 'final' padding to combine all scopes
            depthwise=True, 
            strides=[1, 1], 
            dilations=[8, 8], 
            kernel_size=[2, 2],
            padding='final'
            )(sum_6)
        reshape = ReshapeSpatialToDense()(prod_6)
        class_roots = DenseSum(num_sums=n_class, **sum_kwargs)(reshape) # class roots.
        self.outp = RootSum(
            return_weighted_child_logits=True, 
            logspace_accumulators=True, 
            accumulator_initializer=TruncatedNormal(
                stddev=0.0, mean=1.0)
            )(class_roots)

if __name__ == '__main__':
    from tensorflow.keras import Input, Model
    import numpy as np

    x = y = 28 # 
    c = 1 # channels.
    n_class = 29 # number of classes (number characters for acoustic model).

    batch_size = 3
    max_inp_seq_len = 200 # will be dynamic for real system.
    max_tgt_seq_len = 40 # will be dynamic for real system.

    x_train = np.random.rand(batch_size, max_inp_seq_len, x, y, c)
    x_train_len = np.full(batch_size, max_inp_seq_len) # won't be the same for actual problem.
    y_train = np.random.randint(0, n_class, (batch_size, max_tgt_seq_len)) # labels span the target length.
    y_train_len = np.full(batch_size, max_tgt_seq_len) # won't be the same for actual problem.

    print("x_train shape: {}".format(x_train.shape))
    print("x_train_len: {}".format(x_train_len))
    print("y_train shape: {}".format(y_train.shape))
    print("y_train_len: {}".format(y_train_len))

    # inp_shape: [batch, max_inp_seq_len, x, y, channels]
    inp = Input(shape=(None, x, y, c))
    dgc_spn = DGCSPN(inp, (x, y, c), n_class)
    spn = Model(inputs=inp, outputs=dgc_spn.outp)
    spn.summary()

    ## TBC --- CTC loss, fit, metrics etc.
anicolson commented 4 years ago

Please ignore this. I am trying to understand how to manage an input that comprises both batch_size and max_seq_len. I will probably have to merge them together.
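A rough NumPy sketch of what merging the two axes could look like, assuming every sequence is padded to the same length. `per_frame_model` is a hypothetical stand-in for a single-step network, not part of libspn-keras:

```python
import numpy as np

batch_size, seq_len, x, y, c = 3, 5, 28, 28, 1
frames = np.random.rand(batch_size, seq_len, x, y, c)

# Fold the time axis into the batch axis: [batch * seq_len, x, y, c].
merged = frames.reshape(batch_size * seq_len, x, y, c)


def per_frame_model(images):
    """Hypothetical stand-in for a single-step model: one score per image."""
    return images.mean(axis=(1, 2, 3))[:, np.newaxis]


scores = per_frame_model(merged)  # [batch * seq_len, 1]
# Restore the sequence axis: [batch, seq_len, 1].
scores = scores.reshape(batch_size, seq_len, -1)

print(merged.shape)  # (15, 28, 28, 1)
print(scores.shape)  # (3, 5, 1)
```

Because the reshape uses row-major order, `scores[i, t]` still corresponds to `frames[i, t]` after the round trip.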

anicolson commented 4 years ago

I will check whether keras.layers.TimeDistributed works.

jostosh commented 4 years ago

Hi Aaron, I guess TimeDistributed will at least give you an SPN over the variables per timestep. Imagine we have an SPN like this:

spn_single_step = tf.keras.Sequential([
  ...
])

Then you could use the TimeDistributed layer for all steps as follows:

spn_all_steps = tf.keras.layers.TimeDistributed(
  spn_single_step, 
  input_shape=(sequence_len, num_rows, num_cols, num_channels)
) 

At least I think it should work this way. Any tf.keras.Model or tf.keras.Sequential is in fact an instance of tf.keras.layers.Layer, so you can simply insert a full model as the first argument of TimeDistributed.

If your sequence length is dynamic, you could use input_shape=(None, num_rows, num_cols, num_channels) of course.

Also, one shortcoming of the above is that there are no inter-timestep connections this way. So basically you end up with many independent class distributions (one for each time step): p(y_0 | x_0), p(y_1 | x_1), p(y_2 | x_2), etc.

If you were to go for a dynamic SPN (https://arxiv.org/abs/1511.04412), you would actually be modeling: p(y_0 | x_0), p(y_1 | x_0, x_1), p(y_2 | x_0, x_1, x_2), etc.

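This will be a bit harder to accomplish, but you could take inspiration from the docs on RNNs in TF2 (https://www.tensorflow.org/guide/effective_tf2#take_advantage_of_autograph_with_python_control_flow). When going in this direction, you probably need to avoid using TimeDistributed.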
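To illustrate the independence point, here is a small NumPy sketch in which the same model is applied to each time step separately. The linear log-softmax classifier `step_log_probs` is a hypothetical stand-in, not an SPN. Changing x_0 leaves the outputs for all later steps untouched:

```python
import numpy as np

rng = np.random.default_rng(0)
num_steps, num_features, num_classes = 4, 6, 3
weights = rng.normal(size=(num_features, num_classes))  # shared across steps


def step_log_probs(x_t):
    """Hypothetical per-step classifier: log-softmax of a linear map."""
    logits = x_t @ weights
    return logits - np.log(np.sum(np.exp(logits), axis=-1, keepdims=True))


def apply_per_step(sequence):
    """Apply the same model to every time step independently."""
    return np.stack([step_log_probs(x_t) for x_t in sequence])


sequence = rng.normal(size=(num_steps, num_features))
baseline = apply_per_step(sequence)

perturbed = sequence.copy()
perturbed[0] += 1.0  # change only x_0
changed = apply_per_step(perturbed)

# Steps 1..T-1 never see x_0, so their outputs are identical.
print(np.allclose(baseline[1:], changed[1:]))  # True
```

A dynamic model would instead carry state from one step to the next, so a change in x_0 could propagate to every later output.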

anicolson commented 4 years ago

Thanks for the help Jos.

The image that I am using in this case is a patch of a spectrogram, so the network will have a temporal receptive field. My plan is to start with RAT-SPN and DGC-SPN (both with multiple time-steps as input) and then hopefully progress to a dynamic SPN.

anicolson commented 4 years ago

Hi Jos,

Here is the progress so far with using TimeDistributed.

I assume that the (keras) API passes the input shape as a list, which causes 'scope_and_decomp_dims' to be a list:

2020-03-14 07:53:56.506160: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6248 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
Traceback (most recent call last):
  File "dgc_spn.py", line 102, in <module>
    spn_all_time_steps = TimeDistributed(spn_single_step)(inp)
  File "/home/aaron/venv/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 773, in __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "/home/aaron/venv/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/layers/wrappers.py", line 270, in call
    output_shape = self.compute_output_shape(input_shape).as_list()
  File "/home/aaron/venv/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/layers/wrappers.py", line 212, in compute_output_shape
    child_output_shape = self.layer.compute_output_shape(child_input_shape)
  File "/home/aaron/venv/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/sequential.py", line 292, in compute_output_shape
    shape = layer.compute_output_shape(shape)
  File "/home/aaron/venv/tf2/lib/python3.6/site-packages/libspn_keras/layers/base_leaf.py", line 54, in compute_output_shape
    out_shape = (None,) + scope_and_decomp_dims + (self.num_components,)
TypeError: can only concatenate tuple (not "list") to tuple

Wrapping it as tuple(scope_and_decomp_dims) in libspn_keras/layers/base_leaf.py, line 54, seems to fix the problem.
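The failure and the workaround can be reproduced in plain Python. The concrete values in `scope_and_decomp_dims` below are illustrative assumptions, not taken from the library:

```python
num_components = 16
scope_and_decomp_dims = [1, 28, 28]  # a list, as in the traceback above

try:
    out_shape = (None,) + scope_and_decomp_dims + (num_components,)
except TypeError as err:
    print(err)  # can only concatenate tuple (not "list") to tuple

# Converting to a tuple first makes the concatenation valid.
out_shape = (None,) + tuple(scope_and_decomp_dims) + (num_components,)
print(out_shape)  # (None, 1, 28, 28, 16)
```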

However, another problem occurs at root_sum.py:

Traceback (most recent call last):
  File "dgc_spn.py", line 108, in <module>
    spn = TimeDistributed(spn_single_step)(inp)
  File "/home/aaron/venv/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 773, in __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "/home/aaron/venv/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/layers/wrappers.py", line 270, in call
    output_shape = self.compute_output_shape(input_shape).as_list()
  File "/home/aaron/venv/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/layers/wrappers.py", line 212, in compute_output_shape
    child_output_shape = self.layer.compute_output_shape(child_input_shape)
  File "/home/aaron/venv/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/sequential.py", line 292, in compute_output_shape
    shape = layer.compute_output_shape(shape)
  File "/home/aaron/mnt/aaron/Dropbox/Systems/libspn-keras/libspn_keras/layers/root_sum.py", line 125, in compute_output_shape
    num_batch, num_nodes_in = input_shape
ValueError: too many values to unpack (expected 2)

where input_shape is (1,1,None,10).

Interestingly, if I use a 4D input:

inp = Input(shape=(x, y, c)) # no sequence dimension
spn = spn_single_step(inp)
model = Model(inputs=inp, outputs=spn)
model.summary()

the function compute_output_shape() from root_sum.py is not called.

Moreover, I can see that the size of input_shape (1,1,None,10) is caused by ReshapeSpatialToDense() .

I am unsure as to why the API calls compute_output_shape from root_sum.py for the 5D case, and not the 4D case, but it seems that it needs to be updated to handle the output from ReshapeSpatialToDense().

Would simply changing:

    def compute_output_shape(self, input_shape):
        num_batch, num_nodes_in = input_shape
        if self.return_weighted_child_logits:
            return [num_batch, num_nodes_in]
        else:
            return [num_batch, 1]

to

    def compute_output_shape(self, input_shape):
        _,_,num_batch, num_nodes_in = input_shape
        if self.return_weighted_child_logits:
            return [num_batch, num_nodes_in]
        else:
            return [num_batch, 1]

break anything? Or is there a better way to deal with this (i.e., will this affect other SPN types, like RAT-SPN, that don't call ReshapeSpatialToDense())?
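One possible variant that would accept both the 2-D and the 4-D shape is to keep only the last two entries and ignore any leading dimensions. This is just a sketch: `root_sum_output_shape` is a hypothetical standalone function, and it assumes the batch size always sits second to last, as in the (1,1,None,10) shape reported above:

```python
def root_sum_output_shape(input_shape, return_weighted_child_logits=True):
    """Hypothetical rank-agnostic variant of the shape logic quoted above."""
    # Keep the last two entries and ignore any leading dimensions.
    *_, num_batch, num_nodes_in = tuple(input_shape)
    if return_weighted_child_logits:
        return [num_batch, num_nodes_in]
    return [num_batch, 1]


print(root_sum_output_shape((None, 10)))               # [None, 10]
print(root_sum_output_shape((1, 1, None, 10)))         # [None, 10]
print(root_sum_output_shape((1, 1, None, 10), False))  # [None, 1]
```

The starred unpacking also works for lists and for shapes of rank 3, so a caller that passes a dense 2-D shape would see the same result as before.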

Thanks again.

Here is the script:

"""
"""
from libspn_keras.layers import ConvProduct, NormalLeaf, ReshapeSpatialToDense, \
    SpatialLocalSum, DenseSum, RootSum
from tensorflow.keras import Sequential
from tensorflow.keras.initializers import TruncatedNormal
from tensorflow.keras.layers import TimeDistributed 

from libspn_keras.dimension_permutation import DimensionPermutation

if __name__ == '__main__':
    from tensorflow.keras import Input, Model
    import numpy as np

    x = y = 28 # 
    c = 1 # channels.
    n_class = 10 # number of classes (number characters for acoustic model).

    sum_kwargs = dict(
        accumulator_initializer=TruncatedNormal(
            stddev=0.5, mean=1.0),
        logspace_accumulators=True
    )

    spn_single_step = Sequential([
        NormalLeaf(
            # input_shape=inp_shape,
            num_components=16, 
            location_trainable=True,
            location_initializer=TruncatedNormal(
                stddev=1.0, mean=0.0),
            # dimension_permutation=DimensionPermutation.BATCH_FIRST
    ),
    # Non-overlapping products
    ConvProduct(
        depthwise=True, 
        strides=[2, 2], 
        dilations=[1, 1], 
        kernel_size=[2, 2],
        padding='valid'
    ),
    SpatialLocalSum(num_sums=16, **sum_kwargs),
    # Non-overlapping products
    ConvProduct(
        depthwise=True, 
        strides=[2, 2], 
        dilations=[1, 1], 
        kernel_size=[2, 2],
        padding='valid'
    ),
    SpatialLocalSum(num_sums=32, **sum_kwargs),
    # Overlapping products, starting at dilations [1, 1]
    ConvProduct(
        depthwise=True, 
        strides=[1, 1], 
        dilations=[1, 1], 
        kernel_size=[2, 2],
        padding='full'
    ),
    SpatialLocalSum(num_sums=32, **sum_kwargs),
    # Overlapping products, with dilations [2, 2] and full padding
    ConvProduct(
        depthwise=True, 
        strides=[1, 1], 
        dilations=[2, 2], 
        kernel_size=[2, 2],
        padding='full'
    ),
    SpatialLocalSum(num_sums=64, **sum_kwargs),
    # Overlapping products, with dilations [4, 4] and full padding
    ConvProduct(
        depthwise=True, 
        strides=[1, 1], 
        dilations=[4, 4], 
        kernel_size=[2, 2],
        padding='full'
    ),
    SpatialLocalSum(num_sums=64, **sum_kwargs),
    # Overlapping products, with dilations [8, 8] and 'final' padding to combine 
    # all scopes
    ConvProduct(
        depthwise=True, 
        strides=[1, 1], 
        dilations=[8, 8], 
        kernel_size=[2, 2],
        padding='final'
    ),
    ReshapeSpatialToDense(),
    # Class roots
    DenseSum(num_sums=n_class, **sum_kwargs),
    RootSum(
        return_weighted_child_logits=True, 
        logspace_accumulators=True, 
        accumulator_initializer=TruncatedNormal(
            stddev=0.0, mean=1.0)
    )
    ])

    # inp_shape: [batch, x, y, channels]
    inp = Input(shape=(x, y, c))
    spn = spn_single_step(inp)
    model = Model(inputs=inp, outputs=spn)
    model.summary()

    # inp_shape: [batch, max_inp_seq_len, x, y, channels]
    inp = Input(shape=(None, x, y, c))
    spn = TimeDistributed(spn_single_step)(inp)
    model = Model(inputs=inp, outputs=spn)
    model.summary()
jostosh commented 4 years ago

Apologies for the late reply. Thanks for pointing this out. I'm surprised that I didn't run into this before. I think your proposed fix makes sense! There's a good chance, however, that I will revise the API/design of RootSum, since it's currently somewhat broken, as you can see here.