thushv89 / attention_keras

Keras Layer implementation of Attention for Sequential models
https://towardsdatascience.com/light-on-math-ml-attention-with-keras-dc8dbc1fad39
MIT License

Support for using None in sequence length #36

Closed moazshorbagy closed 4 years ago

moazshorbagy commented 4 years ago

The problem

TensorFlow has a useful feature where one can pass None as the sequence length, which allows variable sequence lengths across different batches. The code of the AttentionLayer raises an error when None is used as the sequence length.
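As a minimal sketch of the requested behaviour (the layer names here are illustrative, not the repo's code): declaring the time dimension as None lets batches with different sequence lengths flow through the same model.

```python
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM

# The time dimension is None, so it is unknown at build time.
encoder_in = Input(shape=(None, 16))
encoder_out = LSTM(32, return_sequences=True)(encoder_in)
model = tf.keras.Model(encoder_in, encoder_out)

# Batches with 5 timesteps and 9 timesteps both run through the same model.
short_batch = model(tf.zeros((2, 5, 16)))
long_batch = model(tf.zeros((2, 9, 16)))
```

The issue is that the AttentionLayer breaks this pattern, because it bakes the static sequence length into its internal computations.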

The error trace

Desktop\attention_keras\examples\nmt_bidirectional\layers\attention.py:98 create_inital_state  *
        fake_state = K.tile(fake_state, [1, hidden_size])  # <= (batch_size, latent_dim
    AppData\Local\Temp\tmptt0gyc3b.py:153 create_inital_state
        fake_state = ag__.converted_call(K.tile, create_inital_state_scope.callopts, (fake_state, [1, hidden_size]), None, create_inital_state_scope)
    anaconda3\lib\site-packages\tensorflow_core\python\keras\backend.py:3014 tile
        return array_ops.tile(x, n)
    anaconda3\lib\site-packages\tensorflow_core\python\ops\gen_array_ops.py:11310 tile
        "Tile", input=input, multiples=multiples, name=name)
    anaconda3\lib\site-packages\tensorflow_core\python\framework\op_def_library.py:530 _apply_op_helper
        raise err
    anaconda3\lib\site-packages\tensorflow_core\python\framework\op_def_library.py:527 _apply_op_helper
        preferred_dtype=default_dtype)
    anaconda3\lib\site-packages\tensorflow_core\python\framework\ops.py:1296 internal_convert_to_tensor
        ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
    anaconda3\lib\site-packages\tensorflow_core\python\framework\constant_op.py:286 _constant_tensor_conversion_function
        return constant(v, dtype=dtype, name=name)
    anaconda3\lib\site-packages\tensorflow_core\python\framework\constant_op.py:227 constant
        allow_broadcast=True)
    anaconda3\lib\site-packages\tensorflow_core\python\framework\constant_op.py:265 _constant_impl
        allow_broadcast=allow_broadcast))
    anaconda3\lib\site-packages\tensorflow_core\python\framework\tensor_util.py:545 make_tensor_proto
        "supported type." % (type(values), values))

    TypeError: Failed to convert object of type <class 'list'> to Tensor. Contents: [1, None]. Consider casting elements to a supported type.
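The failure at the bottom of the trace can be reproduced in isolation (this is a hypothetical minimal reproduction, not the repo's code): tf.tile requires fully known integer multiples, so an unknown (None) dimension cannot appear in the multiples list.

```python
import tensorflow as tf

fake_state = tf.zeros((2, 1))
hidden_size = None  # what the layer effectively sees for an unspecified dimension

failed = False
try:
    # Tiling by [1, None] cannot be converted to a tensor of multiples.
    tf.tile(fake_state, [1, hidden_size])
except (TypeError, ValueError):
    failed = True  # "Failed to convert object of type <class 'list'> to Tensor"
```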
thushv89 commented 4 years ago

Hi,

Though this is allowed in the TensorFlow implementation, it is not possible here because of the way the computations are designed in this layer. In short, when there are two None dimensions (the batch and time dimensions), I cannot use the reshaping the attention layer currently relies on.

I'll need to do some research on this topic; there might be a way to achieve this, but with degraded performance.
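One possible approach (an assumption on my part, not necessarily how the repo resolved it) is to read the batch dimension at run time with tf.shape instead of tiling by a statically known size, so that a None dimension at build time is not a problem.

```python
import tensorflow as tf

def create_initial_state(inputs, hidden_size):
    # tf.shape yields the *dynamic* batch size, resolved at run time,
    # even when the static shape records it as None.
    batch_size = tf.shape(inputs)[0]
    return tf.zeros(tf.stack([batch_size, hidden_size]))

# A batch of 4 sequences, 7 timesteps, 16 features.
x = tf.random.normal((4, 7, 16))
state = create_initial_state(x, 32)
```

The key design point is that tf.shape produces a tensor evaluated per batch, whereas K.tile with a Python list of multiples must be fully specified when the graph is built.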

thushv89 commented 4 years ago

@moazshorbagy This support is available now.

moazshorbagy commented 4 years ago

@thushv89 Thank you, that's awesome.