Useful blog post regarding Positional Encoding: https://kazemnejad.com/blog/transformer_architecture_positional_encoding/
It seems Geron's implementation, used as-is, throws the following error:
```
  File "train.py", line 181, in <module>
    training()
  File "train.py", line 95, in __call__
    num_classes=num_classes,
  File "/Users/tallamjr/github/tallamjr/origin/astronet/astronet/t2/model.py", line 26, in __init__
    self.pos_encoding = PositionalEncoding(max_steps=self.sequence_length, max_dims=self.embed_dim)
  File "/Users/tallamjr/github/tallamjr/origin/astronet/astronet/t2/transformer.py", line 24, in __init__
    super(PositionalEncoding).__init__(dtype=dtype, **kwargs)
TypeError: super() takes no keyword arguments
```
It may be easier to use the other proposed implementation, or, alternatively, there is another one defined in /Users/tallamjr/github/tallamjr/forks/tfmodels/official/nlp/modeling/layers/position_embedding.py, which defines a `RelativePositionEmbedding(tf.keras.layers.Layer)` class as follows:
```python
import math

import tensorflow as tf
from official.modeling import tf_utils  # provides get_shape_list


class RelativePositionEmbedding(tf.keras.layers.Layer):
  """Creates a positional embedding.

  This layer calculates the position encoding as a mix of sine and cosine
  functions with geometrically increasing wavelengths. Defined and formulated
  in "Attention is All You Need", section 3.5.
  (https://arxiv.org/abs/1706.03762).

  Arguments:
    hidden_size: Size of the hidden layer.
    min_timescale: Minimum scale that will be applied at each position.
    max_timescale: Maximum scale that will be applied at each position.
  """

  def __init__(self,
               hidden_size,
               min_timescale=1.0,
               max_timescale=1.0e4,
               **kwargs):
    # We need to have a default dtype of float32, since the inputs (which Keras
    # usually uses to infer the dtype) will always be int32.
    # We compute the positional encoding in float32 even if the model uses
    # float16, as many of the ops used, like log and exp, are numerically
    # unstable in float16.
    if "dtype" not in kwargs:
      kwargs["dtype"] = "float32"

    super(RelativePositionEmbedding, self).__init__(**kwargs)
    self._hidden_size = hidden_size
    self._min_timescale = min_timescale
    self._max_timescale = max_timescale

  def get_config(self):
    config = {
        "hidden_size": self._hidden_size,
        "min_timescale": self._min_timescale,
        "max_timescale": self._max_timescale,
    }
    base_config = super(RelativePositionEmbedding, self).get_config()
    return dict(list(base_config.items()) + list(config.items()))

  def call(self, inputs, length=None):
    """Implements call() for the layer.

    Args:
      inputs: A tensor whose second dimension will be used as `length`. If
        `None`, the other `length` argument must be specified.
      length: An optional integer specifying the number of positions. If both
        `inputs` and `length` are specified, `length` must be equal to the
        second dimension of `inputs`.

    Returns:
      A tensor in shape of [length, hidden_size].
    """
    if inputs is None and length is None:
      raise ValueError("If inputs is None, `length` must be set in "
                       "RelativePositionEmbedding().")
    if inputs is not None:
      input_shape = tf_utils.get_shape_list(inputs)
      if length is not None and length != input_shape[1]:
        raise ValueError(
            "If inputs is not None, `length` must equal to input_shape[1].")
      length = input_shape[1]
    position = tf.cast(tf.range(length), tf.float32)
    num_timescales = self._hidden_size // 2
    min_timescale, max_timescale = self._min_timescale, self._max_timescale
    log_timescale_increment = (
        math.log(float(max_timescale) / float(min_timescale)) /
        (tf.cast(num_timescales, tf.float32) - 1))
    inv_timescales = min_timescale * tf.exp(
        tf.cast(tf.range(num_timescales), tf.float32) *
        -log_timescale_increment)
    scaled_time = tf.expand_dims(position, 1) * tf.expand_dims(
        inv_timescales, 0)
    position_embeddings = tf.concat(
        [tf.sin(scaled_time), tf.cos(scaled_time)], axis=1)
    return position_embeddings
```
Where `hidden_size` seems to simply be the dimension of the model.
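For reference, a minimal usage sketch, assuming the layer is imported along the lines of the file path quoted above (the shapes and variable names here are illustrative):

```python
import tensorflow as tf

from official.nlp.modeling.layers.position_embedding import RelativePositionEmbedding

embed_dim = 32                      # model dimension, i.e. `hidden_size`
x = tf.zeros((8, 100, embed_dim))   # (batch, sequence_length, embed_dim)

pos_encoding = RelativePositionEmbedding(hidden_size=embed_dim)
pos_emb = pos_encoding(inputs=x)    # shape [length, hidden_size] == (100, 32)

x = x + pos_emb                     # broadcast-add positional information onto the embeddings
```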
> It seems Geron's implementation used as-is throws the following error:

It turns out this was happening because of a mistake on my end, i.e.:
```diff
diff --git a/astronet/t2/transformer.py b/astronet/t2/transformer.py
index d5be210..2d126e8 100644
--- a/astronet/t2/transformer.py
+++ b/astronet/t2/transformer.py
@@ -22,18 +22,18 @@ class ConvEmbedding(layers.Layer):
 class PositionalEncoding(keras.layers.Layer):
     def __init__(self, max_steps, max_dims, dtype=tf.float32, **kwargs):
-        super(PositionalEncoding).__init__(dtype=dtype, **kwargs)
+        super(PositionalEncoding, self).__init__(dtype=dtype, **kwargs)
```
This was wrecking the `__mro__` lookup that was expected: the one-argument form `super(PositionalEncoding)` returns an *unbound* super object instead of delegating to `keras.layers.Layer.__init__`.
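A minimal sketch of the difference, with illustrative class names (nothing here is from the astronet code):

```python
class Base:
    def __init__(self, **kwargs):
        self.kwargs = kwargs


class Child(Base):
    def __init__(self, **kwargs):
        # super(Child) is an *unbound* super object; calling .__init__ on it
        # invokes super.__init__ itself (which takes no keyword arguments),
        # reproducing "TypeError: super() takes no keyword arguments":
        #     super(Child).__init__(**kwargs)
        #
        # The two-argument form binds to `self` and delegates to Base.__init__
        # as intended (equivalent to the bare super().__init__(**kwargs) in Python 3):
        super(Child, self).__init__(**kwargs)


Child(dtype="float32")  # works with the bound form
```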
Refs:

As discussed in #35, it seems a `PositionalEncoding` class is required to restore temporal information to the input sequence (cf. the Hands-On ML book, below). Examples can be found at https://www.tensorflow.org/tutorials/text/transformer, which uses functions akin to the sketch below.
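A sketch in the spirit of that tutorial's helpers, reconstructed from memory (the names `get_angles` and `positional_encoding` and the exact formulation may differ slightly from the tutorial):

```python
import numpy as np
import tensorflow as tf


def get_angles(pos, i, d_model):
    # Wavelengths form a geometric progression controlled by the dimension index.
    angle_rates = 1 / np.power(10000, (2 * (i // 2)) / np.float32(d_model))
    return pos * angle_rates


def positional_encoding(position, d_model):
    angle_rads = get_angles(np.arange(position)[:, np.newaxis],
                            np.arange(d_model)[np.newaxis, :],
                            d_model)
    # Apply sin to even indices and cos to odd indices of the encoding.
    angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])
    angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])
    pos_encoding = angle_rads[np.newaxis, ...]
    return tf.cast(pos_encoding, dtype=tf.float32)
```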
These functions are then used later in an `EncodingLayer`. OR, in the Hands-On ML book (pg 558), a `PositionalEncoding` class is defined, to be used elsewhere in the model, along the lines of the sketch below.
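A sketch of that layer, reconstructed from memory of the book, so treat the details as approximate:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras


class PositionalEncoding(keras.layers.Layer):
    def __init__(self, max_steps, max_dims, dtype=tf.float32, **kwargs):
        super().__init__(dtype=dtype, **kwargs)
        if max_dims % 2 == 1:
            max_dims += 1  # max_dims must be even
        # Precompute the sinusoidal table once, at layer construction time.
        p, i = np.meshgrid(np.arange(max_steps), np.arange(max_dims // 2))
        pos_emb = np.empty((1, max_steps, max_dims))
        pos_emb[0, :, ::2] = np.sin(p / 10000 ** (2 * i / max_dims)).T
        pos_emb[0, :, 1::2] = np.cos(p / 10000 ** (2 * i / max_dims)).T
        self.positional_embedding = tf.constant(pos_emb.astype(self.dtype))

    def call(self, inputs):
        # Add the (cropped) precomputed encoding onto the input embeddings.
        shape = tf.shape(inputs)
        return inputs + self.positional_embedding[:, :shape[-2], :shape[-1]]
```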
With the diff of `model.py`, perhaps something like the sketch below:
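A rough sketch, grounded only in the line quoted in the traceback above; the surrounding model class and its name are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow import keras

from astronet.t2.transformer import PositionalEncoding


class T2Model(keras.Model):  # illustrative name; the real class lives in astronet/t2/model.py
    def __init__(self, sequence_length, embed_dim, **kwargs):
        super().__init__(**kwargs)
        self.sequence_length = sequence_length
        self.embed_dim = embed_dim
        # The line that appears at model.py, line 26 in the traceback:
        self.pos_encoding = PositionalEncoding(
            max_steps=self.sequence_length, max_dims=self.embed_dim
        )

    def call(self, inputs):
        # Add positional information to the (already embedded) inputs.
        return self.pos_encoding(inputs)
```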