Currently there are two implementations of multi-head attention. The one in use at the moment lives in astronet.t2.attention.py; the other is in astronet.t2.multihead_attention.py.
The current one does not use masking, while the other does; it is not yet certain whether masking is required for the supernova setup we are going for.
Unit tests exist for the astronet.t2.multihead_attention.py implementation, but not for the one currently in use; this should be addressed. The astronet.t2.multihead_attention.py implementation also returns both outputs and attention_weights.
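A common pattern that matches the described behaviour (an optional mask plus a two-value return) is the scaled dot-product attention routine from the TensorFlow transformer tutorial. A minimal sketch for comparison; the function name and the mask convention here are assumptions, not necessarily the exact code in astronet.t2.multihead_attention.py:

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    """Scaled dot-product attention with an optional additive mask.

    q, k, v: (..., seq_len, depth) tensors. The mask broadcasts to the
    logits shape (..., seq_len_q, seq_len_k).
    """
    matmul_qk = tf.matmul(q, k, transpose_b=True)  # (..., seq_len_q, seq_len_k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_logits = matmul_qk / tf.math.sqrt(dk)

    if mask is not None:
        # Assumed convention: mask == 1 marks positions to hide. Those
        # logits get a large negative value so the softmax assigns them
        # near-zero attention weight.
        scaled_logits += (mask * -1e9)

    attention_weights = tf.nn.softmax(scaled_logits, axis=-1)
    output = tf.matmul(attention_weights, v)  # (..., seq_len_q, depth_v)
    return output, attention_weights
```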
Next steps are to implement tests for the implementation in use, and to compare the two implementations to decide which is best.
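As a starting point for those tests, a shape check in the tf.test.TestCase style from the testing article linked below could look like the sketch that follows. The class name MultiHeadSelfAttention and the embed_dim/num_heads constructor are taken from the Keras example the in-use implementation is based on; the actual names in astronet.t2.attention.py may differ:

```python
import tensorflow as tf

# Assumed class name, following the Keras example; adjust to the module.
from astronet.t2.attention import MultiHeadSelfAttention

class MultiHeadSelfAttentionTest(tf.test.TestCase):
    def test_output_shape(self):
        # Assumed constructor arguments; the in-use layer may differ.
        layer = MultiHeadSelfAttention(embed_dim=32, num_heads=4)
        x = tf.random.uniform((2, 10, 32))  # (batch, seq_len, embed_dim)
        out = layer(x)
        # Self-attention should preserve the input shape.
        self.assertEqual(out.shape, (2, 10, 32))

if __name__ == "__main__":
    tf.test.main()
```

Running the same shape check against both implementations (unpacking the (outputs, attention_weights) tuple for the masking variant) would also be a cheap first pass at the comparison above.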
Refs:
astronet.t2.multihead_attention.py: https://medium.com/@burnmg/software-testing-in-tensorflow-2-0-33c440ca908c
astronet.t2.attention.py: https://keras.io/examples/nlp/text_classification_with_transformer/