tallamjr / astronet

Efficient Deep Learning for Real-time Classification of Astronomical Transients and Multivariate Time-series
Apache License 2.0

Revisit implementation of Multi-head attention #38

Open tallamjr opened 3 years ago

tallamjr commented 3 years ago

Currently there are two implementations of multi-head attention. The one in use at the moment can be found in astronet.t2.attention.py, with the other found in astronet.t2.multihead_attention.py.

The one currently in use does not support masking, whilst the other does; it is not yet certain whether masking is required for the supernova classification setup we are going for.
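For reference, masked scaled dot-product attention is usually implemented by pushing the logits of masked positions towards minus infinity before the softmax, so those positions receive near-zero attention weight. A minimal sketch in the style of the standard TensorFlow transformer tutorial (not necessarily identical to either implementation here):

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    """Sketch of masked attention; not taken from either astronet implementation."""
    matmul_qk = tf.matmul(q, k, transpose_b=True)       # (..., seq_len_q, seq_len_k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_logits = matmul_qk / tf.math.sqrt(dk)
    if mask is not None:
        # mask == 1 marks positions to hide (e.g. padding); adding a large
        # negative value drives their softmax weight towards zero
        scaled_logits += (mask * -1e9)
    attention_weights = tf.nn.softmax(scaled_logits, axis=-1)
    output = tf.matmul(attention_weights, v)            # (..., seq_len_q, depth_v)
    return output, attention_weights
```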

There exist unit tests for the astronet.t2.multihead_attention.py implementation, but not for the one that is currently in use; this should be addressed (see the sketch below). The astronet.t2.multihead_attention.py implementation also returns both outputs and attention_weights.
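A shape-based test along these lines could cover the in-use layer. The class name MultiHeadSelfAttention and its constructor arguments below are assumptions for illustration; adjust to whatever astronet.t2.attention.py actually exports:

```python
import tensorflow as tf

# Hypothetical import — the actual class name in astronet.t2.attention may differ
from astronet.t2.attention import MultiHeadSelfAttention


def test_multihead_self_attention_output_shape():
    batch, seq_len, embed_dim, num_heads = 4, 100, 32, 4
    layer = MultiHeadSelfAttention(embed_dim=embed_dim, num_heads=num_heads)  # signature assumed
    inputs = tf.random.normal((batch, seq_len, embed_dim))
    outputs = layer(inputs)
    # Self-attention should preserve the (batch, time, features) shape
    assert outputs.shape == (batch, seq_len, embed_dim)
```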

Next steps would be to implement tests for the implementation currently in use, and to compare the two implementations to decide which is best.

Refs:

tallamjr commented 3 years ago

As of 2021.03.17, TensorFlow v2.4.0 now includes a built-in implementation of multi-head attention (tf.keras.layers.MultiHeadAttention), which may be preferable to use instead.
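For example, the built-in layer supports self-attention, optional masking via its attention_mask argument, and can return the per-head attention weights, matching what multihead_attention.py returns:

```python
import tensorflow as tf

# Available from tensorflow v2.4.0 onwards
mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=32)

# e.g. a batch of multivariate time-series: (batch, time-steps, features)
x = tf.random.normal((4, 100, 32))

# Self-attention: query and value are both x; return_attention_scores=True
# also yields the attention weights
output, attention_scores = mha(x, x, return_attention_scores=True)

print(output.shape)            # (4, 100, 32)
print(attention_scores.shape)  # (4, 4, 100, 100) — (batch, heads, query, key)
```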