Open r-wheeler opened 4 years ago
You could reimplement the QKV / dense logic in terms of einsum for faster computation. An example layer is here and its use is here. This is how it is now implemented in the TF2 version of BERT / Transformer.
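To make the suggestion concrete, here is a rough sketch of what I mean, written in NumPy rather than TensorFlow so it runs standalone. The function names, shapes, and the `num_heads` parameter are my own illustrative assumptions, not code from this repo or from the TF2 BERT implementation. It only shows that a single einsum gives the same per-head projection as the usual flatten / matmul / reshape sequence; whether it is actually faster depends on the backend and should be benchmarked.

```python
import numpy as np

def qkv_reshape_matmul(x, w, num_heads):
    """Dense projection done the usual way: flatten, matmul, reshape.

    x: (batch, seq, hidden), w: (hidden, num_heads * head_dim).
    Returns (batch, seq, num_heads, head_dim).
    """
    batch, seq, hidden = x.shape
    head_dim = w.shape[1] // num_heads
    out = x.reshape(batch * seq, hidden) @ w
    return out.reshape(batch, seq, num_heads, head_dim)

def qkv_einsum(x, w, num_heads):
    """Same projection as one einsum over a 3-D kernel (hypothetical helper)."""
    hidden = x.shape[-1]
    head_dim = w.shape[1] // num_heads
    # View the kernel as (hidden, num_heads, head_dim) so the head axis
    # comes out of the contraction directly, with no reshape of the output.
    w3 = w.reshape(hidden, num_heads, head_dim)
    return np.einsum("bsh,hnd->bsnd", x, w3)

# Small random inputs (assumed sizes: batch=2, seq=5, hidden=8, 4 heads of dim 2).
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 8))
w = rng.standard_normal((8, 4 * 2))

a = qkv_reshape_matmul(x, w, num_heads=4)
b = qkv_einsum(x, w, num_heads=4)
print(a.shape, np.allclose(a, b))
```

The same idea applies to each of the Q, K, and V projections: store the kernel with an explicit head axis and let einsum produce the `(batch, seq, heads, head_dim)` layout in one step.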