How to change the decoder to any transformer architecture?

If you want to use a pre-trained Transformer for the same task, how would you use it instead of the LSTM here? For example, I want to use a lightweight BERT model; what would the changes to the lines at the end be? I'm trying to understand the architecture.

    squeezed = layers.Reshape((x7.shape[-3] * x7.shape[-2], x7.shape[-1]))(x7)

    blstm = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(squeezed)

    output = layers.Dense(output_dim + 1, activation='softmax', name="output")(blstm)

    model = Model(inputs=inputs, outputs=output)
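One possible way to do this, as a minimal sketch rather than the library's own method: keep the `Reshape` and the final `Dense` as they are, and replace the `Bidirectional(LSTM)` line with a learned position embedding followed by a few Transformer encoder blocks built from Keras' `layers.MultiHeadAttention`. A pre-trained BERT checkpoint expects token IDs and was trained on text embeddings, so it probably won't drop in directly on CNN feature sequences; training a small encoder from scratch is the swap shown here. The names `PositionEmbedding`, `transformer_encoder_block` and `build_transformer_head`, the stand-in feature map shape, and all hyperparameters are assumptions for illustration, not part of mltu.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model


class PositionEmbedding(layers.Layer):
    """Adds a learned position vector to each time step (hypothetical helper)."""

    def build(self, input_shape):
        # One trainable vector per time step, same width as the features.
        self.pos = self.add_weight(
            name="pos",
            shape=(input_shape[-2], input_shape[-1]),
            initializer="random_normal",
            trainable=True,
        )

    def call(self, x):
        # (batch, time, features) + (time, features) broadcasts over the batch.
        return x + self.pos


def transformer_encoder_block(x, num_heads=4, key_dim=32, ff_dim=256, dropout=0.1):
    """One post-norm Transformer encoder block over (batch, time, features)."""
    # Self-attention: every time step attends to every other time step.
    attn = layers.MultiHeadAttention(
        num_heads=num_heads, key_dim=key_dim, dropout=dropout
    )(x, x)
    x = layers.LayerNormalization(epsilon=1e-6)(layers.Add()([x, attn]))
    # Position-wise feed-forward network, projected back to the input width.
    ff = layers.Dense(ff_dim, activation="relu")(x)
    ff = layers.Dense(x.shape[-1])(ff)
    ff = layers.Dropout(dropout)(ff)
    return layers.LayerNormalization(epsilon=1e-6)(layers.Add()([x, ff]))


def build_transformer_head(feature_map_shape=(4, 32, 64), output_dim=60, num_blocks=2):
    # Assumption: `inputs` stands in for the CNN output `x7` from the question.
    inputs = layers.Input(shape=feature_map_shape)
    x7 = inputs

    # Same flattening as before: (height, width, channels) -> (time, channels).
    squeezed = layers.Reshape((x7.shape[-3] * x7.shape[-2], x7.shape[-1]))(x7)

    # Replaces the Bidirectional(LSTM) line.
    x = PositionEmbedding()(squeezed)
    for _ in range(num_blocks):
        x = transformer_encoder_block(x)

    # Unchanged output layer: one class per character plus the CTC blank.
    output = layers.Dense(output_dim + 1, activation="softmax", name="output")(x)
    return Model(inputs=inputs, outputs=output)
```

In the original model you would apply `PositionEmbedding` and the encoder blocks to `squeezed` directly and leave the rest of the training code as it is, since the output still has shape `(batch, time, output_dim + 1)`. Self-attention has no notion of order on its own, which is why the position embedding is added before the first block.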

pythonlessons / mltu
