I noticed that the provided implementation uses mean pooling. I am particularly interested in the version of the model where the gated attention mechanism is properly integrated. Is there an existing version, or could you provide some guidance on implementing it? Any help would be greatly appreciated.