DP T5 model - GitHub Issues

Differentially private (DP) fine-tuning of the T5 model for a summarization task:

A grad_sampler for the Hugging Face T5LayerNorm in the attention heads
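
To illustrate what such a grad_sampler computes, here is a minimal sketch in plain PyTorch. It assumes the usual T5-style layer norm (scale only, no mean subtraction, no bias); `T5StyleLayerNorm` and `t5_layer_norm_grad_sample` are hypothetical names for this example, not the code from the repository. In Opacus, a function of this shape would typically be registered for the module type with `register_grad_sampler`; the registration is left out here so the sketch runs with PyTorch alone.

```python
import torch
from torch import nn


class T5StyleLayerNorm(nn.Module):
    """Stand-in for T5LayerNorm (assumption): RMS scaling with a learned weight, no bias."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)


def t5_layer_norm_grad_sample(layer, activations, backprops):
    """Per-sample gradient of layer.weight.

    activations: (batch, seq, hidden) input to the layer
    backprops:   (batch, seq, hidden) gradient of the loss w.r.t. the layer output
    returns:     (batch, hidden), one gradient per sample
    """
    variance = activations.pow(2).mean(-1, keepdim=True)
    normalized = activations * torch.rsqrt(variance + layer.eps)
    # Sum over sequence positions but keep the batch dimension.
    return torch.einsum("bsh,bsh->bh", backprops, normalized)


# Check against per-sample gradients obtained one sample at a time with autograd.
torch.manual_seed(0)
layer = T5StyleLayerNorm(8)
x = torch.randn(4, 5, 8)
out = layer(x)
loss_per_sample = out.pow(2).sum(dim=(1, 2))

backprops = torch.autograd.grad(loss_per_sample.sum(), out, retain_graph=True)[0]
grad_sample = t5_layer_norm_grad_sample(layer, x, backprops)

reference = torch.stack(
    [torch.autograd.grad(l, layer.weight, retain_graph=True)[0] for l in loss_per_sample]
)
```

The per-sample gradients in `grad_sample` are what DP-SGD would clip individually before adding noise.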
Module modification: replacing the nn.Embedding layer with RelativePositionEmbedding in the encoder/decoder attention
- Used to create the embedding for the relative position between query and key
- RelativePositionEmbedding is simply a wrapper around the nn.Embedding layer, created so that two different grad sampler implementations can coexist for embedding layers
- A custom grad_sampler for RelativePositionEmbedding is also provided
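
The wrapper idea can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: the class body, the function name `relative_position_grad_sample`, and the tensor layout are all hypothetical. The sketch assumes the bucket indices have shape (query_len, key_len) and are shared by every sample, while the backpropagated gradient keeps a batch dimension. Giving the wrapper its own type is what lets a separate grad sampler be attached to it while ordinary embedding layers keep the default one.

```python
import torch
from torch import nn


class RelativePositionEmbedding(nn.Embedding):
    """Hypothetical stand-in: behaves exactly like nn.Embedding but has its own type."""


def relative_position_grad_sample(layer, buckets, backprops):
    """Per-sample gradient of layer.weight when indices carry no batch dimension.

    buckets:   (query_len, key_len) long tensor, shared across the batch (assumption)
    backprops: (batch, query_len, key_len, heads)
    returns:   (batch, num_buckets, heads)
    """
    batch, heads = backprops.shape[0], backprops.shape[-1]
    flat_idx = buckets.reshape(-1)                  # (q * k,)
    flat_bp = backprops.reshape(batch, -1, heads)   # (batch, q * k, heads)
    grad_sample = torch.zeros(
        batch, *layer.weight.shape, dtype=backprops.dtype, device=backprops.device
    )
    # Accumulate each position's gradient into the row of the bucket it looked up.
    grad_sample.index_add_(1, flat_idx, flat_bp)
    return grad_sample


# Check against per-sample gradients computed one sample at a time with autograd.
torch.manual_seed(0)
batch, q_len, k_len, heads, num_buckets = 3, 4, 4, 2, 6
layer = RelativePositionEmbedding(num_buckets, heads)
buckets = torch.randint(0, num_buckets, (q_len, k_len))

# The same bias is broadcast to every sample in the batch.
bias = layer(buckets).unsqueeze(0).expand(batch, -1, -1, -1)
scores = torch.randn(batch, q_len, k_len, heads)
loss_per_sample = ((scores + bias) ** 2).sum(dim=(1, 2, 3))

backprops = torch.autograd.grad(loss_per_sample.sum(), bias, retain_graph=True)[0]
grad_sample = relative_position_grad_sample(layer, buckets, backprops)

reference = torch.stack(
    [torch.autograd.grad(l, layer.weight, retain_graph=True)[0] for l in loss_per_sample]
)
```
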
An example fine-tuning script for summarization with T5

microsoft / dp-transformers