Open k-girish opened 2 years ago
DP Fine-tuning of the T5 model for summarization task:
- `grad_sampler` for Hugging Face T5LayerNorm in attention heads
- `nn.Embedding` layer with `RelativePositionEmbedding` in the encoder/decoder attention
- `RelativePositionEmbedding` is simply a wrapper on the `nn.Embedding` layer, created to support two different grad sampler implementations for the embedding layer
- `grad_sampler` for `RelativePositionEmbedding`
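
To make the items above concrete, here is a minimal sketch of what a per-sample gradient function for a T5-style layer norm could look like, plus the wrapper idea. It is not the code from this issue. `T5StyleLayerNorm` is a hypothetical stand-in for Hugging Face's T5LayerNorm (RMS scaling, no mean subtraction, no bias), `t5_layer_norm_grad_sample` is an illustrative name, and its signature is simplified. Registration with Opacus (for example via `register_grad_sampler`) is left out so the sketch depends only on PyTorch.

```python
import torch
import torch.nn as nn


class T5StyleLayerNorm(nn.Module):
    """Hypothetical stand-in for T5LayerNorm: RMS scaling, no mean subtraction, no bias."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def normalize(self, x: torch.Tensor) -> torch.Tensor:
        # Scale by the root mean square over the hidden dimension.
        variance = x.pow(2).mean(-1, keepdim=True)
        return x * torch.rsqrt(variance + self.eps)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * self.normalize(x)


def t5_layer_norm_grad_sample(layer, activations, backprops):
    """Per-sample gradient of the loss w.r.t. layer.weight (illustrative signature).

    activations: layer input, shape (batch, seq_len, hidden)
    backprops:   gradient of the loss w.r.t. the layer output, same shape
    returns:     tensor of shape (batch, hidden), one gradient per sample
    """
    normalized = layer.normalize(activations)
    # Sum over sequence positions only; the batch dimension is kept.
    return torch.einsum("bsh,bsh->bh", backprops, normalized)


class RelativePositionEmbedding(nn.Embedding):
    """Behaves like nn.Embedding but is a distinct type (assumed motivation:
    a registry keyed on module type can then map it to its own grad sampler)."""


# Check the closed form against autograd, one sample at a time.
torch.manual_seed(0)
layer = T5StyleLayerNorm(4)
x = torch.randn(3, 5, 4)
out = layer(x)
loss_per_sample = out.pow(2).sum(dim=(1, 2))
backprops = 2 * out.detach()  # d(loss)/d(out) for loss = sum(out ** 2)

grad_sample = t5_layer_norm_grad_sample(layer, x, backprops)
reference = torch.stack([
    torch.autograd.grad(loss_per_sample[i], layer.weight, retain_graph=True)[0]
    for i in range(x.shape[0])
])
assert torch.allclose(grad_sample, reference, atol=1e-5)
```

The subclass adds no behavior of its own. Its only purpose in this sketch is to give the relative-position embedding a different type from a plain `nn.Embedding`, so the two can be handled by different grad sampler implementations.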