mead-ml / mead-baseline

Deep-Learning Model Exploration and Development for NLP
Apache License 2.0

Feature/ra updates #906

Closed dpressel closed 2 years ago

dpressel commented 2 years ago

Adds support for bias-based attention methods, including T5 and ALiBi, and updates the serialization lib.

This branch builds on the ALiBi branch from @wenshuoliu and abstracts it further while adding T5 bucketed RA support. The T5 impl was compared against the flaxformer impl and the mesh TF impl. Note that the T5 impl is currently only callable with bidirectional on (and with the defaults from the paper). This should be fixed in a future PR.
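For readers unfamiliar with the two bias schemes, here is a minimal pure-Python sketch of the ideas, not the code in this branch. The function names are hypothetical. The defaults `num_buckets=32` and `max_distance=128` are assumed to be the paper defaults mentioned above. The symmetric `abs(k - q)` distance in the ALiBi bias is an assumption made to mirror the bidirectional case, and the slope formula assumes the head count is a power of two.

```python
import math


def t5_relative_position_bucket(relative_position, num_buckets=32, max_distance=128):
    """Map a signed offset (key_pos - query_pos) to a bucket id (bidirectional case).

    Hypothetical helper for illustration; the defaults are assumed paper defaults.
    """
    half = num_buckets // 2
    # Keys that come after the query use the upper half of the bucket range.
    bucket = half if relative_position > 0 else 0
    n = abs(relative_position)
    # Small distances each get their own bucket.
    max_exact = half // 2
    if n < max_exact:
        return bucket + n
    # Larger distances share logarithmically sized buckets up to max_distance.
    log_ratio = math.log(n / max_exact) / math.log(max_distance / max_exact)
    large = max_exact + int(log_ratio * (half - max_exact))
    # Anything at or beyond max_distance falls into the last bucket of its half.
    return bucket + min(large, half - 1)


def alibi_slopes(num_heads):
    """Geometric per-head slopes; this simple form assumes num_heads is a power of two."""
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]


def alibi_bias(slope, q_len, k_len):
    """Linear distance penalty added to attention scores for one head.

    Uses a symmetric distance, which is an assumption for the bidirectional case.
    """
    return [[-slope * abs(k - q) for k in range(k_len)] for q in range(q_len)]
```

In the T5 scheme the bucket id indexes a learned per-head embedding that is added to the attention logits. ALiBi needs no learned parameters, since the bias is a fixed function of distance and head index.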