🌟 New model addition
Model description
Amazon Alexa researchers extract an optimal subset of architectural parameters for the BERT architecture by applying recent breakthroughs in algorithms for neural architecture search. The proposed optimal subset, "Bort," is just 5.5 percent the effective size of the original BERT-large architecture (not counting the embedding layer), and 16 percent of its net size.
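As a rough sanity check on what those percentages mean in absolute terms, here is a back-of-envelope estimate. It assumes the commonly cited figure of roughly 340M total parameters for BERT-large; the exact count depends on the implementation, and the paper's "effective size" is a separate measure from raw parameter count.

```python
# Back-of-envelope estimate of Bort's size from the quoted percentages.
# Assumption: BERT-large has ~340M total parameters (commonly cited figure).
bert_large_params = 340_000_000
bort_net = bert_large_params * 0.16  # "16 percent of its net size"
print(f"Estimated Bort parameter count: {bort_net / 1e6:.0f}M")
# prints: Estimated Bort parameter count: 54M
```

This lands in the mid-50M-parameter range, i.e. a model several times smaller than BERT-large.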
Open source status
The model implementation is available, written in MXNet with GluonNLP.
Paper: https://arxiv.org/pdf/2010.10499.pdf
Repo: https://github.com/alexa/bort