sacmehta / delight

DeLighT: Very Deep and Light-Weight Transformers
MIT License
467 stars 53 forks

Several heads perplexity #6

Closed DaniyarM closed 3 years ago

DaniyarM commented 3 years ago

Have you evaluated your model with multiple heads (2 or 3)? Does that lead to better results, or does it not make sense for your model?

sacmehta commented 3 years ago

We tried our model with multiple heads, and performance did not improve.

DaniyarM commented 3 years ago

Thank you!