nlpodyssey / cybertron

Cybertron: the home planet of the Transformers in Go
BSD 2-Clause "Simplified" License

Adding support for Distilbert #21

Open codetreras opened 1 year ago

codetreras commented 1 year ago

Based on the existing BERT code for language modeling and text encoding tasks, these changes add support for the DistilBERT architecture (#7).

matteo-grella commented 1 year ago

Thank you! How does DistilBERT differ from BERT?

codetreras commented 1 year ago

You're welcome, the project is awesome. The main differences are the configuration and the layers' identifiers. Architecturally, DistilBERT has no token type embeddings and no pooler. See the image below: equivalent layers are shown in blue, dissimilar ones in orange.

[Image: Screenshot 2023-06-27 at 11 11 03 AM, layer comparison between BERT and DistilBERT]

At first I thought about including DistilBERT as a "variation" of BERT. However, that would considerably increase the complexity of the code; here, some redundancy is necessary to make maintenance easier. Let me know your thoughts.

matteo-grella commented 1 year ago

@marco-nicola what do you think, friend? I'd go for it, but I'm a bit worried about code duplication for just a few differences.

mooijtech commented 1 year ago

Preferably just use the DistilBERT config (extend code in BERT) so there's no need for duplicate code.
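A minimal sketch of what reusing the BERT code via the DistilBERT config could look like. The key names in `distilBertConfig` follow the Hugging Face DistilBERT `config.json` (`dim`, `n_layers`, `n_heads`, `hidden_dim`); the `bertLikeConfig` type, its `HasPooler` flag, and the `fromDistilBert` function are hypothetical names used only for illustration and are not Cybertron's actual API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// distilBertConfig holds the subset of keys from a Hugging Face DistilBERT
// config.json that describe the architecture.
type distilBertConfig struct {
	VocabSize             int `json:"vocab_size"`
	Dim                   int `json:"dim"`
	NLayers               int `json:"n_layers"`
	NHeads                int `json:"n_heads"`
	HiddenDim             int `json:"hidden_dim"`
	MaxPositionEmbeddings int `json:"max_position_embeddings"`
}

// bertLikeConfig is a hypothetical, reduced BERT-style configuration.
// It is NOT Cybertron's actual config type.
type bertLikeConfig struct {
	VocabSize             int
	HiddenSize            int
	NumHiddenLayers       int
	NumAttentionHeads     int
	IntermediateSize      int
	MaxPositionEmbeddings int
	TypeVocabSize         int  // 0 means "no token type embeddings"
	HasPooler             bool // false for DistilBERT
}

// fromDistilBert parses a DistilBERT config and expresses it in BERT terms,
// switching off the two components DistilBERT lacks.
func fromDistilBert(raw []byte) (bertLikeConfig, error) {
	var d distilBertConfig
	if err := json.Unmarshal(raw, &d); err != nil {
		return bertLikeConfig{}, err
	}
	return bertLikeConfig{
		VocabSize:             d.VocabSize,
		HiddenSize:            d.Dim,
		NumHiddenLayers:       d.NLayers,
		NumAttentionHeads:     d.NHeads,
		IntermediateSize:      d.HiddenDim,
		MaxPositionEmbeddings: d.MaxPositionEmbeddings,
		TypeVocabSize:         0,
		HasPooler:             false,
	}, nil
}

func main() {
	// Sample values for illustration only.
	sample := []byte(`{"vocab_size": 30522, "dim": 768, "n_layers": 6,
		"n_heads": 12, "hidden_dim": 3072, "max_position_embeddings": 512}`)
	cfg, err := fromDistilBert(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

With this shape, the shared model code would only need to skip the token type embeddings when `TypeVocabSize` is 0 and skip the pooler when `HasPooler` is false, rather than duplicating the whole package.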

codetreras commented 1 year ago

Got it. In that case, extending converter/preprocessing.go and converter/mapper.go for BERT would be the proper way to handle the differences in layer identifiers, together with the configuration. Let me know what you think; I can modify the PR so you can check this approach.
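As a rough illustration of handling the layer identifiers in a mapper, the sketch below rewrites DistilBERT parameter names into their BERT equivalents. The name fragments are the ones used by Hugging Face checkpoints (`q_lin`, `sa_layer_norm`, `ffn.lin1`, and so on); the function `mapDistilBertName` is a hypothetical helper, and how the real converter/mapper.go organizes its mappings is not assumed here.

```go
package main

import (
	"fmt"
	"strings"
)

// distilToBert rewrites DistilBERT encoder name fragments into the
// corresponding BERT fragments. strings.Replacer does not rescan its own
// output, so a replacement cannot be matched again by a later pair.
var distilToBert = strings.NewReplacer(
	"transformer.layer.", "encoder.layer.",
	"attention.q_lin", "attention.self.query",
	"attention.k_lin", "attention.self.key",
	"attention.v_lin", "attention.self.value",
	"attention.out_lin", "attention.output.dense",
	"sa_layer_norm", "attention.output.LayerNorm",
	"ffn.lin1", "intermediate.dense",
	"ffn.lin2", "output.dense",
	"output_layer_norm", "output.LayerNorm",
)

// mapDistilBertName is a hypothetical helper: it converts one DistilBERT
// parameter name to the BERT naming scheme. Names that do not start with
// "distilbert." (for example task-head weights) are returned unchanged.
func mapDistilBertName(name string) string {
	const prefix = "distilbert."
	if !strings.HasPrefix(name, prefix) {
		return name
	}
	return "bert." + distilToBert.Replace(strings.TrimPrefix(name, prefix))
}

func main() {
	names := []string{
		"distilbert.embeddings.word_embeddings.weight",
		"distilbert.transformer.layer.0.attention.q_lin.weight",
		"distilbert.transformer.layer.0.sa_layer_norm.bias",
		"distilbert.transformer.layer.0.ffn.lin1.weight",
	}
	for _, n := range names {
		fmt.Println(n, "->", mapDistilBertName(n))
	}
}
```

Since DistilBERT has no token type embeddings or pooler, no source names exist for those BERT parameters; the model-loading side would have to tolerate their absence.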

mooijtech commented 1 year ago

I'm looking into supporting flan-t5-*, but so far I'm stuck: the positional encoder differs (it uses a different weight key), so prompting currently fails because some input is nil (apparently on the second pass).

matteo-grella commented 1 year ago

@mooijtech I am on vacation with family, so it is a bit difficult for me to follow up on this now. I'll get back to you next week and we'll figure out together how to proceed with flan-t5-*!