Open bobox2997 opened 5 months ago
Hi @bobox2997, thanks for opening this PR!
What I would suggest is adding checkpoints, configs and possibly updated modeling files directly on the hub, and having as much support as we can there. It will be easier to integrate than directly into transformers. Here is a tutorial if that sounds good to you!
I'm not sure I understood that correctly... which checkpoints should I add?
In the TSDAE implementation, the decoder is tied to the encoder and is not used at inference; I just need an "is_decoder" argument (and the related config changes, of course) in the DeBERTa config, as there is for BERT, RoBERTa and similar models...
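For reference, this is roughly how I understand the decoder side already works for BERT (just a sketch to illustrate the flag I'm missing, not DeBERTa code):

```python
# Sketch: the decoder-side config that already exists for BERT (and RoBERTa, etc.).
from transformers import BertConfig, BertLMHeadModel

# TSDAE's tied decoder relies on is_decoder (and add_cross_attention) in the config.
config = BertConfig.from_pretrained(
    "bert-base-uncased", is_decoder=True, add_cross_attention=True
)
decoder = BertLMHeadModel.from_pretrained("bert-base-uncased", config=config)

# DebertaConfig / DebertaV2Config don't expose an is_decoder flag, and (as far as I
# can tell) there is no causal-LM head model for DeBERTa, so the equivalent setup
# currently isn't possible.
```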
I'm sorry if those are naive or dumb questions, I'm still learning.
Thank you so much for your time!
@bobox2997 Ah, OK, I thought there were checkpoints available trained with this method.
In terms of changes in the transformers library, we're very unlikely to accept changes to the architecture or configuration files to add new features like this, especially for older, popular models and anything that doesn't have official checkpoints available.
The great thing about open source is that you are free to build upon and adapt the available code (license permitting) for your own projects. It should be possible to add this as a new architecture on the hub, keeping compatibility with the transformers library and allowing you to use the same API.
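As a rough sketch of what custom code on the hub could look like (the class names below are placeholders, and the actual decoder logic is left out):

```python
# Rough sketch of hub custom code; DebertaV2DecoderConfig / DebertaV2DecoderModel
# are placeholder names, not existing classes.
from transformers import DebertaV2Config, DebertaV2Model, PreTrainedModel


class DebertaV2DecoderConfig(DebertaV2Config):
    model_type = "deberta-v2-decoder"

    def __init__(self, is_decoder=True, **kwargs):
        super().__init__(**kwargs)
        self.is_decoder = is_decoder


class DebertaV2DecoderModel(PreTrainedModel):
    config_class = DebertaV2DecoderConfig

    def __init__(self, config):
        super().__init__(config)
        self.deberta = DebertaV2Model(config)
        # ... add the LM head / causal masking needed for the TSDAE decoder here ...

    def forward(self, input_ids, attention_mask=None, **kwargs):
        return self.deberta(input_ids=input_ids, attention_mask=attention_mask, **kwargs)


# Register the classes so the Auto* API can load them with trust_remote_code=True,
# then push_to_hub uploads the code alongside the weights.
DebertaV2DecoderConfig.register_for_auto_class()
DebertaV2DecoderModel.register_for_auto_class("AutoModel")
```

Users could then load it with `AutoModel.from_pretrained("<user>/<repo>", trust_remote_code=True)` and keep the usual transformers API.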
If you or anyone else in the community would like to implement this, feel free to share your project here!
Feature request
It seems that there is no config for DeBERTa v1/v2/v3 as a decoder (while there are such configs for BERT, RoBERTa and similar models)... This is needed in order to perform TSDAE unsupervised fine-tuning (see the sketch below).
(TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning)
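For context, this is roughly the standard TSDAE recipe in sentence-transformers (a minimal sketch with dummy sentences); the problem appears as soon as the model name points to a DeBERTa checkpoint, because the loss cannot build the tied decoder:

```python
# Minimal TSDAE sketch following the sentence-transformers recipe (dummy data).
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses
from sentence_transformers.datasets import DenoisingAutoEncoderDataset

model_name = "bert-base-uncased"  # replacing this with a DeBERTa checkpoint is the goal
word_embedding_model = models.Transformer(model_name)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), "cls")
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

train_sentences = ["A first unlabeled sentence.", "A second unlabeled sentence."]
train_dataset = DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=2, shuffle=True)

# The loss builds a decoder from the same checkpoint and ties its weights to the
# encoder; this is the step that needs is_decoder support in the model's config.
train_loss = losses.DenoisingAutoEncoderLoss(
    model, decoder_name_or_path=model_name, tie_encoder_decoder=True
)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    scheduler="constantlr",
    optimizer_params={"lr": 3e-5},
    show_progress_bar=True,
)
```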
Motivation
Here is a reference to the related sentence-transformers issue: https://github.com/UKPLab/sentence-transformers/issues/2771
TSDAE has been shown to be a powerful unsupervised approach, and DeBERTa has proven to be a really strong base model for further fine-tuning (also, v2 has an xxlarge 1.5B version, and v3 demonstrated strong performance and efficiency with its ELECTRA-style pretraining).
For context, here is the TSDAE paper: https://arxiv.org/abs/2104.06979
Your contribution
I'm not sure if I can contribute to the repository...
Anyway, I can certainly open-source multiple domain-adapted models, including models in a size range (1.5B) where there are not many choices when working with encoder-only models.