zhouj8553 / FlipDA

MIT License
64 stars, 15 forks

Using DeBERTa instead of ALBERT #2

Closed: chubzchubz97 closed this 2 years ago

chubzchubz97 commented 2 years ago

Is it possible to reproduce results with DeBERTa?

zhouj8553 commented 2 years ago

Sure. I only provided the scripts for ALBERT at first, since they are meant only as a usage sample and some people may not be able to run DeBERTa (which is much larger than ALBERT).

The DeBERTa scripts can be obtained easily by slightly modifying the file "scripts/run_pet.sh". But to reduce your workload, I have uploaded a file called "scripts/run_deberta_pet.sh". Some hyperparameters are modified for a better baseline, as explained in Appendix A.1 of our paper.

To run with DeBERTa, follow the "readme" and replace every "run_pet.sh" with "run_deberta_pet.sh".

One thing to note is that you should search the hyperparameters in Step 3 according to Appendix A.5 of our paper. The best configurations for ALBERT and DeBERTa are not the same. A better implementation or a larger search space is welcome :).

Last but not least, thanks for your attention.

chubzchubz97 commented 2 years ago

Thank you!

Also, while running the baseline on ReCoRD with ALBERT, I saw this line: "Token indices sequence length is longer than the specified maximum sequence length for this model (918 > 512). Running this sequence through the model will result in indexing errors". Does it matter?

zhouj8553 commented 2 years ago

Well, it doesn't matter.
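
For readers wondering why the warning can be harmless: it is emitted when a text is tokenized into more ids than the model's limit, and it only turns into a real problem if that untruncated sequence is actually passed to the model. The sketch below illustrates the idea under the assumption (not confirmed in this thread) that the training code shortens the input to the maximum sequence length before the forward pass. `truncate_ids` and `MAX_SEQ_LENGTH` are hypothetical names for illustration, not part of FlipDA.

```python
from typing import List

# Limit quoted in the warning above (918 > 512).
MAX_SEQ_LENGTH = 512


def truncate_ids(token_ids: List[int], max_length: int = MAX_SEQ_LENGTH) -> List[int]:
    """Keep at most max_length token ids (hypothetical helper, not from FlipDA)."""
    return token_ids[:max_length]


# Stand-in for a tokenized ReCoRD passage of 918 tokens, as in the warning.
long_ids = list(range(918))
model_input = truncate_ids(long_ids)
print(len(long_ids), "->", len(model_input))  # 918 -> 512
```

If the input is truncated like this before it reaches the model, the warning printed at tokenization time has no effect on the run. Real pipelines may truncate more carefully, for example by shortening the passage while keeping the question intact.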