Would definitely love to see an implementation of ALBERT added to this repository. Just for completeness, the paper is "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations" (https://arxiv.org/abs/1909.11942).
That said, it could be even more interesting to implement ALBERT's core improvements (factorized embedding parameterization, cross-layer parameter sharing) as optional features in some or all of the other transformers? See the sketch below for what those two ideas look like in isolation.
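To make that concrete, here is a minimal, illustrative sketch of the two ideas, not the official implementation: a `FactorizedEmbedding` that looks tokens up in a small V×E table and projects to the hidden size, and a `SharedLayerEncoder` that applies one set of layer weights repeatedly. Class names and sizes (`vocab_size=30000`, `embedding_size=128`, `hidden_size=768`) are assumptions for illustration.

```python
import torch
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """Factorized embedding parameterization: a V x E lookup followed by an E x H projection."""
    def __init__(self, vocab_size=30000, embedding_size=128, hidden_size=768):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, embedding_size)  # V x E (small)
        self.projection = nn.Linear(embedding_size, hidden_size)         # E x H

    def forward(self, input_ids):
        return self.projection(self.word_embeddings(input_ids))

class SharedLayerEncoder(nn.Module):
    """Cross-layer parameter sharing: the same transformer layer is applied num_layers times."""
    def __init__(self, hidden_size=768, num_heads=12, num_layers=12):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True
        )
        self.num_layers = num_layers

    def forward(self, hidden_states):
        for _ in range(self.num_layers):
            hidden_states = self.layer(hidden_states)  # identical weights on every pass
        return hidden_states

# Toy usage: a batch of 2 sequences of 16 token ids -> (2, 16, 768) hidden states.
embeddings = FactorizedEmbedding()
encoder = SharedLayerEncoder()
input_ids = torch.randint(0, 30000, (2, 16))
output = encoder(embeddings(input_ids))
```

The point of the factorization is that the embedding table grows with V×E instead of V×H, and the sharing means the encoder stores one layer's parameters instead of twelve.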
Knowing how fast the team works, I would expect ALBERT to be implemented quite soon. That being said, I haven't had time to read the ALBERT paper yet, so it might be more difficult than previous BERT iterations such as DistilBERT and RoBERTa.
I think ALBERT is very cool! Expect...
And in PyTorch (using code from this repo and weights from brightmart): https://github.com/lonePatient/albert_pytorch
Any update on the progress?
The ALBERT paper will be presented at ICLR in April 2020. From what I last heard, the Hugging Face team has been talking with the people over at Google AI to share the details of the model, but I can imagine that the researchers would rather wait until the paper has been presented. One reason being that they want citations to point at their ICLR paper rather than an arXiv preprint, which, in the field, is "worth less" than a big conference proceeding.
For now, just be patient. I am sure that the Hugging Face team will have a big announcement (follow their Twitter/LinkedIn channels) with a new version bump. No need to keep bumping this topic.
The official code and models got released :slightly_smiling_face: https://github.com/google-research/google-research/tree/master/albert
[WIP] ALBERT in tensorflow 2.0 https://github.com/kamalkraj/ALBERT-TF2.0
https://github.com/lonePatient/albert_pytorch
| Dataset | Model | Dev accuracy |
|---------|-------|--------------|
| MNLI    | ALBERT_BASE_V2 | 0.8418 |
| SST-2   | ALBERT_BASE_V2 | 0.926  |
A PR was created, see here:
[WIP] ALBERT in tensorflow 2.0 https://github.com/kamalkraj/ALBERT-TF2.0
Version 2 weights added. Support for SQuAD 1.1 and 2.0 added. Reproduces the same results as the paper. From my experiments, the ALBERT model is very sensitive to hyperparameters like batch size. Fine-tuning uses AdamW as the default, as in the original repo. AdamW performs better than LAMB for model fine-tuning.
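For anyone who wants to try this once ALBERT lands here, below is a minimal fine-tuning sketch with AdamW. It assumes the eventual `transformers` classes are named `AlbertTokenizer` / `AlbertForSequenceClassification` and that an `albert-base-v2` checkpoint is available; the learning rate, weight decay, and toy batch are illustrative, not recommended settings.

```python
import torch
from transformers import AlbertTokenizer, AlbertForSequenceClassification

# Assumed checkpoint name; adjust to whatever the released weights are called.
tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = AlbertForSequenceClassification.from_pretrained("albert-base-v2", num_labels=2)

# AdamW as the fine-tuning optimizer (reported to work better than LAMB here).
# ALBERT fine-tuning is reportedly sensitive to batch size, so keep it configurable.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)

# Toy SST-2-style batch: two sentences with sentiment labels.
batch = tokenizer(["a very good movie", "a terrible movie"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```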
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.