Closed — svboeing closed this issue 5 years ago
The attached paper is slightly outdated, and its experiments were run on the old version of GLUE (please refer to the GLUE website for details). If you look at the tasks in GLUE, pairwise sentence tasks, e.g., NLI, dominate; thus, the single-sentence tasks are under-trained. It requires another step of fine-tuning to obtain SOTA on the leaderboard. We select models based on MNLI/RTE. Of course, you can test on all the tasks. You can switch to BERT base if you don't have powerful GPUs. As mentioned in the README, we will update the paper.
I just wonder if we can obtain or approach SOTA on the leaderboard using only feature-based fine-tuning on the single task, instead of end-to-end fine-tuning. Note: both the feature-based fine-tuning and the end-to-end fine-tuning are based on MT-DNN.
This is a good question. However, I don't know the answer. In the first stage, we tried the feature-based approach and found that it didn't help. Thus, we haven't explored this direction further, but I still believe it is worth a shot.
Thanks for your reply. By using the feature-based approach, we can significantly reduce the inference cost for multiple tasks.
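To make the distinction concrete, here is a minimal sketch of the "feature-based" approach discussed above, in the BERT paper's sense: the pretrained encoder's weights are frozen, its output features are computed once, and only a small task-specific head is trained. Everything here (the toy encoder, data, and head) is illustrative and is not the MT-DNN API.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_encoder(x):
    """Stand-in for a pretrained encoder whose weights are NOT updated."""
    W = np.array([[1.0, -1.0], [0.5, 2.0]])  # fixed, pretend-pretrained weights
    return np.tanh(x @ W)

# Toy binary classification data.
X = rng.normal(size=(64, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Key point: features are computed once and cached; the encoder never changes.
features = frozen_encoder(X)

def loss(w, b):
    p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

# Train only a logistic-regression head on the frozen features.
w, b, lr = np.zeros(2), 0.0, 0.5
initial = loss(w, b)
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
    grad = p - y
    w -= lr * features.T @ grad / len(y)
    b -= lr * grad.mean()
final = loss(w, b)
```

This is also why inference cost drops for multiple tasks: the expensive encoder forward pass is shared, and each task only adds a cheap head. End-to-end fine-tuning, by contrast, would also update the encoder weights per task.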
Sorry, I am wondering what the definitions of "feature-based fine-tuning" and "end-to-end fine-tuning" are, and what the difference between them is?
Conan, you can find the definitions in the original BERT paper for reference :) https://arxiv.org/pdf/1810.04805.pdf
Are the MT-DNN results for each GLUE task in the paper (shown in the image below) based on a single multi-task model, or do you fine-tune on each GLUE task (as specified in this Git repo) on top of the multi-task model?
In the arXiv paper it is stated:
If I understand it correctly, in your code this multi-task fine-tuning stage is called `MTL refinement`. Then why do you fine-tune for each task in the single-task setting in your `fine-tuning` stage? There is no such stage in the original paper. Also, in `run_mt_dnn.sh` there are these lines:

```
train_datasets="mnli,rte,qqp,qnli,mrpc,sst,cola,stsb"
test_datasets="mnli_matched,mnli_mismatched,rte"
```

Why do you only test on `mnli` and `rte` and not on all the other tasks? I would also like to ask whether I can switch from `BERT large` to `BERT base` there, because I only have one GTX 1080 card. Thank you.