bruffridge opened 2 years ago
According to @elkong, the weights were empty for the original authors. This probably didn't affect them much, since their dataset was large. For us, I think we need to take a BERT or GPL model, expand its pretraining data with our dataset and the new words so that it understands the relationships between those words, and then surgically add it to MATCH.
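As a rough sketch of what that could look like (assuming a HuggingFace-style workflow; the base checkpoint, token list, and corpus path below are placeholders, not anything from the MATCH repo):

```python
# Sketch: extend a pretrained tokenizer with our domain terms, resize the
# embedding matrix, and continue masked-LM pretraining on our corpus before
# wiring the encoder into MATCH. All names/paths here are hypothetical.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# "New words" from our dataset: author names, venues, domain jargon, etc.
new_tokens = ["metamaterial", "aeroelasticity"]  # placeholder list
tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))  # new rows start randomly initialized

dataset = load_dataset("text", data_files={"train": "our_corpus.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-domain", num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
```

The embedding rows for the added tokens start out random, which is exactly why the continued-pretraining pass matters before surgically attaching the encoder to MATCH.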
This discussion I had with the author might be relevant:
On Friday, June 11, 2021 at 11:26 AM Yu Zhang wrote:
(1) We've also noticed the SPECTER paper recently, and we are trying to generalize its idea from citation links to different types of metadata. Sorry that we haven't made a direct comparison between MATCH and SPECTER. This will need some modification on the SPECTER code because SPECTER uses metadata for general LM fine-tuning and then performs single-label classification with text only as a downstream task. We will consider doing that comparison later. Thank you for mentioning this!
(2) One problem of applying BERT-based models here is their ability to deal with metadata. Because of the limited vocabulary, the tokenizer of SciBERT (or other BERT-based models) will split author names and reference paper IDs into meaningless subwords in most cases. I feel if one uses a model that can deal with metadata input (e.g., OAG-BERT, https://arxiv.org/abs/2103.02410, https://github.com/thudm/oag-bert), it might be helpful.
Best, Yu
On Fri, Jun 11, 2021 at 8:07 AM Ruffridge, Brandon wrote:
Hello,
Just curious if you’ve compared the performance of MATCH with SPECTER (pdf, github) for multi-label text classification. Also, do you think adding SciBERT, which has been trained on scientific literature (pdf, github), to MATCH would improve performance?
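To make Yu's point (2) concrete, here's a quick check of how the SciBERT tokenizer handles metadata strings (the checkpoint name is the public SciBERT release; the splits described in the comments are illustrative, since the exact subwords depend on the vocabulary):

```python
# Illustration: a BERT-family tokenizer shatters metadata tokens (author
# names, paper/reference IDs) into subwords that carry little meaning.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")

print(tokenizer.tokenize("Jiawei Han"))   # a name likely splits into subword pieces
print(tokenizer.tokenize("MAG:2103020"))  # a reference ID splits into arbitrary chunks
```

If names and IDs shatter like this, the encoder never sees them as atomic units, which is the gap a metadata-aware model like OAG-BERT is meant to close.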
related to #24
"one of my other major bottlenecks is pretraining weights – I’ve been training MATCH from random weight initializations every time, whereas with models like GPT-2 people just take the pretrained weights and finetune them to get state-of-the-art results. So I’ll look into either finding a way to start from pretrained MATCH weights, or finetuning GPT-2 or some such model." - Eric