How do I use LogME for NLP tasks?

thuml / LogME

Code release for "LogME: Practical Assessment of Pre-trained Models for Transfer Learning" (ICML 2021) and "Ranking and Tuning Pre-trained Models: A New Paradigm for Exploiting Model Hubs" (JMLR 2022)

MIT License


How do I use LogME for NLP tasks? #2

Closed kitty-eu-org closed 3 years ago

kitty-eu-org commented 3 years ago

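I cannot reproduce the paper's results with the corresponding Hugging Face pre-trained models (not fine-tuned). I used the last layer's CLS representation as the feature. For NLP tasks, which feature of the pre-trained model is f exactly? For BERT-like models, which layer does it come from, and should the pooler be applied?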

youkaichao commented 3 years ago

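Sorry, I did not have GitHub notifications turned on and missed this issue. We use the input to cls as the feature, that is, the output of whatever comes right before cls. Please give it another try.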

WenWeiTHU commented 3 years ago

After loading a pre-trained model with AutoModelForSequenceClassification.from_pretrained, you can see which layers have been re-initialized:

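Take the features that feed into the shallowest re-initialized layer (in other words, the output features of the pre-trained layers). Concretely, compute LogME on the inputs of these layers: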

roberta-base: classifier.dense
distilroberta-base: classifier.dense
distilbert-base-uncased: pre_classifier
distilbert-base-cased: pre_classifier
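
One way to collect "the input of a layer" is a PyTorch forward pre-hook, sketched below. This is only an illustration under stated assumptions, not the authors' extraction script: `ToyHead`, `ToyClassifier` and `extract_layer_inputs` are hypothetical names, and the toy model merely stands in for a pre-trained encoder with a freshly initialized head so that the snippet runs without downloading a checkpoint.

```python
import torch
from torch import nn


class ToyHead(nn.Module):
    """Hypothetical stand-in for a freshly initialized classification head."""

    def __init__(self, hidden, num_labels):
        super().__init__()
        self.dense = nn.Linear(hidden, hidden)
        self.out_proj = nn.Linear(hidden, num_labels)

    def forward(self, x):
        return self.out_proj(torch.tanh(self.dense(x)))


class ToyClassifier(nn.Module):
    """Hypothetical stand-in for a pre-trained encoder plus a new head."""

    def __init__(self, vocab=100, hidden=16, num_labels=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)  # plays the role of the encoder
        self.classifier = ToyHead(hidden, num_labels)

    def forward(self, input_ids):
        first_token = self.embed(input_ids)[:, 0, :]  # first-token representation
        return self.classifier(first_token)


def extract_layer_inputs(model, layer_name, batches):
    """Collect the tensors that reach `layer_name` across all batches."""
    captured = []

    def pre_hook(module, args):
        # args is the tuple of positional inputs passed to the hooked layer
        captured.append(args[0].detach().cpu())

    handle = model.get_submodule(layer_name).register_forward_pre_hook(pre_hook)
    model.eval()  # disable dropout so the features are deterministic
    try:
        with torch.no_grad():
            for input_ids in batches:
                model(input_ids)
    finally:
        handle.remove()  # always detach the hook, even on error
    return torch.cat(captured)  # shape [N, D]


torch.manual_seed(0)
model = ToyClassifier()
batches = [torch.randint(0, 100, (4, 8)) for _ in range(3)]
features = extract_layer_inputs(model, "classifier.dense", batches)
print(features.shape)  # torch.Size([12, 16])
```

For a real checkpoint, the toy model would be replaced by the one returned from AutoModelForSequenceClassification.from_pretrained, with the layer name taken from the list above (for example `classifier.dense` or `pre_classifier`). The resulting feature matrix, converted to a NumPy array, can then be scored against the labels with LogME.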

youkaichao commented 3 years ago

If you still have questions, feel free to reopen the issue :)

nxznm commented 2 years ago

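Hello, I have a question. LogME seems to assume that pre-trained models are "fixed feature extractors", but fine-tuning on NLP downstream tasks usually also updates the weights of the pre-trained model itself. Could this introduce some bias when LogME is used to judge which pre-trained models are better?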

youkaichao commented 2 years ago

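Yes, there will be some bias. LogME is more concerned with speed: it is fast to evaluate, so it can be run over a large number of pre-trained models to quickly find the ones likely to perform well. In that setting we consider some bias tolerable, because speed matters more.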

nxznm commented 2 years ago

Got it, thanks for the reply!