thuml / LogME

Code release for "LogME: Practical Assessment of Pre-trained Models for Transfer Learning" (ICML 2021) and "Ranking and Tuning Pre-trained Models: A New Paradigm for Exploiting Model Hubs" (JMLR 2022)
MIT License

A question about the meaning of the LogME score #11

Closed zxC0der closed 2 years ago

zxC0der commented 2 years ago

Hello, thanks for this work. I have a question about the meaning of the score after reading the paper.

For example, I have a pre-trained model M and a labeled target dataset {x, y_truth}. I pass x through the model M to get the features X and the predictions y_pred.

I noticed that logme(X, y_pred) > logme(X, y_truth).

This confuses me: if the LogME score measures the benefit of fine-tuning the model on the target dataset, shouldn't logme(X, y_truth) be greater than logme(X, y_pred), since y_truth is the ground truth? I would expect fine-tuning on a "true" dataset to give better results than fine-tuning on a "noisy" one.
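
For concreteness, this is roughly how I compute the two scores, assuming the LogME(regression=False).fit(features, labels) interface from LogME.py in this repo (the .npy file names below are just placeholders for my cached arrays):

```python
# Minimal sketch of the comparison above; file names are placeholders for
# however the features and labels are cached.
import numpy as np
from LogME import LogME

X = np.load('features.npy')      # [N, D] features of the target data from model M
y_truth = np.load('labels.npy')  # [N] ground-truth class labels
y_pred = np.load('preds.npy')    # [N] labels predicted by model M itself

score_pred = LogME(regression=False).fit(X, y_pred)
score_truth = LogME(regression=False).fit(X, y_truth)
print(score_pred, score_truth)   # score_pred comes out larger than score_truth
```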

youkaichao commented 2 years ago

This is expected: logme(X,y_pred) > logme(X,y_truth).

LogME measures how well the best linear classifier can fit the labels, and since y_pred is a linear combination of the features, it yields a larger LogME score than y_truth.

In our paper, we have a dataset {x, y_truth}, and we want to assess multiple pre-trained models M1, M2, ... . If M1 has a larger LogME score than M2, it means a linear classifier can fit {M1(x), y_truth} better than {M2(x), y_truth}. Therefore we consider M1 better for this task, as its features are better than those of M2.

Typically, LogME evaluates various pre-trained models, rather than the benefit of fine-tuning a model on a given dataset. If you instead fix a model and vary the dataset, LogME will help you select the dataset that is easiest for the model to fit.
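
For example, a rough sketch of this model-ranking use case, again assuming the LogME(regression=False).fit(features, labels) interface from LogME.py, with hypothetical cached feature files per model:

```python
# Minimal sketch: rank several pre-trained models by their LogME score on the
# same labeled target data. The feature files are hypothetical [N, D] arrays.
import numpy as np
from LogME import LogME

def rank_models(feature_files, y_truth):
    """Return (model_name, score) pairs sorted by LogME score, best first."""
    scores = {}
    for name, path in feature_files.items():
        features = np.load(path)  # [N, D] target-data features from this model
        scores[name] = LogME(regression=False).fit(features, y_truth)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# e.g. rank_models({'M1': 'm1_feats.npy', 'M2': 'm2_feats.npy'}, y_truth)
```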


zxC0der commented 2 years ago

Oh, I got it. Thanks again

nxznm commented 2 years ago

@youkaichao Hi~ After reading the paper and applying this algorithm to my own project to check which pre-trained LM (PLM) is better for a specific downstream task, I also have a question about the LogME score. Can the gap between two PLMs' LogME scores reflect their performance gap on the downstream task? For example, if the LogME scores of PLM1, PLM2 and PLM3 are -29, -26 and -14, respectively, would the improvement from PLM2 to PLM3 generally be greater than that from PLM1 to PLM2 on the downstream task? Besides, if the LogME scores of PLM1, PLM2, PLM3 and PLM4 are -29, -26, 13 and 16, would the improvement from PLM1 to PLM2 be approximately equal to that from PLM3 to PLM4?

youkaichao commented 2 years ago

Hi, the LogME score changes in a non-linear way. You can have a look at Figure 3 in the paper: the lowest LogME scores saturate around a specific value and do not behave linearly.

The behavior you expect is that performance scales linearly with the transferability metric, which is hard to realize, since people use multiple performance metrics: if the transferability metric scaled linearly with accuracy, it would not scale linearly with cross-entropy loss.



nxznm commented 2 years ago

Got it. Thanks for your patience.

zxC0der commented 2 years ago

Hi, sorry to bother you again on this issue. As you said, "LogME measures how well the best linear classifier can fit the label". Is "the best linear classifier" the self.ms in LogME.py?

youkaichao commented 2 years ago

@zxC0der Yes, you are right. self.ms can be seen as the weights of the linear classifier.
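
For example, a rough sketch of reading those weights out, assuming that after fit() the LogME object exposes the fitted per-class weight vectors as self.ms (stacked to a [num_classes, D] array for classification); treat this as an illustration rather than a guaranteed public interface:

```python
# Minimal sketch: use the weights stored in self.ms after fit() as a linear
# classifier over the extracted features.
import numpy as np
from LogME import LogME

logme = LogME(regression=False)
score = logme.fit(X, y_truth)     # X: [N, D] features, y_truth: [N] labels

W = np.asarray(logme.ms)          # assumed shape [num_classes, D]
logits = X @ W.T                  # [N, num_classes] linear scores per class
y_hat = logits.argmax(axis=1)     # predictions of the "best linear classifier"
```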