I compute the entropy loss on the logits of the first predicted next token. The original AdaMerging paper experimented exclusively with CLIP-ViT models on image classification tasks; for Flan-T5 models, I treat it as a token classification task.
An implementation can be found here, which I will soon integrate into this codebase: https://github.com/tanganke/subspace_fusion/blob/main/scripts/flan_t5_layer_wise_adamerging.py#L81
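For reference, here is a minimal sketch of that loss under the assumptions above: the decoder logits have shape `[batch, seq_len, vocab_size]` (as described in the question), and only the first position along `seq_len` is used. The function name `first_token_entropy_loss` is hypothetical; the linked script is the authoritative implementation.

```python
import torch
import torch.nn.functional as F

def first_token_entropy_loss(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the predictive distribution over the first
    next token, averaged over the batch.

    Args:
        logits: decoder output logits of shape [batch, seq_len, vocab_size].
    """
    first_logits = logits[:, 0, :]                    # [batch, vocab_size]
    log_probs = F.log_softmax(first_logits, dim=-1)   # log p(v | x)
    probs = log_probs.exp()
    # H(p) = -sum_v p(v) * log p(v), computed per example
    entropy = -(probs * log_probs).sum(dim=-1)        # [batch]
    return entropy.mean()
```

In the AdaMerging setting, a loss like this would be minimized on unlabeled test data to adapt the layer-wise merging coefficients, while the merged model's weights themselves stay fixed.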
This issue seems to be resolved. I am closing it. If you have any other questions, please feel free to reopen it.
Hello. I really appreciate you sharing this insightful research paper along with the code. I have a question regarding the application of AdaMerging to Flan-T5 (Table 13). Could you explain how you computed the loss in this case? Did you simply apply self-entropy to the logits of shape [batch, seq_len, vocab_size] that come from the model's output? Thank you.