inseq-team / inseq

Interpretability for sequence generation models 🐛 🔍
https://inseq.org
Apache License 2.0

On the interoperability with ferret #218

Closed romainf28 closed 1 year ago

romainf28 commented 1 year ago

Question

Thanks for providing a very useful library for applying feature attribution to seq2seq models. I was wondering how you plan to integrate ferret evaluation metrics into Inseq. For instance, will you compute AOPC comprehensiveness for every generated token and then average the scores? Or are you planning to design completely new metrics for evaluating feature attributions on seq2seq models?
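To make the averaging scheme in the question concrete, here is a minimal sketch in plain Python. It assumes one common formulation of AOPC comprehensiveness (the mean drop in the probability of a target token after removing the top-attributed source tokens at several removal fractions). The names `ProbFn`, `aopc_comprehensiveness` and `sequence_aopc` are hypothetical and are not part of the Inseq or ferret APIs; the toy probability functions stand in for a real model.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: maps the (possibly reduced) source tokens to the
# probability the model assigns to one fixed generated token.
ProbFn = Callable[[List[str]], float]


def aopc_comprehensiveness(
    prob_fn: ProbFn,
    tokens: Sequence[str],
    scores: Sequence[float],
    fractions: Sequence[float] = (0.1, 0.3, 0.5),
) -> float:
    """Mean probability drop after removing the top-attributed tokens."""
    full_prob = prob_fn(list(tokens))
    # Token indices sorted from most to least important.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    drops = []
    for frac in fractions:
        k = max(1, round(frac * len(tokens)))  # always remove at least one token
        removed = set(ranked[:k])
        reduced = [tok for i, tok in enumerate(tokens) if i not in removed]
        drops.append(full_prob - prob_fn(reduced))
    return sum(drops) / len(drops)


def sequence_aopc(
    prob_fns: Sequence[ProbFn],
    tokens: Sequence[str],
    attributions: Sequence[Sequence[float]],
    fractions: Sequence[float] = (0.1, 0.3, 0.5),
) -> float:
    """Average the per-step AOPC over all generated tokens."""
    per_step = [
        aopc_comprehensiveness(fn, tokens, scores, fractions)
        for fn, scores in zip(prob_fns, attributions)
    ]
    return sum(per_step) / len(per_step)


# Toy stand-ins for a model: each generated token depends on one source token.
source = ["a", "b", "c", "d"]
step_prob_fns = [
    lambda toks: 0.9 if "a" in toks else 0.1,  # generated token 1
    lambda toks: 0.8 if "d" in toks else 0.2,  # generated token 2
]
step_attributions = [
    [1.0, 0.2, 0.1, 0.0],  # attribution over source tokens for token 1
    [0.0, 0.1, 0.2, 1.0],  # attribution over source tokens for token 2
]
score = sequence_aopc(step_prob_fns, source, step_attributions, fractions=(0.25, 0.5))
print(score)
```

A real setup would replace the toy lambdas with forward passes of the model and would also need to decide how to handle the generated prefix, which this sketch ignores.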

gsarti commented 1 year ago

Hello @romainf28, thank you for your question!

We do plan to add interoperability between inseq and ferret soon, and I'll have forthcoming work addressing plausibility evaluation for sequence generation models.

In general, I think faithfulness evaluation can be naturally extended to sequence generation models in the way you mention, although I recommend using metrics that account for the magnitude of attribution scores, such as Soft-Comprehensiveness and Sufficiency (Zhao and Aletras, 2023). In contrast, I think plausibility evaluation should focus specifically on phenomena (i.e. specific tokens in the generation) for which there is a human-understandable cue in the preceding context. AUPRC and MRR would be good choices in this latter setting.
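As a rough illustration of the two plausibility metrics named above, the sketch below ranks context tokens by attribution score and compares the ranking against a binary mask of human-annotated cue tokens. It uses average precision as a step-wise estimate of AUPRC. All function names are hypothetical and the data is made up for the example; none of this is Inseq or ferret code.

```python
from typing import Callable, List, Sequence, Tuple


def ranked_labels(scores: Sequence[float], is_cue: Sequence[int]) -> List[bool]:
    """Cue labels reordered from the highest to the lowest attribution score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [bool(is_cue[i]) for i in order]


def reciprocal_rank(scores: Sequence[float], is_cue: Sequence[int]) -> float:
    """1 / rank of the first cue token in the attribution ranking (0 if none)."""
    for rank, hit in enumerate(ranked_labels(scores, is_cue), start=1):
        if hit:
            return 1.0 / rank
    return 0.0


def average_precision(scores: Sequence[float], is_cue: Sequence[int]) -> float:
    """Mean of precision@k over the ranks k where a cue token is retrieved."""
    hits = 0
    precisions = []
    for rank, hit in enumerate(ranked_labels(scores, is_cue), start=1):
        if hit:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_over_examples(
    metric: Callable[[Sequence[float], Sequence[int]], float],
    examples: Sequence[Tuple[Sequence[float], Sequence[int]]],
) -> float:
    """Average a per-example metric; with reciprocal_rank this gives MRR."""
    return sum(metric(s, c) for s, c in examples) / len(examples)


# Each example: (attribution scores over context tokens, gold cue mask).
examples = [
    ([0.1, 0.7, 0.2, 0.5], [0, 0, 1, 1]),
    ([0.9, 0.1], [1, 0]),
]
mrr = mean_over_examples(reciprocal_rank, examples)
mean_ap = mean_over_examples(average_precision, examples)
print(mrr, mean_ap)
```

Each example here corresponds to one generated token of interest, so the scores are the attributions for that single generation step rather than for the whole output sequence.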

Hope this helps!

gsarti commented 1 year ago

@romainf28 you might also want to check out our Discord server! Join link: https://discord.gg/V5VgwwFPbu. The ferret authors are also there, so it would be a great place to discuss such matters! 🙂

Let me know if this answers your question, so that I can proceed to close the issue!

romainf28 commented 1 year ago

Thank you for your help @gsarti! You can close the issue. I will join the Discord server!