Add a paper on stability analysis and efficient Shapley Values computation for LLMs; Add a mini-tutorial on RASP-based Mechanistic Interpretability;

Hi,

Thanks for contributing to this great list by compiling so many resources! I just want to (shamelessly) self-promote some of my own contributions to the interpretability field:

Add a paper on the stability analysis of Shapley Values applied to LLMs and on efficient Shapley Values computation for LLMs (I personally call it "Amortized Interpretability" :-) )

Yang, C., Yin, F., He, H., Chang, K. W., Ma, X., & Xiang, B. (2023, July). Efficient Shapley Values Estimation by Amortization for Text Classification. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 8666-8680). paper, code, video

Add a mini-tutorial on RASP-based Mechanistic Interpretability, presented at the TTIC & UChicago NLP Seminar (slides)

Let me know if you have any further questions or concerns!

Best, Chenghao

ruizheliUOA / Awesome-Interpretability-in-Large-Language-Models #5