This repository collects all relevant resources about interpretability in LLMs
Creative Commons Zero v1.0 Universal
Add a paper on stability analysis and efficient Shapley Values computation for LLMs; Add a mini-tutorial on RASP-based Mechanistic Interpretability; #5
Thanks for contributing to this great list by compiling so many resources! I just want to (shamelessly) self-promote some of my own contributions to the interpretability field:
Add a paper on stability analysis of Shapley values for LLMs and on efficient Shapley value computation for LLMs
(I personally call it "Amortized Interpretability" :-) )
Yang, C., Yin, F., He, H., Chang, K. W., Ma, X., & Xiang, B. (2023, July). Efficient Shapley Values Estimation by Amortization for Text Classification. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 8666-8680).
paper, code, video
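In token-level attribution for a text classifier, each token is typically treated as a player, and the value function is the model's score when only a subset of tokens is kept. Exact Shapley values average each token's marginal contribution over all n! orderings, which is why cheaper estimators matter. The sketch below is a hypothetical illustration, not the paper's amortized estimator. It contrasts exact enumeration with plain Monte Carlo permutation sampling. The names `toy_value`, `WEIGHTS`, `exact_shapley`, and `sampled_shapley` are made up, and the toy score stands in for a real model.

```python
import itertools
import random
from typing import Callable, FrozenSet, Iterable, List

# A value function maps a coalition of token positions to a model score.
ValueFn = Callable[[FrozenSet[int]], float]


def _accumulate_marginals(order: Iterable[int], value: ValueFn, phi: List[float]) -> None:
    """Add each token's marginal contribution along one ordering to phi."""
    coalition: set = set()
    prev = value(frozenset(coalition))
    for i in order:
        coalition.add(i)
        cur = value(frozenset(coalition))
        phi[i] += cur - prev
        prev = cur


def exact_shapley(n: int, value: ValueFn) -> List[float]:
    """Exact Shapley values: average marginal contributions over all n! orderings."""
    phi = [0.0] * n
    count = 0
    for order in itertools.permutations(range(n)):
        _accumulate_marginals(order, value, phi)
        count += 1
    return [p / count for p in phi]


def sampled_shapley(n: int, value: ValueFn, num_samples: int, seed: int = 0) -> List[float]:
    """Monte Carlo estimate: average marginal contributions over random orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n
    order = list(range(n))
    for _ in range(num_samples):
        rng.shuffle(order)
        _accumulate_marginals(order, value, phi)
    return [p / num_samples for p in phi]


# Hypothetical toy "classifier score": additive token weights plus an
# interaction bonus when tokens 0 and 1 are both present.
WEIGHTS = [1.0, 2.0, 0.5]


def toy_value(coalition: FrozenSet[int]) -> float:
    score = sum(WEIGHTS[i] for i in coalition)
    if {0, 1} <= coalition:
        score += 1.0
    return score


if __name__ == "__main__":
    print(exact_shapley(3, toy_value))  # [1.5, 2.5, 0.5]
    print(sampled_shapley(3, toy_value, num_samples=200))
```

The exact values split the interaction bonus evenly between tokens 0 and 1. They sum to the score with all tokens minus the score with none (4.5 here), which is the efficiency property. The sampled estimate fluctuates around the exact values for tokens 0 and 1, depending on the seed and the number of samples.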
Add a mini-tutorial on RASP-based Mechanistic Interpretability, presented at the TTIC & UChicago NLP Seminar (slides)
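RASP (Restricted Access Sequence Processing) expresses transformer-style computation with a small set of primitives:

- `select` builds an attention pattern from a predicate over key and query values.
- `aggregate` averages values under that pattern.
- `selector_width` counts how many positions each query selects.

The following is a hypothetical plain-Python sketch of those primitives on concrete lists. It is not the RASP language itself or code from the seminar slides. It implements the commonly cited sequence-reversal program.

```python
from typing import Any, Callable, List, Sequence

# selector[q][k] is True when query position q "attends" to key position k.
Selector = List[List[bool]]


def select(keys: Sequence[Any], queries: Sequence[Any],
           predicate: Callable[[Any, Any], bool]) -> Selector:
    """Build an attention pattern by comparing every key with every query."""
    return [[predicate(k, q) for k in keys] for q in queries]


def selector_width(sel: Selector) -> List[int]:
    """Number of key positions each query position selects."""
    return [sum(row) for row in sel]


def aggregate(sel: Selector, values: Sequence[Any]) -> List[Any]:
    """Combine the selected values per query position.

    With exactly one selected position the value is copied as-is (so this also
    works for non-numeric tokens); several numeric values are averaged;
    anything else yields None in this sketch.
    """
    out: List[Any] = []
    for row in sel:
        chosen = [v for v, s in zip(values, row) if s]
        if len(chosen) == 1:
            out.append(chosen[0])
        elif chosen and all(isinstance(v, (int, float)) for v in chosen):
            out.append(sum(chosen) / len(chosen))
        else:
            out.append(None)
    return out


def reverse(tokens: Sequence[Any]) -> List[Any]:
    """Reverse a sequence using only select, aggregate, and selector_width."""
    indices = list(range(len(tokens)))
    # Every position attends to every position, so the width is the length.
    length = selector_width(select(tokens, tokens, lambda k, q: True))
    opp_index = [n - i - 1 for n, i in zip(length, indices)]
    # Query position i attends to key position length - i - 1.
    flip = select(indices, opp_index, lambda k, q: k == q)
    return aggregate(flip, tokens)


if __name__ == "__main__":
    print("".join(reverse(list("hello"))))  # olleh
```

The reversal program uses two `select` calls. These loosely correspond to two attention patterns in a transformer that implements it.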
Let me know if there are further questions or concerns!
Best, Chenghao