ruizheliUOA / Awesome-Interpretability-in-Large-Language-Models

This repository collects all relevant resources on interpretability in LLMs.
Creative Commons Zero v1.0 Universal
170 stars, 14 forks

Add a paper on stability analysis and efficient Shapley value computation for LLMs; add a mini-tutorial on RASP-based mechanistic interpretability #5

Closed. yangalan123 closed this 2 weeks ago

yangalan123 commented 2 weeks ago

Hi,

Thanks for compiling so many resources into this great list! I just want to (shamelessly) self-promote some of my own contributions to the interpretability field:

  1. Add a paper on the stability of Shapley values applied to LLMs and on efficient Shapley value computation for LLMs (I personally call it "Amortized Interpretability" :-) )

Yang, C., Yin, F., He, H., Chang, K. W., Ma, X., & Xiang, B. (2023, July). Efficient Shapley Values Estimation by Amortization for Text Classification. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 8666-8680). paper, code, video

  2. Add a mini-tutorial on RASP-based mechanistic interpretability, presented at the TTIC & UChicago NLP Seminar (slides)
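For readers new to the first item, here is a minimal sketch of the two standard baselines for Shapley-value attribution over input tokens: exact enumeration over all token subsets, and Monte Carlo permutation sampling. It is not the amortized estimator from the paper. It only illustrates the per-input cost that amortization is meant to avoid. The value function `toy_score` is a hypothetical stand-in for a classifier's score on an input where only the `kept` tokens are present; a real setup would query a model on masked text. All function names here are illustrative assumptions.

```python
import itertools
import math
import random
from typing import Callable

# A value function maps the set of kept token indices to a model score.
ValueFn = Callable[[frozenset], float]

TOKENS = ["not", "bad", "at", "all"]


def toy_score(kept: frozenset) -> float:
    """Hypothetical classifier score (assumption, not a real model):
    'bad' alone is negative; 'not' together with 'bad' flips it positive."""
    score = 0.0
    if 1 in kept:
        score -= 1.0
    if 0 in kept and 1 in kept:
        score += 2.0
    return score


def exact_shapley(n: int, value: ValueFn) -> list[float]:
    """Exact Shapley values by enumerating every subset (exponential in n)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def permutation_shapley(n: int, value: ValueFn, num_perms: int, seed: int = 0) -> list[float]:
    """Monte Carlo estimate: average each token's marginal contribution
    when tokens are added in a uniformly random order."""
    rng = random.Random(seed)
    phi = [0.0] * n
    for _ in range(num_perms):
        order = list(range(n))
        rng.shuffle(order)
        present: frozenset = frozenset()
        prev = value(present)
        for i in order:
            present = present | {i}
            cur = value(present)
            phi[i] += cur - prev
            prev = cur
    return [p / num_perms for p in phi]


exact = exact_shapley(len(TOKENS), toy_score)
approx = permutation_shapley(len(TOKENS), toy_score, num_perms=2000)
for tok, e, a in zip(TOKENS, exact, approx):
    print(f"{tok:>4}: exact={e:+.3f} sampled={a:+.3f}")
```

In this toy example, the +2 interaction between "not" and "bad" is split evenly between them, so "not" receives about +1 and "bad" about 0. In both estimators the attributions sum to `toy_score(all tokens) - toy_score(no tokens)` (the efficiency property). Each sampled permutation satisfies this exactly, which is why the sampled estimate keeps the sum while its individual values remain noisy.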

Let me know if there are further questions or concerns!

Best, Chenghao

ruizheliUOA commented 2 weeks ago

No problem; I have added both to the list.