ruizheliUOA/Awesome-Interpretability-in-Large-Language-Models
This repository collects all relevant resources about interpretability in LLMs
Creative Commons Zero v1.0 Universal
170 stars
14 forks
issues
add nnsight paper
#10
loftusa
closed
3 hours ago
0
Add "Internal Consistency and Self-Feedback in Large Language Models: A Survey"
#9
fan2goa1
closed
2 days ago
0
add awesome-attention-head
#8
Ki-Seki
closed
2 weeks ago
0
Update README.md
#7
LetiP
closed
2 weeks ago
0
Suggestion for Inclusion in Research List: Interpretability of Mamba Using LRP
#6
FarnoushRJ
closed
2 weeks ago
1
Add a paper on stability analysis and efficient Shapley Values computation for LLMs; Add a mini-tutorial on RASP-based Mechanistic Interpretability;
#5
yangalan123
closed
2 weeks ago
1
Add related paper: Preference Tuning For Toxicity Mitigation Generalizes Across Languages
#4
SeuperHakkerJa
closed
2 weeks ago
0
Update README.md
#3
LetiP
closed
2 weeks ago
0
added eleutherai sae
#2
Butanium
closed
2 weeks ago
0
Add Inseq to libraries
#1
aryopg
closed
2 weeks ago
0