hannamw / eap-ig-faithfulness

Code for "Automatic Circuit Finding and Faithfulness"

eap-ig-faithfulness

This repo contains the code for the paper "Automatic Circuit Finding and Faithfulness" (COLM 2024; arXiv). The code for manipulating model graphs and running EAP and EAP-IG lives in a submodule, EAP-IG. Make sure to pull the submodule at the commit pinned in this repo, because I've updated the EAP-IG module since releasing this experimental code. After pulling the submodule, install it by running pip install -e . in its directory. A Conda environment for running these experiments is specified in environment.yml.
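The setup steps above can be sketched as a small dry-run helper. This is a hypothetical illustration, not a script shipped with the repo. It assumes you have already created and activated the Conda environment (for example with conda env create -f environment.yml followed by conda activate using the environment name defined in that file), and that the submodule is checked out into a directory named EAP-IG. The helper name setup_steps and the directory name are assumptions. git submodule update --init checks out the submodule at the commit recorded in this repo rather than the latest upstream version.

```python
"""Hypothetical dry-run helper for the setup steps described above.

Assumptions: the Conda environment from environment.yml is already active,
and the EAP-IG submodule is checked out into ./EAP-IG.
"""
import shlex
import subprocess
import sys
from pathlib import Path

# Assumption: the submodule directory is named "EAP-IG".
SUBMODULE_DIR = "EAP-IG"


def setup_steps(repo_root: str = ".") -> list[tuple[list[str], Path]]:
    """Return (command, working directory) pairs in the order they should run."""
    root = Path(repo_root)
    return [
        # Check out the EAP-IG submodule at the commit pinned by this repo.
        (["git", "submodule", "update", "--init"], root),
        # Editable install from inside the submodule directory; using
        # sys.executable targets the currently active Python environment.
        ([sys.executable, "-m", "pip", "install", "-e", "."], root / SUBMODULE_DIR),
    ]


def run_setup(repo_root: str = ".", dry_run: bool = True) -> None:
    """Print each step; execute it only when dry_run is False."""
    for cmd, cwd in setup_steps(repo_root):
        print(f"(cd {cwd} && {shlex.join(cmd)})")
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    run_setup()  # dry run by default: prints the commands without running them
```

Keeping dry_run as the default lets you inspect the commands before anything touches your checkout; pass dry_run=False from the repo root to actually run them.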

You can replicate the experiments in the paper as follows:

The data used in this paper is available in the data folder.