peterljq / Parsimonious-Concept-Engineering

Parsimonious Concept Engineering (PaCE) applies sparse coding over a large-scale concept dictionary to control and modify the neural activations of Large Language Models, with the goal of improving their trustworthiness.
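
To make the description concrete, here is a minimal sketch of the general idea: an activation vector is decomposed as a sparse combination of concept directions, and the contribution of unwanted concepts is subtracted. This is not the repository's implementation. The function names `sparse_code` and `remove_concepts`, the toy random dictionary, and the choice of a greedy orthogonal-matching-pursuit solver are all assumptions made for illustration; the project may use a different solver and a different dictionary layout.

```python
import numpy as np


def sparse_code(D, x, k):
    """Greedy sparse coding (orthogonal matching pursuit, an assumed solver).

    D: (d, n) array whose columns are concept directions.
    x: (d,) activation vector.
    k: maximum number of concepts allowed in the decomposition.
    Returns an (n,) coefficient vector with at most k nonzero entries.
    """
    residual = x.astype(float)
    support = []
    coef = np.zeros(D.shape[1])
    sol = None
    for _ in range(k):
        # Pick the concept direction most correlated with what is left.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Refit all selected concepts jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef


def remove_concepts(D, x, undesirable, k=3):
    """Subtract the contribution of the listed concept indices from x."""
    coef = sparse_code(D, x, k)
    idx = np.asarray(list(undesirable), dtype=int)
    steered = x - D[:, idx] @ coef[idx]
    return steered, coef


# Toy usage: a hypothetical 8-concept dictionary in a 16-dimensional space.
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 8))
D /= np.linalg.norm(D, axis=0)  # unit-norm concept directions
x = 1.5 * D[:, 2] - 0.7 * D[:, 5]
steered, coef = remove_concepts(D, x, undesirable=[2], k=2)
```

In this sketch the sparsity budget `k` keeps the decomposition parsimonious, so only the few concepts that best explain the activation are considered for removal.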