likenneth / honest_llama

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
MIT License
426 stars, 32 forks