Improving Robustness of LLMs on Input Variations by Mitigating Spurious Intermediate States
[Dataset] Create a new dataset for testing exaggerated safety on benign instructions with harmful text #135
Open
panuthept opened 1 day ago