[Dataset] Create a new dataset for testing exaggerated safety on benign instructions with harmful text

panuthept / IRIS

Improving Robustness of LLMs on Input Variations by Mitigating Spurious Intermediate States

Apache License 2.0

8 stars, 3 forks

Open: panuthept opened 1 day ago