panuthept
/
IRIS
Improving Robustness of LLMs on Input Variations by Mitigating Spurious Intermediate States
Apache License 2.0
8
stars
3
forks
issues
[Metrics] Change evaluation metrics from TPR, FPR, F1 to AUC, ROC
#136
panuthept
opened
21 hours ago
0
[Dataset] Create a new dataset for testing exaggerated safety on benign instructions containing harmful text
#135
panuthept
opened
22 hours ago
0
Deepinception
#134
KornWtp
closed
5 days ago
1
Add ica jailbreak
#133
KornWtp
closed
5 days ago
0
Add re ne llm method
#132
popochangli
closed
6 days ago
0
130 augmentation add pseudo benign augmentation
#131
panuthept
closed
1 week ago
0
[Augmentation] Add pseudo-benign augmentation
#130
panuthept
closed
1 week ago
0
Add WildChatDataset
#129
panuthept
closed
1 week ago
0
Add SquadDataset
#128
panuthept
closed
1 week ago
0
[Dataset] Add Squad dataset
#127
panuthept
closed
1 week ago
0
Add TyDiQADataset
#126
panuthept
closed
1 week ago
0
[Dataset] Add TyDiQA dataset
#125
panuthept
closed
1 week ago
0
Add BeaverTails30kDataset and BeaverTails330kDataset
#124
panuthept
closed
1 week ago
0
[Dataset] Add BeaverTails dataset
#123
panuthept
closed
1 week ago
0
Add ToxicChatDataset
#122
panuthept
closed
1 week ago
0
[Dataset] Add ToxicChat
#121
panuthept
closed
1 week ago
0
112 finetuning add irisv2
#120
panuthept
closed
1 week ago
0
Add pair method
#119
popochangli
closed
1 week ago
0
Add configs
#118
panuthept
closed
2 weeks ago
0
113 finetuning implement label smoothing for iris finetuning method
#117
panuthept
closed
2 weeks ago
0
[Finetuning] Apply IRIS fine-tuning on ShieldGemma-9B
#116
panuthept
opened
2 weeks ago
0
[Finetuning] Apply IRIS fine-tuning on Llama-Guard-3-8B
#115
panuthept
opened
2 weeks ago
0
[Finetuning] Experiment IRIS fine-tuning with random gold token
#114
panuthept
closed
2 weeks ago
0
[Finetuning] Implement Label Smoothing for IRIS finetuning method
#113
panuthept
closed
2 weeks ago
0
[Finetuning] Add IRISv2
#112
panuthept
closed
1 week ago
0
Add attackers
#111
KornWtp
closed
3 weeks ago
1
Update huggingface_model to support loading from LoRA checkpoint
#110
panuthept
closed
4 weeks ago
0
[Model] Support loading LoRA checkpoint
#109
panuthept
closed
4 weeks ago
0
Add LogitLens
#108
panuthept
closed
1 month ago
0
[Analysis] Implement LogitLens
#107
panuthept
closed
1 month ago
0
[Finetuning] Implement IRIS finetuning method
#106
panuthept
closed
3 weeks ago
0
Update code to train only when on GPU machine
#105
panuthept
closed
1 month ago
0
22 training add peft
#104
panuthept
closed
1 month ago
0
[Investigation] Are existing models biased toward treating specific n-grams in input prompts as harmful?
#103
panuthept
opened
1 month ago
0
101 cache update cachestorage to support different temperature
#102
panuthept
closed
1 month ago
0
[Cache] Update CacheStorage to support different temperature
#101
panuthept
closed
1 month ago
0
94 model add temperature parameter
#100
panuthept
closed
1 month ago
0
Check empty string
#99
panuthept
closed
1 month ago
0
Check OPENAI_API_KEY for GPTFuzzerJailbreaking
#98
panuthept
closed
1 month ago
0
[Issue] Add an error message for missing OPENAI_API_KEY
#97
panuthept
closed
1 month ago
0
[Issue] Jailbreak must not generate empty string ("")
#96
panuthept
closed
1 month ago
0
91 script add jailbreaking script
#95
panuthept
closed
1 month ago
0
[Model] Add temperature parameter
#94
panuthept
closed
1 month ago
0
[Investigation] Collect confidence scores of TP, FP, and FN
#93
panuthept
closed
1 month ago
0
[Evaluation] Evaluate ShieldGemma
#92
panuthept
closed
1 month ago
0
[Script] Add jailbreaking script
#91
panuthept
closed
1 month ago
0
Add ShieldGemma
#90
panuthept
closed
1 month ago
0
72 debiasing add counterfactual inference with oracle bias model
#89
panuthept
closed
1 month ago
0
87 model add support for suffix prompt
#88
panuthept
closed
1 month ago
0
[Model] Add support for suffix prompt
#87
panuthept
closed
1 month ago
0