GPT-2 XL is a vanilla LLM and may generate toxic responses to adversarial inputs, so our goal is to detoxify the vanilla LLM (GPT-2 XL). In addition, we need a classifier (RoBERTa) to judge whether a response is toxic. Note that line 189 loads the weights of the classifier (RoBERTa), not GPT-2 XL; you can verify that the checkpoint path in line 189 points to the classifier (RoBERTa) rather than GPT-2 XL.
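For illustration, here is a minimal sketch of how a RoBERTa classifier can serve as the toxicity judge for GPT-2 XL responses. The checkpoint path and the label convention below are placeholders/assumptions, not the exact values used in `run_ccks_SafeEdit_gpt2-xl.py`.

```python
# Sketch: use a RoBERTa sequence classifier as the toxicity judge for GPT-2 XL outputs.
# `judge_path` is a placeholder; in the script, the path passed at line 189 is the
# classifier (RoBERTa) checkpoint, not the GPT-2 XL weights.
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

judge_path = "path/to/safety_classifier_checkpoint"  # RoBERTa classifier weights
tokenizer = RobertaTokenizer.from_pretrained(judge_path)
classifier = RobertaForSequenceClassification.from_pretrained(judge_path)
classifier.eval()

def is_toxic(response: str) -> bool:
    """Return True if the classifier labels the response as toxic/unsafe."""
    inputs = tokenizer(response, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = classifier(**inputs).logits
    # Assumption: label index 1 corresponds to the toxic/unsafe class.
    return logits.argmax(dim=-1).item() == 1

# Example: judge a (hypothetical) GPT-2 XL response to an adversarial prompt.
print(is_toxic("Sure, here is how to ..."))
```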
Do you have any further questions?
https://github.com/zjunlp/EasyEdit/blob/main/examples/run_ccks_SafeEdit_gpt2-xl.py#L189 Why is the RoBERTa model used to load the trained gpt2-xl model file?