GPT-2 XL is a vanilla LLM and may generate toxic responses to adversarial inputs, so our goal is to detoxify the vanilla LLM (GPT-2 XL). In addition, we need a classifier (RoBERTa) to judge whether a response is toxic. Note that line 189 loads the weights of the classifier (RoBERTa), not GPT-2 XL; you can verify that the checkpoint path in line 189 points to the classifier (RoBERTa) rather than GPT-2 XL.
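For illustration, here is a minimal sketch of how a RoBERTa classifier can serve as the toxicity judge for GPT-2 XL responses. The checkpoint path and the label convention below are placeholders/assumptions, not the exact values used in `run_ccks_SafeEdit_gpt2-xl.py`.

```python
# Sketch: use a RoBERTa sequence classifier as the toxicity judge for GPT-2 XL outputs.
# `judge_path` is a placeholder; in the script, the path passed at line 189 is the
# classifier (RoBERTa) checkpoint, not the GPT-2 XL weights.
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

judge_path = "path/to/safety_classifier_checkpoint"  # RoBERTa classifier weights
tokenizer = RobertaTokenizer.from_pretrained(judge_path)
classifier = RobertaForSequenceClassification.from_pretrained(judge_path)
classifier.eval()

def is_toxic(response: str) -> bool:
    """Return True if the classifier labels the response as toxic/unsafe."""
    inputs = tokenizer(response, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = classifier(**inputs).logits
    # Assumption: label index 1 corresponds to the toxic/unsafe class.
    return logits.argmax(dim=-1).item() == 1

# Example: judge a (hypothetical) GPT-2 XL response to an adversarial prompt.
print(is_toxic("Sure, here is how to ..."))
```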
Do you have any further questions?
https://github.com/zjunlp/EasyEdit/blob/main/examples/run_ccks_SafeEdit_gpt2-xl.py#L189 Why is the RoBERTa model used to load the trained gpt2-xl model file?