This repository contains the code for our ICML 2024 paper, Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models. The full paper is available as arXiv preprint arXiv:2402.18059.
We introduce a novel watermarking method for large language models (LLMs) that focuses on two primary objectives: detectability of the watermark and semantic coherence of the generated text.
These metrics are controlled by two hyperparameters: the split ratio ($\gamma$) and watermark logit ($\delta$). These values are adjusted for different tokens to account for their unique characteristics.
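For readers unfamiliar with the roles of $\gamma$ and $\delta$, the following is a minimal sketch of green-list logit biasing in the style of KGW, the baseline evaluated later in this README. The function names, the SHA-256 seeding on the previous token, and the single fixed `gamma` and `delta` are illustrative assumptions, not this repository's code; in the method described here the two values vary per token.

```python
from __future__ import annotations

import hashlib
import random


def green_list(prev_token: int, vocab_size: int, gamma: float) -> set[int]:
    """Pseudo-randomly select a fraction `gamma` of the vocabulary.

    The selection is seeded by the previous token id, so a detector that
    knows the seeding rule can recompute the same split. (Hypothetical
    seeding scheme, chosen only for illustration.)
    """
    digest = hashlib.sha256(str(prev_token).encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    perm = list(range(vocab_size))
    rng.shuffle(perm)
    return set(perm[: int(gamma * vocab_size)])


def watermark_logits(
    logits: list[float], prev_token: int, gamma: float, delta: float
) -> list[float]:
    """Add `delta` to every green-list logit; leave the others unchanged."""
    green = green_list(prev_token, len(logits), gamma)
    return [x + delta if i in green else x for i, x in enumerate(logits)]


# Toy vocabulary of 8 tokens with uniform logits.
logits = [0.0] * 8
biased = watermark_logits(logits, prev_token=42, gamma=0.25, delta=2.0)
```

A larger `gamma` puts more tokens on the green list, and a larger `delta` pushes sampling harder toward them. That makes the watermark easier to detect but shifts the output distribution more.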
To determine token-specific values for $\gamma$ and $\delta$, we use two lightweight networks: the $\gamma$-generator ($G_\gamma$) and the $\delta$-generator ($G_\delta$). These networks are optimized using a specialized multi-objective optimization framework. Below is an overview of our proposed training method:
Ensure that all packages listed in `requirements.txt` are installed in your environment.
For a quick start, refer to `demo.ipynb`. This notebook generates watermarked text from a given prompt and computes the z-score, perplexity (PPL), and SimCSE. Note that the demo supports OPT models only. For Llama models, run `watermark.py` as instructed below: our token-specific $\gamma$ and $\delta$ values were trained with OPT tokenizers, so evaluating on Llama requires an additional conversion step, which is implemented in `watermark.py`.
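As a rough illustration of the z-score mentioned above, here is a minimal sketch of a detection statistic when each token has its own split ratio. It assumes that, without a watermark, token `t` lands on the green list independently with probability `gammas[t]`. The function name and inputs are hypothetical and do not mirror the notebook's code.

```python
from __future__ import annotations

import math


def z_score(green_flags: list[bool], gammas: list[float]) -> float:
    """Standardized count of green tokens under the no-watermark hypothesis.

    Under the stated independence assumption the green count has mean
    sum(gammas) and variance sum(g * (1 - g)).
    """
    observed = sum(green_flags)
    expected = sum(gammas)
    variance = sum(g * (1.0 - g) for g in gammas)
    return (observed - expected) / math.sqrt(variance)


# Toy example: 8 tokens, 6 of them green, constant split ratio 0.25.
flags = [True] * 6 + [False] * 2
score = z_score(flags, [0.25] * 8)
```

With a constant split ratio $\gamma$ over $T$ tokens, this reduces to the familiar KGW form $(|s|_G - \gamma T) / \sqrt{T \gamma (1 - \gamma)}$, where $|s|_G$ is the number of green tokens.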
To train the network, run the following command:
bash run_pipeline.sh
Select between Multi-Objective Optimization (MOO) or Weighted Sum for training:
z_score_factor=1.0
z_score_factor=4e-4
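To make the distinction between the two training modes concrete, here is a generic sketch contrasting a fixed-weight sum of two gradients with a min-norm combination, as used in the multiple gradient descent algorithm (MGDA) for two objectives. This README does not state how `z_score_factor` enters the loss or which MOO solver the repository uses, so the weight `w`, the function names, and the choice of MGDA are all assumptions made for illustration.

```python
from __future__ import annotations


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(g1: list[float], g2: list[float], w: float) -> list[float]:
    """Fixed-weight scalarization: g1 + w * g2 (w is a hypothetical weight)."""
    return [x + w * y for x, y in zip(g1, g2)]


def min_norm_weight(g1: list[float], g2: list[float]) -> float:
    """Alpha in [0, 1] that minimizes ||alpha * g1 + (1 - alpha) * g2||^2."""
    diff = [x - y for x, y in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        return 0.5  # identical gradients: every alpha gives the same vector
    alpha = -dot(diff, g2) / denom
    return min(1.0, max(0.0, alpha))


def moo_direction(g1: list[float], g2: list[float]) -> list[float]:
    """Combine two gradients with the min-norm weight instead of a fixed one."""
    alpha = min_norm_weight(g1, g2)
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(g1, g2)]
```

The fixed weight has to be tuned by hand to balance the two loss scales. The min-norm weight is recomputed from the gradients at every step.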
To modify default settings, check the `config` folder. For details on each keyword, refer to `config/README.md`.
Results are stored in the `eval` folder by default.
Our Method:
CUDA_VISIBLE_DEVICES=0 python watermark.py --config_file config/TS.yaml
To evaluate on Llama, set `model_name_or_path` to the desired model or local location, and change `ckpt_path` to `ckpt/llama/init_0.25_1.75_default.pth`.
The `ckpt` folder contains checkpoints that were trained from different initializations. Use `ckpt/opt` to test on OPT models and `ckpt/llama` to test on Llama models.
KGW:
CUDA_VISIBLE_DEVICES=0 python watermark.py --config_file config/KGW.yaml
If you use this work in your research or applications, please cite it as follows:
@article{huo2024token,
title={Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models},
author={Huo, Mingjia and Somayajula, Sai Ashish and Liang, Youwei and Zhang, Ruisi and Koushanfar, Farinaz and Xie, Pengtao},
journal={arXiv preprint arXiv:2402.18059},
year={2024}
}