mignonjia / TS_watermark

Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models

This repository contains the code for our ICML 2024 paper, Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models. The full paper is available on arXiv (arXiv:2402.18059).

Introduction

We introduce a novel watermarking method for large language models (LLMs) that targets two primary objectives: detectability of the watermark and semantic coherence of the generated text.

Both objectives are controlled by two hyperparameters: the split ratio ($\gamma$) and the watermark logit ($\delta$). Rather than being fixed globally, these values are adjusted per token to account for each token's characteristics.

To determine token-specific values for $\gamma$ and $\delta$, we use two lightweight networks: the $\gamma$-generator ($G_\gamma$) and the $\delta$-generator ($G_\delta$). These networks are optimized using a specialized multi-objective optimization framework. Below is an overview of our proposed training method:

[Figure: overview of the proposed training method]
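To make the roles of $\gamma$ and $\delta$ concrete, here is a minimal sketch of a green/red-list logit bias in which both values depend on the previous token. It is an illustration only, not the repository's implementation: the function names, the seeding scheme, and the lookup tables standing in for $G_\gamma$ and $G_\delta$ are all hypothetical.

```python
import random


def green_list(prev_token, vocab_size, gamma, key=0):
    """Pick int(gamma * vocab_size) 'green' token ids, seeded by the previous token.

    The seeding scheme here is an assumption made for illustration.
    """
    rng = random.Random(key * 1_000_003 + prev_token)
    permutation = list(range(vocab_size))
    rng.shuffle(permutation)
    return set(permutation[: int(gamma * vocab_size)])


def watermark_logits(logits, prev_token, gamma, delta, key=0):
    """Add delta to the logits of green tokens and leave red tokens unchanged."""
    green = green_list(prev_token, len(logits), gamma, key)
    return [x + delta if i in green else x for i, x in enumerate(logits)]


# Hypothetical per-token values; in the paper these come from the two generator networks.
gamma_table = {0: 0.25, 1: 0.5}
delta_table = {0: 2.0, 1: 1.0}

prev_token = 1
logits = [0.0] * 8
biased = watermark_logits(
    logits, prev_token, gamma_table[prev_token], delta_table[prev_token]
)
```

With a split ratio of 0.5 over a vocabulary of 8 tokens, four logits receive the bias. A larger $\delta$ makes the watermark easier to detect but distorts the distribution more, which is the trade-off the token-specific values are meant to balance.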

Environment Setup

Ensure that all packages listed in requirements.txt are installed in your environment.

Demo

For a quick start, refer to demo.ipynb. This notebook generates watermarked text from a given prompt and computes the z-score, perplexity (PPL), and SimCSE similarity. The demo supports OPT models only. For Llama models, run watermark.py as instructed below: our token-specific $\gamma$/$\delta$ values were trained on the OPT tokenizer, so evaluating on Llama requires an additional conversion step, which is implemented in watermark.py.
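As a rough guide to what the z-score measures when $\gamma$ varies per token, the sketch below generalizes the usual green-token test: with $\gamma_t$ the split ratio at position $t$ and $|s|_G$ the number of green tokens observed, it computes $z = (|s|_G - \sum_t \gamma_t) / \sqrt{\sum_t \gamma_t (1 - \gamma_t)}$. The function name and inputs are assumptions for illustration; check the repository code for the exact statistic it reports.

```python
import math


def token_specific_z_score(green_hits, gammas):
    """z-score for a sequence whose split ratio differs per position.

    green_hits[t] is 1 if token t fell in its green list, else 0.
    gammas[t] is the split ratio that was used at position t.
    """
    if len(green_hits) != len(gammas):
        raise ValueError("green_hits and gammas must have the same length")
    observed = sum(green_hits)
    expected = sum(gammas)  # expected green count without a watermark
    variance = sum(g * (1.0 - g) for g in gammas)
    if variance == 0:
        raise ValueError("variance is zero; every gamma is 0 or 1")
    return (observed - expected) / math.sqrt(variance)


z = token_specific_z_score([1, 1, 1, 1], [0.5, 0.5, 0.5, 0.5])
```

When every $\gamma_t$ equals the same constant, this reduces to the standard fixed-ratio z-score. A higher value is stronger evidence that the text is watermarked.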

Training

To train the network, run the following command:

bash run_pipeline.sh

Training supports two modes: Multi-Objective Optimization (MOO) or a Weighted Sum of the objectives.
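The difference between the two modes can be sketched for the two-objective case. A weighted sum combines the per-objective gradients with a fixed weight, whereas a common multi-objective approach (the multiple-gradient descent algorithm, MGDA) picks the convex combination with the smallest norm at every step. The code below is a sketch under that assumption; the helper names are hypothetical, and the repository's MOO procedure may differ in its details.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_sum_direction(g1, g2, w=0.5):
    """Fixed-weight combination w * g1 + (1 - w) * g2."""
    return [w * a + (1.0 - w) * b for a, b in zip(g1, g2)]


def min_norm_direction(g1, g2):
    """Two-objective MGDA step: minimize ||alpha*g1 + (1-alpha)*g2|| over alpha in [0, 1]."""
    diff = [a - b for a, b in zip(g1, g2)]  # g1 - g2
    denom = dot(diff, diff)
    if denom == 0:
        alpha = 0.5  # identical gradients, any alpha gives the same direction
    else:
        # Closed-form minimizer (g2 - g1) . g2 / ||g1 - g2||^2, clipped to [0, 1].
        alpha = dot([-d for d in diff], g2) / denom
        alpha = min(1.0, max(0.0, alpha))
    direction = [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
    return alpha, direction


alpha_orth, dir_orth = min_norm_direction([1.0, 0.0], [0.0, 1.0])
alpha_same, dir_same = min_norm_direction([1.0, 0.0], [3.0, 0.0])
```

For orthogonal gradients of equal length the two methods agree (alpha is 0.5). When one gradient dominates, the min-norm rule falls back to the smaller gradient, whereas a fixed weight would let the larger one drive the update.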

Evaluation

Default Settings

To modify default settings, check the config folder. For details on each keyword, refer to config/README.md.

Running Evaluation

Results are stored in the eval folder by default.

Citation

If you use this work in your research or applications, please cite it as follows:

@article{huo2024token,
  title={Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models},
  author={Huo, Mingjia and Somayajula, Sai Ashish and Liang, Youwei and Zhang, Ruisi and Koushanfar, Farinaz and Xie, Pengtao},
  journal={arXiv preprint arXiv:2402.18059},
  year={2024}
}