vorwerkc opened this issue 2 months ago
Hi @vorwerkc 👋
Thanks for reporting this. Could you provide a bit more guidance on how you detect the watermarking? I'm not super familiar with the lm-watermarking code. For example, what would be the best way for me to repro this locally? Thanks in advance 🙌
The easiest (but not the most concise) working example is to define the following two classes:
```python
import torch


class WatermarkBase:
    def __init__(
        self,
        vocab: list[int] = None,
        gamma: float = 0.5,
        delta: float = 2.0,
        hash_key: int = 15485863,  # just a large prime number to create a rng seed with sufficient bit width
        select_green_tokens: bool = True,
    ):
        # watermarking parameters
        self.vocab = vocab
        self.vocab_size = len(vocab)
        self.gamma = gamma
        self.delta = delta
        self.rng = None
        self.hash_key = hash_key
        self.select_green_tokens = select_green_tokens

    def _seed_rng(self, input_ids: torch.LongTensor) -> None:
        assert input_ids.shape[-1] >= 1, "seeding requires at least a 1 token prefix sequence to seed rng"
        prev_token = input_ids[-1].item()
        self.rng.manual_seed(self.hash_key * prev_token)
        return

    def _get_greenlist_ids(self, input_ids: torch.LongTensor) -> list[int]:
        # seed the rng using the previous tokens/prefix
        # according to the seeding_scheme
        self._seed_rng(input_ids)

        greenlist_size = int(self.vocab_size * self.gamma)
        vocab_permutation = torch.randperm(self.vocab_size, device=input_ids.device, generator=self.rng)
        if self.select_green_tokens:  # directly
            greenlist_ids = vocab_permutation[:greenlist_size]  # new
        else:  # select green via red
            greenlist_ids = vocab_permutation[(self.vocab_size - greenlist_size):]  # legacy behavior
        return greenlist_ids
```
```python
from math import sqrt

import scipy.stats
import torch


# (continues from the WatermarkBase definition above)
class WatermarkDetector(WatermarkBase):
    def __init__(
        self,
        *args,
        device: torch.device = None,
        tokenizer=None,  # the tokenizer used by the generating model
        z_threshold: float = 4.0,
        normalizers: list[str] = ["unicode"],  # or also: ["unicode", "homoglyphs", "truecase"]
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        # also configure the metrics returned/preprocessing options
        assert device, "Must pass device"
        assert tokenizer, "Need an instance of the generating tokenizer to perform detection"

        self.tokenizer = tokenizer
        self.device = device
        self.z_threshold = z_threshold
        self.rng = torch.Generator(device=self.device)
        self.min_prefix_len = 1
        # text normalizers are not applied in this simplified example
        self.normalizers = []

    def _compute_z_score(self, observed_count, T):
        # observed_count is the number of green tokens, T is the total number of scored tokens
        expected_count = self.gamma
        numer = observed_count - expected_count * T
        denom = sqrt(T * expected_count * (1 - expected_count))
        z = numer / denom
        return z

    def _compute_p_value(self, z):
        p_value = scipy.stats.norm.sf(z)
        return p_value

    def score(self, text: str):
        tokenized_text = self.tokenizer(text, return_tensors="pt", add_special_tokens=False)["input_ids"][0].to(self.device)
        return self._score_sequence(tokenized_text)

    def _score_sequence(
        self,
        input_ids: torch.Tensor,
    ):
        num_tokens_scored = len(input_ids) - self.min_prefix_len
        if num_tokens_scored < 1:
            raise ValueError(
                (
                    "Must have at least 1 token to score after "
                    f"the first min_prefix_len={self.min_prefix_len} tokens required by the seeding scheme."
                )
            )
        # Standard method.
        # Since we generally need at least 1 token (for the simplest scheme)
        # we start the iteration over the token sequence with a minimum
        # num tokens as the first prefix for the seeding scheme,
        # and at each step, compute the greenlist induced by the
        # current prefix and check if the current token falls in the greenlist.
        green_token_count, green_token_mask = 0, []
        for idx in range(self.min_prefix_len, len(input_ids)):
            curr_token = input_ids[idx]
            greenlist_ids = self._get_greenlist_ids(input_ids[:idx])
            if curr_token in greenlist_ids:
                green_token_count += 1
                green_token_mask.append(True)
            else:
                green_token_mask.append(False)

        z_score = self._compute_z_score(green_token_count, num_tokens_scored)
        p_value = self._compute_p_value(z_score)
        return z_score, p_value
```
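For context, the watermark itself is embedded at generation time by adding `delta` to the logits of the greenlist tokens before sampling. The sketch below is my rough approximation of that generation-side processor (not the exact TGI implementation), just to show the counterpart of the detector; it assumes a batch size of 1.

```python
class WatermarkLogitsProcessor(WatermarkBase):
    # Rough sketch of the generation-side counterpart: re-derive the greenlist
    # from the current prefix and add delta to the logits of the greenlist
    # tokens before sampling.
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        if self.rng is None:
            self.rng = torch.Generator(device=input_ids.device)
        greenlist_ids = self._get_greenlist_ids(input_ids[0])
        scores[0, greenlist_ids] = scores[0, greenlist_ids] + self.delta
        return scores
```

The detector then simply re-derives the same greenlists and counts how often the generated tokens fall inside them.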
Once you have access to the tokenizer of the model in TGI, you can define a `WatermarkDetector` as
```python
detector = WatermarkDetector(
    vocab=list(tokenizer.get_vocab().values()),
    gamma=0.25,
    device=device,
    tokenizer=tokenizer,
    z_threshold=4.0,
    normalizers=[],
    select_green_tokens=True,
)
```
The `gamma` and `delta` parameters have to be identical to the ones defined in the TGI instance. You can then detect any LLM-generated string with `detector.score(string)`. For watermarked text, the z-score should be large (ideally over 4) and the p-value should be close to zero.
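For intuition, here is a back-of-the-envelope example with made-up numbers (these are not measurements from Mixtral): scoring T = 200 tokens at gamma = 0.25 means roughly 50 green tokens are expected by chance, so a watermarked output with, say, 110 green tokens yields

```python
from math import sqrt

import scipy.stats

gamma, T, green = 0.25, 200, 110  # illustrative numbers only
z = (green - gamma * T) / sqrt(T * gamma * (1 - gamma))
p = scipy.stats.norm.sf(z)
print(z, p)  # z ≈ 9.8, p on the order of 1e-22
```

Unwatermarked text should instead land near z ≈ 0 with a large p-value.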
I looked into the original code (https://github.com/jwkirchenbauer/lm-watermarking) and found that the detection code has to run on the same device
as the watermarking code, typically CUDA. Otherwise the random-number generators will be inconsistent. Even when running on CUDA, the detection seems to be inconsistent.
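That is consistent with how torch generators behave: a CPU generator (Mersenne Twister) and a CUDA generator (Philox) seeded with the same value produce different permutations in general, so greenlists re-derived on a different device than the one used during generation will not match. A quick (hypothetical) check, assuming a CUDA device is available:

```python
import torch

seed = 15485863 * 42  # hash_key * an arbitrary previous token id

cpu_rng = torch.Generator(device="cpu").manual_seed(seed)
perm_cpu = torch.randperm(32000, device="cpu", generator=cpu_rng)

if torch.cuda.is_available():
    cuda_rng = torch.Generator(device="cuda").manual_seed(seed)
    perm_cuda = torch.randperm(32000, device="cuda", generator=cuda_rng)
    # The two permutations generally differ, so greenlists derived on CPU
    # will not match greenlists derived on CUDA for the same prefix.
    print(torch.equal(perm_cpu, perm_cuda.cpu()))
```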
@vorwerkc thanks a lot for providing an example 🙌
We're quite low on bandwidth at the moment so we can't address this immediately, but I'll keep this issue open 👍 Also, if you have the time + passion, please feel free to suggest changes or make a PR!
System Info
text-generation-inference version: 2.2.0
model: "mistralai/Mixtral-8x7B-Instruct-v0.1"
Reproduction
Call `generate` or `generate_stream` with `watermark=True`; a minimal call is sketched below.
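A minimal reproduction sketch, assuming a local TGI instance on 127.0.0.1:8080 and the `detector` defined above (the prompt is just a placeholder):

```python
import requests

# Ask the TGI instance for watermarked text via the /generate route.
response = requests.post(
    "http://127.0.0.1:8080/generate",
    json={
        "inputs": "Write a short paragraph about the ocean.",
        "parameters": {"max_new_tokens": 200, "watermark": True},
    },
)
generated_text = response.json()["generated_text"]

# The watermark should be detectable, but in practice the z-score stays low.
z_score, p_value = detector.score(generated_text)
print(f"z-score: {z_score:.2f}, p-value: {p_value:.2e}")
```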
Expected behavior
From the documentation, it seems that watermarking is supported server-side for all models and can be enabled per API call with `watermark=True`. I have not been able to detect the watermark in any output created by "mistralai/Mixtral-8x7B-Instruct-v0.1".