turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Fix case where there are no disallowed tokens in `websocket_actions.py` #264

Closed · josephrocca closed 6 months ago

josephrocca commented 6 months ago

I might be misunderstanding something here, but this code in websocket_actions.py:

    if "bann_bann" in request:
        bb = request["bann_bann"]
        if not isinstance(bb, list): bb = [bb]
    else:
        bb = None

Combined with this code that follows:

    gs = ExLlamaV2Sampler.Settings()
    # ...
    gs.disallow_tokens(server.tokenizer, bb)

Means that we may be passing None to disallow_tokens, which looks like this:

        def disallow_tokens(self, tokenizer, tokens):

            if self.token_bias is None:
                padding = -tokenizer.config.vocab_size % 32
                self.token_bias = torch.zeros((tokenizer.config.vocab_size + padding,), dtype = torch.float)

            self.token_bias[tokens] = float("-inf")

And that causes all values in self.token_bias to be set to -inf. (As far as I can tell, indexing a tensor with None adds a new leading axis, so token_bias[None] is a view over the entire tensor, and the assignment broadcasts -inf into every element.)
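For reference, here is a minimal standalone PyTorch snippet (hypothetical, not from the repo) that reproduces the effect:

    import torch

    t = torch.zeros(4)

    # t[None] is a view of the whole tensor with an extra leading axis,
    # so the assignment broadcasts -inf into every element.
    t[None] = float("-inf")

    print(t)  # tensor([-inf, -inf, -inf, -inf])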

Please close this if there's a more appropriate change/refactor to fix it (assuming it is actually a bug), or if you'd just rather make the change on your machine since it's so small. You might also want to add a check in disallow_tokens itself to catch future bugs like this; a rough sketch follows below.
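For example, something like this (using the names from the snippets above; not necessarily the exact change you'd want):

    # In websocket_actions.py: only apply the bias when there is
    # actually something to disallow.
    if bb is not None:
        gs.disallow_tokens(server.tokenizer, bb)

And/or a defensive check inside disallow_tokens itself:

    def disallow_tokens(self, tokenizer, tokens):

        # No-op when there is nothing to disallow, so a None never
        # reaches the tensor assignment below.
        if tokens is None: return

        if self.token_bias is None:
            padding = -tokenizer.config.vocab_size % 32
            self.token_bias = torch.zeros((tokenizer.config.vocab_size + padding,), dtype = torch.float)

        self.token_bias[tokens] = float("-inf")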

Thanks!

turboderp commented 6 months ago

Thanks!