calebmor460 opened this issue 1 year ago
I'm not taking it as complaints, don't get me wrong. But I also don't want to doubt people when they say the output is bad, like in this case there was a bug causing the forward pass to run slightly incorrectly, and it's good that I found that. I also don't blame people for not having any other way to determine if the output is off than by comparing it to other implementations, because I don't either. It's a black box after all. Just sometimes I do wish it came more in the form of: "here's the output I got, here's (exactly) how I got it, and here's why I think it's wrong."
But yeah, it's not to be confrontational or anything, I just wish there was a better way of communicating that I don't view Transformers as the standard. Maybe I just need an FAQ. :)
As for the tokenizer, I do plan to add some support for special tokens. For models that rely on them it gets messy otherwise, if they have to be inserted after encoding and then disappear when decoding.
So... how do things stand with the special tokens?
Having studied it a bit, it seems the authors of SentencePiece go out of their way to explain that control symbols are categorically invalid inputs to the encoder. Meanwhile, Transformers implements a very elaborate workaround so they can be encoded anyway. I imagine those two teams don't like each other very much.
Anyway, it wouldn't be too difficult to emulate what Transformers does, but it would be kind of messy so I'm wondering how many models actually include control symbols in their prompt format. Is it unique to Wizard-Vicuna, or is it more common than that?
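For what it's worth, the Transformers-style behaviour can be approximated by splitting the prompt on the special-token strings before SentencePiece ever sees the text, then splicing the fixed IDs back in. A minimal sketch, where the token table, the `encode_with_specials` name, and the `sp.encode` interface are all assumptions for illustration, not exllama's actual API:

```python
import re

# Hypothetical map of control-symbol strings to their fixed token IDs.
SPECIAL_TOKENS = {"<s>": 1, "</s>": 2}

def encode_with_specials(sp, text):
    """Split `text` on special-token strings, encode the plain pieces
    with the SentencePiece processor `sp`, and splice the fixed IDs in."""
    pattern = "(" + "|".join(re.escape(t) for t in SPECIAL_TOKENS) + ")"
    ids = []
    for piece in re.split(pattern, text):
        if piece in SPECIAL_TOKENS:
            ids.append(SPECIAL_TOKENS[piece])
        elif piece:
            ids.extend(sp.encode(piece))
    return ids
```

This keeps the control symbols out of the encoder entirely, which sidesteps SentencePiece's objection, but it does mean a user who literally types `</s>` gets the control token, which is its own can of worms.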
OpenAssistant also has special tokens like `<|endoftext|>` that were manually added to the tokenizer. See here: https://huggingface.co/OpenAssistant/oasst-rlhf-2-llama-30b-7k-steps-xor/blob/main/oasst-rlhf-2-llama-30b-7k-steps-xor/added_tokens.json
I see. That's an XOR release, but I assume some of the full releases I'm looking at are true enough to the original. It doesn't look like the tokenizer model is changed at all, so it's really all just spread across four different config files, contradictions and all. Transformers is quite the framework...
Anyway, this makes me wonder if text-generation-webui makes any attempt at sanitizing user input, or if that's maybe just me overthinking things.
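If sanitizing is the concern, one plausible approach is simply stripping the literal special-token strings out of user text before encoding. A made-up example (the token list and function name are illustrative, not anything text-generation-webui actually does):

```python
import re

# Assumed list of control-symbol strings a user might try to inject.
SPECIAL_TOKEN_STRINGS = ["<s>", "</s>", "<|endoftext|>"]

def sanitize_user_input(text):
    """Remove any literal special-token strings from user-supplied text,
    so they can't smuggle control tokens into the prompt."""
    pattern = "|".join(re.escape(t) for t in SPECIAL_TOKEN_STRINGS)
    return re.sub(pattern, "", text)
```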
I believe I have a lead on this part of the issue:

> Kobold's exllama = random seizures/outbursts, as mentioned
I managed to reproduce the problem with logging enabled and observed the following generation:
```
id     gen      RepetitionP  TopP     softmax
29892  24.8125  22.5628      22.5628  1  [,]
1183   18.3281  16.7737      -inf     0  [ she]
310    15.3672  14.1284      -inf     0  [ of]
322    15.0312  13.7184      -inf     0  [ and]
20265  1.68555  1.68555      -inf     0  [ Bened]

Selected: 20265  [ Bened]
```
Despite the comma being the only token with non-zero probability, KAI selected the ` Bened` token instead. The cause is a bug in `torch.multinomial` that has reportedly been fixed upstream but not yet released (i.e. it's still active in PyTorch 2.0.1). This bug sometimes causes the multinomial function to select items with zero weight.
I worked around this bug in the KoboldAI exllama backend by checking the selected tokens and resampling when any zero probability tokens are chosen. I've verified with logging that this avoids the issue and in testing it seems to have solved all the problems with poor output quality.
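For reference, the check-and-resample guard described above looks roughly like this, written here in plain Python with a pluggable sampler standing in for `torch.multinomial` (names and structure are illustrative, not KoboldAI's actual code):

```python
import random

def sample_with_resample(probs, sampler=None, max_retries=10):
    """Sample an index from `probs`, resampling whenever the chosen
    index has zero probability -- the same guard applied around
    torch.multinomial, whose zero-weight bug is fixed on master but
    still present in PyTorch 2.0.1."""
    if sampler is None:
        sampler = lambda p: random.choices(range(len(p)), weights=p)[0]
    for _ in range(max_retries):
        idx = sampler(probs)
        if probs[idx] > 0:
            return idx
    # Fall back to the most likely token if sampling keeps misbehaving.
    return max(range(len(probs)), key=probs.__getitem__)
```

The retry loop almost always exits on the first or second iteration, so the overhead is negligible compared to the forward pass.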
I have noticed that while it massively increases inference speed, it massively decreases output quality: instruct models become very obstinate and give completely irrelevant responses, words come out misspelled, it repeats lines over and over, and it sometimes spams Chinese characters.