turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

There was a problem changing characters in the SillyTavern #517

Open 1Q18LAqakl opened 1 week ago

1Q18LAqakl commented 1 week ago

After quantizing migtissera_Tess-v2.5.2-Qwen2-72B to exl2 at 6.0bpw, I connected SillyTavern to the tabby API and switched character cards in SillyTavern, which produced the error below. Why does this happen, and how can I fix it?

ERROR:    Exception in ASGI application
ERROR:    Traceback (most recent call last):
ERROR:      File "D:\Users\Administrator\anaconda3\lib\site-packages\uvicorn\protocols\http\httptools_impl.py", line 399, in run_asgi
ERROR:        result = await app(  # type: ignore[func-returns-value]
ERROR:      File "D:\Users\Administrator\anaconda3\lib\site-packages\uvicorn\middleware\proxy_headers.py", line 70, in __call__
ERROR:        return await self.app(scope, receive, send)
ERROR:      File "D:\Users\Administrator\anaconda3\lib\site-packages\fastapi\applications.py", line 1054, in __call__
ERROR:        await super().__call__(scope, receive, send)
ERROR:      File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\applications.py", line 123, in __call__
ERROR:        await self.middleware_stack(scope, receive, send)
ERROR:      File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\middleware\errors.py", line 186, in __call__
ERROR:        raise exc
ERROR:      File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\middleware\errors.py", line 164, in __call__
ERROR:        await self.app(scope, receive, _send)
ERROR:      File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\middleware\cors.py", line 85, in __call__
ERROR:        await self.app(scope, receive, send)
ERROR:      File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\middleware\exceptions.py", line 65, in __call__
ERROR:        await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
ERROR:      File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
ERROR:        raise exc
ERROR:      File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
ERROR:        await app(scope, receive, sender)
ERROR:      File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\routing.py", line 756, in __call__
ERROR:        await self.middleware_stack(scope, receive, send)
ERROR:      File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\routing.py", line 776, in app
ERROR:        await route.handle(scope, receive, send)
ERROR:      File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\routing.py", line 297, in handle
ERROR:        await self.app(scope, receive, send)
ERROR:      File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\routing.py", line 77, in app
ERROR:        await wrap_app_handling_exceptions(app, request)(scope, receive, send)
ERROR:      File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
ERROR:        raise exc
ERROR:      File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
ERROR:        await app(scope, receive, sender)
ERROR:      File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\routing.py", line 72, in app
ERROR:        response = await func(request)
ERROR:      File "D:\Users\Administrator\anaconda3\lib\site-packages\fastapi\routing.py", line 278, in app
ERROR:        raw_response = await run_endpoint_function(
ERROR:      File "D:\Users\Administrator\anaconda3\lib\site-packages\fastapi\routing.py", line 191, in run_endpoint_function
ERROR:        return await dependant.call(**values)
ERROR:      File "D:\imat\tabbyAPI-main\endpoints\OAI\router.py", line 404, in encode_tokens
ERROR:        raw_tokens = model.container.encode_tokens(text, **data.get_params())
ERROR:      File "D:\imat\tabbyAPI-main\backends\exllamav2\model.py", line 763, in encode_tokens
ERROR:        self.tokenizer.encode(
ERROR:      File "D:\Users\Administrator\anaconda3\lib\site-packages\exllamav2\tokenizer\tokenizer.py", line 416, in encode
ERROR:        ids = torch.tensor(ids).to(torch.long).unsqueeze(0)
ERROR:    RuntimeError: Could not infer dtype of NoneType
INFO:     127.0.0.1:63596 - "POST /v1/token/encode HTTP/1.1" 500

(The same traceback is logged a second time for the follow-up request; it is identical and omitted here.)

1Q18LAqakl commented 1 week ago

To clarify: after connecting SillyTavern to the tabby API, I switched characters in SillyTavern and this error occurred.

DocShotgun commented 1 week ago

When you switch characters in SillyTavern, it sends a request to TabbyAPI's token encode endpoint to "measure" the number of tokens in the character card (which is then displayed at the top of the char card). This seems to be triggering an error in the encode function in the underlying exllamav2 library. I'd suspect an issue with the model's tokenizer.
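For reference, the request SillyTavern makes is an ordinary POST to TabbyAPI's token encode endpoint, so it can be reproduced by hand. A minimal sketch (the payload field name and port are assumptions based on the traceback and TabbyAPI defaults; authentication headers are omitted):

import requests

# Hypothetical manual call to the same endpoint that fails in the log above.
resp = requests.post(
    "http://127.0.0.1:5000/v1/token/encode",           # default TabbyAPI port assumed
    json={"text": "Full text of the character card"},  # field name inferred from the traceback
)
print(resp.status_code, resp.text)                      # the issue above returns HTTP 500 here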

Do you have a link to a repo containing the quantized model you're using?

1Q18LAqakl commented 1 week ago

Yes: https://huggingface.co/migtissera/Tess-v2.5-Qwen2-72B

1Q18LAqakl commented 1 week ago

Since the weights are in .bin format, I used the convert_safetensors.py script in the util directory to convert them to safetensors, and then quantized the result.
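For anyone else hitting this, the conversion step essentially loads each .bin shard and re-saves it with safetensors. A rough sketch of the idea (file names are placeholders; this is not the actual util/convert_safetensors.py):

import torch
from safetensors.torch import save_file

# Load one PyTorch .bin shard and write it back out as a safetensors file.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
state_dict = {k: v.contiguous() for k, v in state_dict.items()}  # safetensors wants contiguous tensors
save_file(state_dict, "model.safetensors")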

DocShotgun commented 1 week ago

It's most likely some jank with how the model author modified the tokenizer. Not really sure what the author did here - there was already a token 151643 <|endoftext|>, and he decided to add another token 151646 <|end_of_text|> yet the vocab size in config.json is unchanged, as this was already larger than the highest token ID used.
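A quick way to check this yourself (a sketch using the standard transformers API, nothing exllamav2-specific):

from transformers import AutoConfig, AutoTokenizer

repo = "migtissera/Tess-v2.5-Qwen2-72B"
tok = AutoTokenizer.from_pretrained(repo)
cfg = AutoConfig.from_pretrained(repo)

print(tok.added_tokens_decoder)  # includes the extra <|end_of_text|> entry at ID 151646
print(len(tok), cfg.vocab_size)  # tokenizer size vs. the (already larger) config vocab size, 152064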

1Q18LAqakl commented 1 week ago

There is another model that is also like this: https://huggingface.co/cognitivecomputations/dolphin-2.9.2-qwen2-72b. I don't know whether every model whose tokenizer has been modified this way behaves the same. How do I fix it?

DocShotgun commented 1 week ago

This new model you linked doesn't seem to have this problem. The tokens that it has in added_tokens.json are actually all just pre-existing tokens, so it shouldn't do anything.

1Q18LAqakl commented 1 week ago

Could that be my mistake? Sorry, I ran into the exact same problem a while ago but forgot which model it was, and I remembered it as being this one. If the problem does exist, how do I fix it? Do I need to modify a JSON file?

DocShotgun commented 1 week ago

If it's someone's mistake, it's probably the model author's unless you personally modified the files that were copied over during quantization.

I was going to suggest replacing all of the config/tokenizer files with the ones from the original Qwen2 72B repo and deleting added_tokens.json, but this might just result in index out of range errors if the model is indeed trained on and attempts to output token ID 151646.

1Q18LAqakl commented 1 week ago

Can't I simply change the value in the added_tokens.json to fix this bug?

DocShotgun commented 1 week ago

Change the value... of what exactly and to what exactly? Not sure I follow your logic here.

1Q18LAqakl commented 1 week ago

I mean: can this be fixed by modifying one of the files?

DocShotgun commented 1 week ago

Not sure exactly what to modify, unless you know something that I don't.

[screenshot of the encode function in exllamav2/tokenizer/tokenizer.py]

The error happens in this code - probably one of the entries in the array is coming back as None.

EDIT: Okay, I figured it out. Nothing to do with the added tokens. Open your config.json and add "bos_token_id": 151643 as one of the entries in the JSON. This finetune was made using an old version of Qwen 2 that didn't have a BOS token set in the config, so a None entry was being added to the array and then erroring when trying to convert to long.
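In other words, the failure can be reproduced in isolation with nothing but torch (the token IDs below are arbitrary, just to illustrate the missing-BOS case):

import torch

bos_token_id = None                  # what the old config.json effectively gives you: no "bos_token_id"
ids = [bos_token_id, 40, 2776, 264]  # BOS slot plus a few arbitrary token IDs
torch.tensor(ids)                    # RuntimeError: Could not infer dtype of NoneType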

1Q18LAqakl commented 1 week ago

I'll try it with text-generation-webui and check whether the error still appears. Please wait for me.

1Q18LAqakl commented 1 week ago

Thank you. Because of translation issues I'm not sure exactly what needs to be changed; could you please be more detailed?

1Q18LAqakl commented 1 week ago

I'm so excited to solve the problem, thank you, and have a nice day!

DocShotgun commented 1 week ago

@1Q18LAqakl Replace your config.json with:

{
  "_name_or_path": "/home/migel/Tess-v2.5-qwen2-72B-safetensors",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 8192,
  "initializer_range": 0.02,
  "intermediate_size": 29568,
  "max_position_embeddings": 131072,
  "max_window_layers": 80,
  "model_type": "qwen2",
  "num_attention_heads": 64,
  "num_hidden_layers": 80,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.1",
  "use_cache": false,
  "use_sliding_window": false,
  "vocab_size": 152064
}

Ph0rk0z commented 1 week ago

Heh.. the BOS token of qwen2 IS "none". Look in the chat template. If you set a BOS token and it's used you will get worse outputs.
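For reference, this is easy to see by rendering the template (a sketch using transformers and the upstream Qwen/Qwen2-72B-Instruct repo, not the fine-tune discussed here):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2-72B-Instruct")
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)  # starts with <|im_start|>system ... and contains no BOS token anywhere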

DocShotgun commented 1 week ago

@Ph0rk0z That's correct, however in this situation it results in an error because None gets added to the array and cannot be cast to torch.long. You can still generate without a BOS token perfectly fine. This code could probably be modified to something like:

if add_bos and self.bos_token_id is not None: ids.insert(0, self.bos_token_id)
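Expanded into context, the guard might look like this (a sketch only; the surrounding function and variable names are assumptions, not exllamav2's actual code):

import torch

def finalize_ids(ids, bos_token_id, eos_token_id, add_bos=True, add_eos=False):
    # Only insert BOS/EOS when the model actually defines them (Qwen2 defines no BOS).
    if add_bos and bos_token_id is not None:
        ids.insert(0, bos_token_id)
    if add_eos and eos_token_id is not None:
        ids.append(eos_token_id)
    return torch.tensor(ids, dtype=torch.long).unsqueeze(0)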

But iirc, this isn't the only part of the exllamav2 code that doesn't support the model not having a BOS token defined.

1Q18LAqakl commented 1 week ago

It's been resolved, thank you

Ph0rk0z commented 1 week ago

Textgen is also silently losing metadata for EXL2 models over this. In tabby, somehow I didn't have this problem with magnum, and I made sure to untick 'add bos token'. Hopefully that is enough.