Open 1Q18LAqakl opened 1 week ago
After linking the tabby API with SillyTavern, I switched characters in SillyTavern, and this issue occurred.
When you switch characters in SillyTavern, it sends a request to TabbyAPI's token encode endpoint to "measure" the number of tokens in the character card (which is then displayed at the top of the char card). This seems to be triggering an error in the encode function in the underlying exllamav2 library. I'd suspect an issue with the model's tokenizer.
Do you have a link to a repo containing the quantized model you're using?
Since the model is in .bin format, I used the convert_safetensors.py script in the util directory to convert it to safetensors, then quantized it.
It's most likely some jank with how the model author modified the tokenizer. I'm not really sure what the author did here: there was already a token 151643 (<|endoftext|>), and he decided to add another token 151646 (<|end_of_text|>), yet the vocab size in config.json is unchanged, as it was already larger than the highest token ID used.
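As a quick sanity check, you can compare the IDs in added_tokens.json against the configured vocab size; a minimal sketch (the helper name is made up, and the dicts stand in for the actual JSON files):

```python
def out_of_range_added_tokens(added_tokens, vocab_size):
    """Return added-token IDs that don't fit in the configured vocab.

    added_tokens: a dict like the contents of added_tokens.json,
    e.g. {"<|end_of_text|>": 151646}.
    vocab_size: the "vocab_size" value from config.json.
    """
    return sorted(tid for tid in added_tokens.values() if tid >= vocab_size)

# Qwen2's config already reserves 152064 slots, so the new ID 151646
# still fits and config.json didn't need to change:
print(out_of_range_added_tokens({"<|end_of_text|>": 151646}, 152064))  # []
```

An empty result means the extra token alone can't cause an index-out-of-range error, which is consistent with the config being left unchanged here.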
There is another model like this, https://huggingface.co/cognitivecomputations/dolphin-2.9.2-qwen2-72b. I don't know whether a model that has been modified twice ends up like this. How do I fix it?
This new model you linked doesn't seem to have this problem. The tokens in its added_tokens.json are actually all just pre-existing tokens, so it shouldn't do anything.
Could that be my mistake? Sorry, I had the exact same problem a long time ago, but I forgot which model it was, and I remembered it as this model. If this problem exists, how do I fix it? Do I need to modify the JSON file?
If it's someone's mistake, it's probably the model author's unless you personally modified the files that were copied over during quantization.
I was going to suggest replacing all of the config/tokenizer files with the ones from the original Qwen2 72B repo and deleting added_tokens.json, but this might just result in index-out-of-range errors if the model was indeed trained on, and attempts to output, token ID 151646.
Can't I simply change a value in added_tokens.json to fix this bug?
Change the value... of what exactly and to what exactly? Not sure I follow your logic here.
Can I fix this by modifying a file?
Not sure exactly what to modify, unless you know something that I don't.
The error happens in this code - probably one of the entries in the array is coming back as None.
EDIT: Okay, I figured it out. It has nothing to do with the added tokens. Open your config.json and add "bos_token_id": 151643 as one of the entries in the JSON. This finetune was made using an old version of Qwen 2 that didn't have a BOS token set in the config, so a None entry was being added to the array and then erroring when converting to long.
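If you'd rather script the edit than do it by hand, a minimal sketch (the function name is made up; 151643 is Qwen2's <|endoftext|> ID from the thread above):

```python
import json

def add_bos_token_id(config_path, bos_token_id=151643):
    # Load the quant's config.json, set the missing "bos_token_id"
    # entry, and write the file back in place.
    with open(config_path, "r", encoding="utf-8") as f:
        cfg = json.load(f)
    cfg["bos_token_id"] = bos_token_id
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=2)
    return cfg
```

Run it against the config.json inside your exl2 quant directory, then reload the model.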
I'll try it with text-generation-webui and check whether the error is still reported. Wait for me.
Thank you. Because of translation problems I don't know what needs to be changed; can you please be more detailed?
I'm so excited to solve the problem, thank you, and have a nice day!
@1Q18LAqakl Replace your config.json with:
{
"_name_or_path": "/home/migel/Tess-v2.5-qwen2-72B-safetensors",
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 8192,
"initializer_range": 0.02,
"intermediate_size": 29568,
"max_position_embeddings": 131072,
"max_window_layers": 80,
"model_type": "qwen2",
"num_attention_heads": 64,
"num_hidden_layers": 80,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 1000000.0,
"sliding_window": 131072,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.41.1",
"use_cache": false,
"use_sliding_window": false,
"vocab_size": 152064
}
Heh... the BOS token of qwen2 IS none. Look in the chat template. If you set a BOS token and it's used, you will get worse outputs.
@Ph0rk0z That's correct; however, in this situation it results in an error because None gets added to the array and cannot be cast to torch.long. You can still generate without a BOS token perfectly fine. This code could probably be modified to something like:
if add_bos and self.bos_token_id is not None: ids.insert(0, self.bos_token_id)
But iirc, this isn't the only part of the exllamav2 code that doesn't support the model not having a BOS token defined.
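That one-line guard can be exercised in isolation; a minimal plain-Python sketch (the function name is made up, standing in for the tokenizer's encode step):

```python
def prepend_bos(ids, bos_token_id, add_bos=True):
    # Only insert a BOS token when the model actually defines one.
    # Unconditionally inserting bos_token_id would put None into the
    # list for models with no BOS token, which is what later breaks
    # torch.tensor(ids) with "Could not infer dtype of NoneType".
    if add_bos and bos_token_id is not None:
        ids.insert(0, bos_token_id)
    return ids

print(prepend_bos([100, 200], 151643))  # [151643, 100, 200]
print(prepend_bos([100, 200], None))    # [100, 200]
```

With the None check, a model that legitimately has no BOS token simply encodes without one instead of crashing.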
It's been resolved, thank you
Textgen is also silently losing metadata for EXL2 models over this. In tabby, somehow I didn't have this problem with magnum, and I made sure to untick "Add BOS token". Hopefully that is enough.
After I quantized the model migtissera_Tess-v2.5.2-Qwen2-72B to exl2 at 6.0bpw, I linked SillyTavern to the tabby API and switched the character card in SillyTavern. How did this problem come about, and how can I fix it?
ERROR: Exception in ASGI application
ERROR: Traceback (most recent call last):
ERROR:   File "D:\Users\Administrator\anaconda3\lib\site-packages\uvicorn\protocols\http\httptools_impl.py", line 399, in run_asgi
ERROR:     result = await app(  # type: ignore[func-returns-value]
ERROR:   File "D:\Users\Administrator\anaconda3\lib\site-packages\uvicorn\middleware\proxy_headers.py", line 70, in __call__
ERROR:     return await self.app(scope, receive, send)
ERROR:   File "D:\Users\Administrator\anaconda3\lib\site-packages\fastapi\applications.py", line 1054, in __call__
ERROR:     await super().__call__(scope, receive, send)
ERROR:   File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\applications.py", line 123, in __call__
ERROR:     await self.middleware_stack(scope, receive, send)
ERROR:   File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\middleware\errors.py", line 186, in __call__
ERROR:     raise exc
ERROR:   File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\middleware\errors.py", line 164, in __call__
ERROR:     await self.app(scope, receive, _send)
ERROR:   File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\middleware\cors.py", line 85, in __call__
ERROR:     await self.app(scope, receive, send)
ERROR:   File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\middleware\exceptions.py", line 65, in __call__
ERROR:     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
ERROR:   File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
ERROR:     raise exc
ERROR:   File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
ERROR:     await app(scope, receive, sender)
ERROR:   File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\routing.py", line 756, in __call__
ERROR:     await self.middleware_stack(scope, receive, send)
ERROR:   File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\routing.py", line 776, in app
ERROR:     await route.handle(scope, receive, send)
ERROR:   File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\routing.py", line 297, in handle
ERROR:     await self.app(scope, receive, send)
ERROR:   File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\routing.py", line 77, in app
ERROR:     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
ERROR:   File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
ERROR:     raise exc
ERROR:   File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
ERROR:     await app(scope, receive, sender)
ERROR:   File "D:\Users\Administrator\anaconda3\lib\site-packages\starlette\routing.py", line 72, in app
ERROR:     response = await func(request)
ERROR:   File "D:\Users\Administrator\anaconda3\lib\site-packages\fastapi\routing.py", line 278, in app
ERROR:     raw_response = await run_endpoint_function(
ERROR:   File "D:\Users\Administrator\anaconda3\lib\site-packages\fastapi\routing.py", line 191, in run_endpoint_function
ERROR:     return await dependant.call(**values)
ERROR:   File "D:\imat\tabbyAPI-main\endpoints\OAI\router.py", line 404, in encode_tokens
ERROR:     raw_tokens = model.container.encode_tokens(text, **data.get_params())
ERROR:   File "D:\imat\tabbyAPI-main\backends\exllamav2\model.py", line 763, in encode_tokens
ERROR:     self.tokenizer.encode(
ERROR:   File "D:\Users\Administrator\anaconda3\lib\site-packages\exllamav2\tokenizer\tokenizer.py", line 416, in encode
ERROR:     ids = torch.tensor(ids).to(torch.long).unsqueeze(0)
ERROR: RuntimeError: Could not infer dtype of NoneType
INFO: 127.0.0.1:63596 - "POST /v1/token/encode HTTP/1.1" 500