oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.

Strange tokenizer results in notebook tab #6558

Open Weker01 opened 2 days ago

Weker01 commented 2 days ago

Describe the bug

I am using the model MN-12B-Mag-Mell-Q8_0.gguf, which has special tokens for ChatML, but I have noticed this with other models too. Token 14, for example, is <|im_start|>.

When I run a llama.cpp server and query the /tokenize endpoint manually with <|im_start|>, I get the expected token 14, but in the notebook tab this is not the case:

Instead, I get the following tokens. Detokenizing them with the llama.cpp server confirms that they do translate back to the text <|im_start|>, and this is also the number of tokens counted in the main (Raw) notebook tab.

1      -  ''
1060   -  '<'
1124   -  '|'
1329   -  'im'
18993  -  '_start'
1124   -  '|'
1062   -  '>'
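For comparison, the direct server queries mentioned above were along these lines (a minimal sketch in Python; it assumes a llama.cpp server on localhost:8080, and the exact request fields may vary between server versions):

```python
import json
import urllib.request

BASE = "http://localhost:8080"  # assumed llama.cpp server address/port

def post(path, payload):
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# /tokenize on the raw ChatML marker returns the single special token (14 here).
print(post("/tokenize", {"content": "<|im_start|>"}))

# /detokenize on the seven ids shown by the notebook tab gives back the same
# text, so only the tokenization differs, not the round-tripped string.
print(post("/detokenize", {"tokens": [1, 1060, 1124, 1329, 18993, 1124, 1062]}))
```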

Is there an existing issue for this?

Reproduction

Download a model with special ChatML tokens, such as MN-12B-Mag-Mell or countless others. Type a special ChatML token into the notebook, go to the Tokens tab, and see that the special token is not produced.

Screenshot

No response

Logs

There are no error logs specific to this as far as I know.

System Info

Arch Linux
Nvidia
Manual install directly from the git repo.
Weker01 commented 2 days ago

Well, I guess it works with llamacpp_HF; there I get the expected tokens. But why can the llama.cpp server do this automatically?
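To illustrate what llamacpp_HF seems to be doing differently, here is a rough sketch with the HF tokenizer it loads (the path is a placeholder, the split_special_tokens option needs a reasonably recent transformers version, and the exact piece ids will differ from the GGUF ones above since this is a different tokenizer implementation):

```python
from transformers import AutoTokenizer

# Placeholder path to the model's HF tokenizer files (what llamacpp_HF loads).
tok = AutoTokenizer.from_pretrained("path/to/MN-12B-Mag-Mell")

text = "<|im_start|>"

# Default behaviour: assuming the ChatML marker is registered as an added
# special token, it is parsed as a single token id.
print(tok(text, add_special_tokens=False)["input_ids"])

# With special-token parsing disabled, the marker is split into plain-text
# pieces, which looks like what the notebook tab does with the plain
# llama.cpp loader.
print(tok(text, add_special_tokens=False, split_special_tokens=True)["input_ids"])
```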