Closed VoidIsVoid closed 1 year ago
I found a difference between the output of the official Python tokenizer (tiktoken) and jtokkit.
In Java
    final Encoding encodingForModel = registry.getEncodingForModel(ModelType.GPT_3_5_TURBO);
    final String s1 = "\u3000\u3000";
    System.out.println(encodingForModel.encode(s1)); // [44529]
    final String s2 = "\u3000\u3000a";
    System.out.println(encodingForModel.encode(s2)); // [44529, 64]
But in Python
    # coding=utf-8
    import tiktoken

    encoding = tiktoken.encoding_for_model('gpt-3.5-turbo')
    print(encoding.encode('\u3000\u3000'))   # [44529]
    print(encoding.encode('\u3000\u3000a'))  # [23249, 23249, 64]
Please fix it.
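One plausible explanation, not confirmed anywhere in this thread, is that the two libraries disagree on whether U+3000 (ideographic space) counts as whitespace during the regex pre-split that happens before BPE merging. Java's `\s` is ASCII-only unless `Pattern.UNICODE_CHARACTER_CLASS` is set, whereas Python treats U+3000 as whitespace. The sketch below only illustrates that effect with the standard `re` module. `PATTERN` is a simplified stand-in written for this example, not the actual pattern used by tiktoken or jtokkit, and `pre_tokenize` is a hypothetical helper.

```python
import re

# Simplified stand-in for a GPT-style pre-tokenization pattern.
# Assumption: this is NOT the exact pattern used by tiktoken or jtokkit;
# it only mimics the "\s+(?!\S)" whitespace handling relevant here.
PATTERN = r"[^\r\n\w]?[^\W\d_]+|\d{1,3}| ?[^\s\w]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+"


def pre_tokenize(text: str, ascii_only: bool) -> list:
    """Split text into pieces before any BPE merging would take place.

    ascii_only=True makes \\s and \\w ASCII-only (similar to Java's default),
    ascii_only=False uses Unicode-aware classes (Python's default for str).
    """
    flags = re.ASCII if ascii_only else 0
    return re.findall(PATTERN, text, flags)


text = "\u3000\u3000a"

# Unicode-aware: U+3000 is whitespace, so the last space attaches to "a".
# Two pieces: one ideographic space, then ideographic space + "a".
print(pre_tokenize(text, ascii_only=False))

# ASCII-only: U+3000 is not whitespace, so both spaces form one piece.
# Two pieces: two ideographic spaces, then "a".
print(pre_tokenize(text, ascii_only=True))
```

The Unicode-aware split matches the shape of the Python result above (one space token, then a space token plus `a`), and the ASCII-only split matches the shape of the Java result (one token for both spaces, then `a`). Whether this is what the 0.5.1 fix actually changed is an assumption.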
Thanks for the fix! :slightly_smiling_face: I published a new release, 0.5.1, containing your fix