openai/tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
MIT License
12.48k stars, 856 forks
issues
`o200k_base` pretokenizer - regex error?
#298
AmitMY
closed
6 months ago
2
GPT4o has a basic bug: junk corpus data found in the newest tokens, and in testing GPT4o talks nonsense and hallucinates
#297
alexhmyang
closed
6 months ago
3
Or
#296
jacob121532
closed
6 months ago
0
gpt-4o tokenizer
#295
nxfi777
closed
6 months ago
1
A character is split into two tokens
#294
kerlion
closed
6 months ago
1
I need tiktoken win32 python3.8 version, can anyone provide it?
#293
loveFeng
closed
6 months ago
1
Combining marks and Indic vowel marks within words are being split, breaking all Indic languages and most languages except English and CJK
#292
ajaykg
closed
6 months ago
4
Error
#291
pedromothe5
closed
6 months ago
0
Use a custom exception ValueError subclass for the special tokens warning
#290
simonw
opened
6 months ago
0
Custom tokenizer fails to encode despite characters being in mergeable_ranks
#289
afang-story
closed
1 month ago
3
Understanding the intended behaviour of `_encode_bytes`
#288
ashleyholman
opened
7 months ago
0
Exception has occurred: ConnectionError HTTPSConnectionPool(host='openaipublic.blob.core.windows.net', port=443): Max retries exceeded with url: /encodings/cl100k_base.tiktoken (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001F4D42B0EE0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
#287
anithamudigoudar
closed
1 month ago
2
Tiktoken not installing on a macbook pro with m2 chip
#286
chaudhryna
closed
6 months ago
2
Optimize _byte_pair_merge function in BPE implementation
#284
naveens01
opened
7 months ago
0
How to convert qwen.tiktoken to tokenizer.model
#283
cloudyuyuyu
opened
7 months ago
0
SSLError: HTTPSConnectionPool(host='openaipublic.blob.core.windows.net', port=443): Max retries exceeded with url
#281
sijiashen
opened
7 months ago
8
Using offline: `.tiktoken` file gets deleted automatically on Linux
#279
nkilm
closed
3 months ago
4
Add handling for empty input text in encode method
#277
pratyakshagarwal
closed
6 months ago
1
Encode an empty string gives empty tokens
#276
flexwang
closed
7 months ago
2
K
#275
marseko
closed
7 months ago
0
How to find the token count of a prompt using meta/llama2-70b model
#274
pradeepdev-1995
opened
8 months ago
1
ImportError: cannot import name '_tiktoken' from partially initialized module 'tiktoken'
#273
sazirod
closed
1 month ago
2
minor fix
#272
igeni
closed
8 months ago
0
Remove dependency on requests
#271
tristan-jl
opened
8 months ago
0
Description of repository has a typo
#270
markusheimerl
closed
7 months ago
1
pinging tiktoken URL always fails, pinging api.openai.com always works
#269
distributev
opened
8 months ago
1
Create SECURITY.md
#268
Monkei49
closed
8 months ago
0
Enhanced Stability in Token Length Calculation for Whitespace Handling
#267
hvaria
closed
8 months ago
0
Create devcontainer.json
#266
79075
closed
8 months ago
0
Incorrect tokenization of "Elaborate"
#265
Sternbach-Software
closed
7 months ago
1
Can't install tiktoken==0.4.0 or tiktoken==0.5.1 in Python 3.12
#264
JackObid
closed
8 months ago
2
TikTok-Unlock-master.zip
#263
79075
closed
8 months ago
0
Update README.md
#262
tic-top
closed
9 months ago
0
<|endoftext|>: why can't ChatGPT recognize it?
#261
Kayce001
closed
7 months ago
1
In-code documentation update
#260
louisbrulenaudet
opened
9 months ago
0
Unable to route GET requests through proxy
#259
aevilevitch
opened
9 months ago
2
Add possessive quantifiers to avoid catastrophic backtracking
#258
l0rinc
closed
1 month ago
3
AttributeError: partially initialized module 'tiktoken' has no attribute 'get_encoding'
#257
TimotejZavski
closed
9 months ago
1
Add turbo 16k model for Azure
#256
kartikagrawal2503
closed
9 months ago
1
Simplify byte_pair_merge
#255
hauntsaninja
closed
9 months ago
1
Inline custom mapping function in _byte_pair_merge
#253
hauntsaninja
closed
9 months ago
0
Avoid calling byte_pair_encode for existing tokens
#252
hauntsaninja
closed
9 months ago
0
Store tokens in u32 instead of usize
#251
hauntsaninja
closed
9 months ago
0
Enhancement: Add convenience token-counting functions to this package
#250
pamelafox
opened
9 months ago
4
Are new line characters separate tokens?
#249
GlassBeaver
closed
9 months ago
1
Adds caching to get_encoding to avoid repeatedly constructing Encodings
#248
tal7aouy
closed
9 months ago
1
Added encodings for two new embedding models
#247
Praneet460
closed
9 months ago
11
Panic (stack overflow) when encoding a certain string
#245
Crazytieguy
opened
10 months ago
7
Junhyun/add upstage solar
#244
jhpark-upstage
closed
10 months ago
0
junhyun/add_upstage_solar
#243
jhpark-upstage
closed
10 months ago
0