Incorrect tokenization of "Elaborate"

BPE is morphology-aware only to the extent implied by n-gram frequency, so token boundaries need not line up with morphemes. See https://github.com/openai/tiktoken#what-is-bpe-anyway and the tiktoken._educational submodule.
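
To make the frequency point concrete, here is a minimal toy sketch of BPE training and encoding. It is not tiktoken's implementation: it works on characters rather than bytes, skips the regex pre-splitting step, and the names `train_bpe`, `merge_pair` and `encode` are made up for illustration. The only thing it is meant to show is that merges are chosen by pair counts in the training data, with no notion of morphemes.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(words, num_merges):
    """Toy BPE training: repeatedly merge the most frequent adjacent symbol pair."""
    # Each word starts out as a tuple of single characters (an assumption of
    # this sketch; real byte-level BPE starts from bytes).
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, count in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        # Highest count wins; ties are broken by the pair itself so the
        # result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_corpus = Counter()
        for symbols, count in corpus.items():
            new_corpus[merge_pair(symbols, best)] += count
        corpus = new_corpus
    return merges


def encode(word, merges):
    """Apply the learned merges, in the order they were learned, to a new word."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Hypothetical training data: the split of "elaborate" depends entirely
    # on which character pairs happen to be frequent here.
    training_words = ["rate"] * 10 + ["labor"] * 8 + ["elastic"] * 3
    merges = train_bpe(training_words, num_merges=6)
    print(merges)
    print(encode("elaborate", merges))
```

Changing the counts in `training_words` changes where "elaborate" gets split, which is the sense in which the tokenizer only reflects n-gram frequency.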

What's "correct" here is determined by what OpenAI's models were trained with. What's desirable is determined by the resulting performance of models. If you had a different tokenisation algorithm that was reasonably fast and improved language model performance, that would be an interesting research result :-)

openai/tiktoken issue #265