The encodeNative() function calls cutTextInRune() repeatedly, and each call converts a string to []rune. That conversion allocates a new slice and decodes the entire string, which is negligible for short inputs but costs O(n) for a string of length n. Because the same string is converted again and again, the total work grows quadratically with string size, so long inputs become extremely slow.
I made some changes and benchmarked the results. For a string of ~589,000 bytes the old implementation effectively never finishes, while the new implementation takes about 0.2 seconds.
I was using this encoder for https://github.com/bakks/butterfish and found that the encoder scales very poorly as the string gets longer. I did a CPU profile (see image below) and found that the most expensive operation was this specific line: https://github.com/pkoukk/tiktoken-go/blob/03647ca046a15c80510e2a135a2dacdf31320ca1/core_bpe.go#L168
Please double check me and merge if useful.