pkoukk / tiktoken-go

go version of tiktoken
MIT License

Merge `main` updates into `embed` version #29

Status: Closed (closed by winston-stripe 1 year ago)

winston-stripe commented 1 year ago

This merges the recent improvements from `main` into the `embed` version.

As an aside, it would be neat if the normal package had some way to accept the offline assets at init time (e.g. having tiktoken-go and a separate tiktoken-go/assets/go.mod). That would avoid awkward constraints when tiktoken-go is pulled in as a dependency of something else and we can't necessarily import the embed version of the library.

For example, something like `tiktoken.InitWithOfflineData(tiktoken_offline.Data())` 🤷

Thanks!

pkoukk commented 1 year ago

Actually, I think setting the environment variable to use the cache is a better approach than embedding the data. However, it's true that relying on the cache can be unfriendly during build and deployment. Thank you for your opinion. I will try to find a better and easier solution.

pkoukk commented 1 year ago

Thank you for the inspiration. The library now accepts custom BPE loaders.

31
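To illustrate the idea discussed above, here is a minimal sketch of what a pluggable offline BPE loader could look like. It is not the library's actual API: `BpeLoader`, `LoadBpe`, `offlineLoader`, `newSampleLoader`, and the three-entry `sampleVocab` are all hypothetical names invented for this example. The only thing taken as given is the tiktoken vocabulary text format, one base64-encoded token and its integer rank per line.

```go
package main

import (
	"bufio"
	"encoding/base64"
	"fmt"
	"strconv"
	"strings"
)

// BpeLoader is a hypothetical interface for this sketch (not the library's):
// given a vocabulary name, it returns a map from token bytes to rank.
type BpeLoader interface {
	LoadBpe(name string) (map[string]int, error)
}

// offlineLoader serves vocabularies from memory instead of the network.
// In a real build the contents could come from go:embed; here it is a literal.
type offlineLoader struct {
	files map[string]string // vocabulary name -> file contents
}

// LoadBpe parses the tiktoken text format: one "<base64 token> <rank>" per line.
func (l offlineLoader) LoadBpe(name string) (map[string]int, error) {
	data, ok := l.files[name]
	if !ok {
		return nil, fmt.Errorf("no offline data for %q", name)
	}
	ranks := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue // skip blank lines
		}
		fields := strings.Fields(line)
		if len(fields) != 2 {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		token, err := base64.StdEncoding.DecodeString(fields[0])
		if err != nil {
			return nil, err
		}
		rank, err := strconv.Atoi(fields[1])
		if err != nil {
			return nil, err
		}
		ranks[string(token)] = rank
	}
	return ranks, sc.Err()
}

// sampleVocab is a made-up vocabulary with three entries: "a", "b", "ab".
const sampleVocab = "YQ== 0\nYg== 1\nYWI= 2\n"

// newSampleLoader returns a loader backed only by the made-up vocabulary.
func newSampleLoader() BpeLoader {
	return offlineLoader{files: map[string]string{"sample.tiktoken": sampleVocab}}
}

func main() {
	ranks, err := newSampleLoader().LoadBpe("sample.tiktoken")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(ranks), ranks["ab"]) // prints "3 2"
}
```

Putting the loader behind an interface means a program that only depends on the normal package transitively could still supply its own data source at startup, which is the constraint raised in the original comment.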

winston-stripe commented 1 year ago

Thank you!