openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
MIT License

✨ feat: Split code rust, for run as python lib and rust lib #167

Open Miuler opened 1 year ago

Miuler commented 1 year ago

Split the Rust code so that it can be built both as a Python library and as a Rust library, which makes it possible to publish to both crates.io and PyPI.

Fixes #24

Miuler commented 1 year ago

To fix the sdist, MANIFEST.in changes from this:

recursive-include src *.rs

to this:

recursive-include py-tiktoken *.rs
recursive-include rs-tiktoken *.rs
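
One way to confirm that the new patterns take effect is to list the `.rs` files that actually end up in the built sdist. The following is a minimal sketch and not part of this PR: `rust_sources_by_dir` and `check_sdist` are hypothetical helpers, and the archive location is an assumption.

```python
import tarfile
from collections import defaultdict
from pathlib import PurePosixPath


def rust_sources_by_dir(member_names, crate_dirs=("py-tiktoken", "rs-tiktoken")):
    """Group .rs paths from an sdist listing by the crate directory they live in.

    An sdist wraps everything in a top-level "<name>-<version>/" folder,
    so the crate directory is the second path component.
    """
    found = defaultdict(list)
    for name in member_names:
        parts = PurePosixPath(name).parts
        if len(parts) >= 3 and parts[1] in crate_dirs and name.endswith(".rs"):
            found[parts[1]].append(name)
    return dict(found)


def check_sdist(path):
    """Open a .tar.gz sdist and report which crate directories contain Rust sources."""
    with tarfile.open(path, "r:gz") as archive:
        return rust_sources_by_dir(archive.getnames())
```

Calling `check_sdist` on the archive produced by the build should return entries for both directories. A missing key suggests that the corresponding `recursive-include` line is not matching anything.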

Miuler commented 1 year ago

Hi @hauntsaninja, this is a first version with no changes to the core. It only splits the Rust code in two: rs-tiktoken, which is intended to become the crate, and py-tiktoken, which is the Python binding.

Later we could add more bindings, for example one for the JVM (something like jvm-tiktoken) so that it can be used from Java, Scala, etc. But first the core, which should live in rs-tiktoken, needs to be refined.

Miuler commented 1 year ago

@hauntsaninja, do you think I should add more changes to this PR so that it provides a fully functional Rust version?

Miuler commented 1 year ago

MANIFEST.in is fixed, and I also split the aarch64 workflow into two jobs:

build_wheels_aarch64_glibc:

and

build_wheels_aarch64_musl:

This is how it looks:

[Screenshot from 2023-07-19 13-20-39]

I did this because the build was already failing in commits made before my changes. The failure happens on aarch64, but only in the builds that use musl instead of glibc.

Should we disable the musl build? Is there really a need to support aarch64 with musl?
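
For context on the glibc/musl distinction: Linux wheels built against glibc carry a `manylinux` platform tag, while musl builds carry a `musllinux` tag (PEP 600 and PEP 656). Below is a minimal sketch that sorts wheel filenames into the two groups the split jobs correspond to. `libc_of_wheel` is a hypothetical helper, not something in this PR, and any filenames passed to it here are made-up examples rather than actual build outputs.

```python
def libc_of_wheel(filename):
    """Return "glibc", "musl", or None based on a wheel's platform tag.

    Wheel filenames end in "-{python tag}-{abi tag}-{platform tag}.whl",
    so the platform tag is the last dash-separated field.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    platform_tag = filename[: -len(".whl")].rsplit("-", 1)[-1]
    # A compressed tag set may list several platforms separated by dots.
    for tag in platform_tag.split("."):
        if tag.startswith("musllinux"):
            return "musl"
        if tag.startswith("manylinux"):
            return "glibc"
    return None
```

Grouping the artifacts of a CI run this way makes it easy to see whether only the musl wheels are missing when the aarch64 build fails.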

Miuler commented 1 year ago

What happened? Did I make a mistake? Any comments? Is there anything I need to change for my PR to be considered?