knights-analytics / hugot

Huggingface transformer pipelines in Golang
Apache License 2.0

`tokenizers` fork: Include upstream changes for platform dependent libs in CGO #33

Closed gregfurman closed 3 months ago

gregfurman commented 3 months ago

Hey 👋

Thanks for this library. I'm currently using it to build some inference tooling plugins for a stream processor.

Can the knights-analytics/tokenizer fork be updated to include upstream PR #18, "Update to allow for platform dependent libs in CGO"?

I think this will make compilation a bit easier, especially for people like me who are on a Mac, where /usr/lib is a protected directory. Including that tokenizers_srcdir_relative is not really ideal when building. For reference, this is the relevant line upstream:

https://github.com/daulet/tokenizers/blob/d9aff87d16f3db537ee005fb45ebca26049e7916/tokenizer.go#L6

RJKeevil commented 3 months ago

Hi @gregfurman, yes, we are planning to move back to the base tokenizer project now that it has configurable paths. We just need to contribute one PR back to the repo first, as we rely on having offsets for some of our pipeline types. I'll update here once that is in.

RJKeevil commented 3 months ago

FYI, the PR is here: https://github.com/daulet/tokenizers/pull/21

RJKeevil commented 3 months ago

This change is now part of the v0.1.4 release.