Closed aaronclong closed 5 months ago
@ArthurZucker could I be annoying and ask you to review this, please and thank you? I totally understand if this isn't a priority right now though.
@Narsil I see you are also an active maintainer in this repository. Could I bother you to take a look at this PR?
I'm sorry if I am being pushy at all, but would love to fix this problem for my own uses-cases
Wow, I seem to be running into the exact same issue -
Cannot find module 'tokenizers-linux-x64-gnu' from 'index.js'
at Resolver.resolveModule (node_modules/jest-resolve/build/index.js:169:17)
at Object.<anonymous> (node_modules/tokenizers/index.js:178:31)
Can someone help review this PR and merge it?
@anupamamurthi I'm going to give it a few days, but I might publish a fork until the maintainers start reviewing PRs again.
I'm wonder if this repo is being actively maintained. I've tried several avenues to get someone to review it and the others pending. If you know of anything @anupamamurthi, I'm all ears.
hey @aaronclong let me know if u publish a fork, I will be happy to give your node package a try....
@aaronclong do u have a node package ready/handy?
@aaronclong do u have a node package ready/handy?
@anupamamurthi I do not at the moment. I was planning to try and work on that this weekend when time allows
Giving this one more shot, @ArthurZucker or @Narsil would it be possible to review this PR? I notice another PR was merged only two days ago. I'm very open to any and all feedback. I just personally want to resolve this issue for my own needs. Thanks again and please let me know.
Getting the same error on both darwin
and linux
:
Error: Cannot find module 'tokenizers-darwin-arm64'
Require stack:
@Narsil any update on this issue? 🙏🏾
@anupamamurthi and @loretoparisi it still requires a lot of work, but I was able to publish a fork!!!
As time allows, I'm going to clean up the node code here. My future plan is to pull the tokenizer rust from cargo, and do the napi-rust binding in this repo. Likewise, I would love to try and do a WASM build.
However, you are welcoming to use it as it stands now. I can't speak to how functional it is because just focused on the build fixes.
https://github.com/turingscript/tokenizers https://www.npmjs.com/package/@turingscript/tokenizers
https://docs.npmjs.com/cli/v10/configuring-npm/package-json#files https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/ https://github.com/actions-rs/toolchain https://github.com/huggingface/tokenizers/issues/1403
Background
The native modules don't seem packed and published with the node build. There seem to be missing entries in the package.json file. This PR adds all the permutations from the
index.js
file. However, this could be cleaned up with future glob patterns, "glob pattern (*, */, and such) will make it so that file is included in the tarball when it's packed".NPM Pack Results
Using the
npm pack
command, I could confirm the binaries are packed when running the build locally. Github doesn't allow the sharing of.tgz
files in PRs. However, the CLI output has been included below.PR's Pack Output
Main's Pack Output
Github Action Changes
In addition to all the above, I notice most of your actions for the release are deprecated. Github is switching to node20 for actions, and the legacy version might not fully support it.
One thing I couldn't update was the rust init step. The action,
actions-rs/toolchain
, being used is deprecated and archived.