huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
https://huggingface.co/docs/tokenizers
Apache License 2.0
8.69k stars 747 forks source link

Use NodeJs: Cannot find module 'tokenizers-darwin-arm64' #1403

Closed guotingchao closed 2 months ago

guotingchao commented 7 months ago

Step 1

After installing the tokenizers package with npm, I tried to call the test from the application class, and it returned the following message

Error: Cannot find module 'tokenizers-darwin-arm64'

Step 2

Then I tried to install this tokenizers-darwin-arm64 package The result returns the following information

Couldn't find any versions for "tokenizers-darwin-x64" that matches "0.13.3"
? Please choose a version of "tokenizers-darwin-x64" from this list: (Use arrow keys)

Step 3

Then I chose any version of 0.13.4-rc2

Error: Couldn't find package "tokenizers-freebsd-x64@0.13.3" required by "tokenizers@^0.13.3" on the "npm" registry.

Question :

Does it support MacOs with the M system-on-chip?

github-actions[bot] commented 6 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

ArthurZucker commented 6 months ago

Not sure we release new packages for npm recently

EasyVector commented 5 months ago

for any one trying to solve this issue, this script might be helpful, you could save this shell script into your project as "postinstall.sh"

# rust should be installed
if command -v rustc >/dev/null 2>&1; then
    echo "Rust is installed."
else
    echo "Rust is not installed."
    echo "Please install Rust from https://www.rust-lang.org/learn/get-started."
    exit 1
fi

DIR="tokenizer_folder"
if [ -d "$DIR" ]; then
  # Take action if $DIR exists. #
  echo "${DIR} EXISTS"
else
  echo "${DIR} not exists, try to clone"
  git clone https://github.com/huggingface/tokenizers tokenizer_folder --depth 1
fi

# try to find node
find_node_file() {
  local PATH_TO_SEARCH=$1
  local NODE_FILE=$(find "$PATH_TO_SEARCH" -name '*.node' -depth 1)
  if [[ -n "$NODE_FILE" ]]; then
    echo "$NODE_FILE"
  fi
}

# Save to variable
FOUND_NODE_FILE=$(find_node_file tokenizer_folder/bindings/node)
if [ ! -n "$FOUND_NODE_FILE" ]; then
  cd tokenizer_folder/bindings/node
  npm i
  npx yarn
  npx yarn build
  cd ../../../
fi

FOUND_NODE_FILE=$(find_node_file tokenizer_folder/bindings/node)
if [ -n "$FOUND_NODE_FILE" ]; then
  echo "$FOUND_NODE_FILE"
  cp "$FOUND_NODE_FILE" node_modules/tokenizers
fi

then add this into your package.json

...
  "scripts": {
    "postinstall": "sh postinstall.sh",
  },
...

then re run npm i, the binding file will be built & copied to project, and the error could be gone.

aaronclong commented 4 months ago

I'm getting similiar problems with GNU Linux:

Error: Cannot find module 'tokenizers-linux-x64-gnu'
  Require stack:
  - /home/xxx/xxx/xxxx/node_modules/tokenizers/index.js

  Error: Cannot find module 'tokenizers-linux-x64-gnu'
  Require stack:
  - /home/xxx/xxx/xxxx/node_modules/tokenizers/index.js
      at Module._resolveFilename (node:internal/modules/cjs/loader:1134:15)
      at Function._resolveFilename (/home/xxx/xxx/xxxx/node_modules/tsimp/src/hooks/require.ts:42:12)
      at Module._load (node:internal/modules/cjs/loader:975:27)
      at Module.require (node:internal/modules/cjs/loader:1225:19)
      at require (node:internal/modules/helpers:177:18)
      at Object.<anonymous> (/home/xxx/xxx/xxxx/node_modules/tokenizers/index.js:178:31)
      at Module._compile (node:internal/modules/cjs/loader:1356:14)
      at Module._extensions..js (node:internal/modules/cjs/loader:1414:10)
      at Module.load (node:internal/modules/cjs/loader:1197:32)
      at Module._load (node:internal/modules/cjs/loader:1013:12)

Update

I'm not even seeing the native modules in the installed npm package. They might not be getting copied. I could be misunderstanding this package's installation process, though. Screenshot from 2024-02-25 13-36-29

aaronclong commented 4 months ago

@ArthurZucker do you think a new build would fix this?

aaronclong commented 4 months ago

I think I found the issue, but please let me know if I am off base. The package.json 's files property seems to be missing all the binaries's names in it. I made a small pr to address this and some CI/CD necessary upgrades. https://github.com/huggingface/tokenizers/pull/1459

loretoparisi commented 3 months ago
# rust should be installed
if command -v rustc >/dev/null 2>&1; then
    echo "Rust is installed."
else
    echo "Rust is not installed."
    echo "Please install Rust from https://www.rust-lang.org/learn/get-started."
    exit 1
fi

Thanks @EasyVector this seems a solution, but aftering successfully building I get a a runtime error

/.../node_modules/tokenizers:1
����
SyntaxError: Invalid or unexpected token
    at Object.compileFunction (node:vm:360:18)
    at wrapSafe (node:internal/modules/cjs/loader:1126:15)
    at Module._compile (node:internal/modules/cjs/loader:1162:27)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1252:10)
    at Module.load (node:internal/modules/cjs/loader:1076:32)
    at Function.Module._load (node:internal/modules/cjs/loader:911:12)
    at Module.require (node:internal/modules/cjs/loader:1100:19)
    at require (node:internal/modules/cjs/helpers:119:18)

This happens when loading the bindings in node like

const { SentencePieceBPETokenizer, BertWordPieceTokenizer, ByteLevelBPETokenizer, BPETokenizer } = require("tokenizers");
github-actions[bot] commented 2 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.