yunsii / fasttext.wasm.js

Node and Browser env supported WebAssembly version of fastText: Library for efficient text classification and representation learning.
https://fasttext-wasm-js.vercel.app
MIT License

skipgram predict throws an uncaught error #3

Open Abogical opened 1 month ago

Abogical commented 1 month ago

I have been using this package to access more general fastText features, since the build from the official version doesn't seem to work. I trained a skipgram model through your package using the following script:

const ft = require('fasttext.wasm.js')
const fs = require('fs')

const ft2 = new (await ft.getFastTextClass({getFastTextModule: ft.getFastTextModule}))();
const trainCallback = (progress, loss, wst, lr, eta) => {
  console.log([progress, loss, wst, lr, eta]);
};

const trainingDataURI = URL.createObjectURL(
  new Blob([
    fs.readFileSync('./data.txt')
  ], {type: 'text/plain'})
)

const model = (await ft2.trainUnsupervised(trainingDataURI, 'skipgram', {
  'lr':0.1,
  'epoch':1,
  'loss':'ns',
  'wordNgrams':2,
  'dim':50,
  'bucket':200000
}, trainCallback))

There were no issues training the model and some methods of the model work as expected:

> model.getWordVector('deploy')
Float32Array(50) [
  0.013191827572882175,   0.14477281272411346,  0.03132225200533867,
  ...
]

However, the predict method doesn't work and throws an uncaught WASM error.

> model.predict('deploy')
Uncaught 5679920

yunsii commented 1 month ago

Is there no more error stack information? I'm not familiar with the training logic. If you provide a minimal reproduction, I can try to investigate it.

Abogical commented 1 month ago

@yunsii You can try the code verbatim from the original issue, along with the following data file, although you could try any data file you want.

data.txt

yunsii commented 1 month ago

https://github.com/yunsii/fasttext.wasm.js/blob/8615449506a69b724b19891fc5f9b833de4be842/tests/training.ts

Unfortunately, running it with npx tsx --trace-uncaught ./tests/training.ts throws an error immediately:

Read 0M words
Number of words:  0
Number of labels: 0

node:internal/process/esm_loader:40
      internalBinding('errors').triggerUncaughtException(
                                ^
120118296
Thrown at:
    at loadESM (node:internal/process/esm_loader:40:33)

Node.js v18.19.0

It seems too hard to investigate 😂

Abogical commented 1 month ago

@yunsii Sorry, it seems to work on only some datasets. I shared a trimmed version of mine. That's another bug.

However, you can try this dataset, which works on my end: data2.txt

yunsii commented 1 month ago

Read 0M words
Number of words:  34
Number of labels: 0
args 0 -1 0 0.1 2592000
Progress: 100.0% words/sec/thread:  208397 lr:  0.000000 avg.loss:  4.117721 ETA:   0h 0m 0s
before getWordVector

node:internal/process/esm_loader:40
      internalBinding('errors').triggerUncaughtException(
                                ^
113056
Thrown at:
    at loadESM (node:internal/process/esm_loader:40:33)

Node.js v18.19.0

Strange error. I have no good idea how to investigate it 😂 It seems we would need to debug inside the fasttext git submodule, which is written in C++, but I'm not familiar with it.

Abogical commented 1 month ago

@yunsii Can you try running it in pure JavaScript? getWordVector is supposed to work.

yunsii commented 1 month ago

JavaScript is just a binding for the fastText WASM module; there is no pure JavaScript implementation to run.

Abogical commented 1 month ago

Sorry, I meant running it without tsx. I'm using CommonJS, not ESM.

yunsii commented 1 month ago

ESM and CJS are both just JS bindings for the WASM module, so it will throw the same error when the predict method is called. We cannot do anything about it on the JS side 🫠
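
Even though the throw originates inside the WASM module, calling code can at least catch the raw value and turn it into a readable Error. The sketch below is not part of fasttext.wasm.js: `toError` and `callNative` are hypothetical helper names. It also assumes that bare numbers such as 5679920 are pointers to native exception objects, which is how Emscripten-compiled C++ commonly surfaces an uncaught exception; this thread does not confirm that.

```javascript
// Hypothetical helper (not part of fasttext.wasm.js): normalize whatever a
// WASM binding throws into a real Error so the failure is easier to report.
function toError(thrown, context) {
  // Genuine Error objects already carry a message and stack; pass them through.
  if (thrown instanceof Error) {
    return thrown;
  }
  if (typeof thrown === 'number') {
    // Assumption: a bare number is a pointer to a native (C++) exception
    // object inside WASM memory, so JS has no message or stack to show.
    return new Error(
      `${context}: native code threw a raw value (${thrown}); ` +
        'this is probably an uncaught C++ exception inside the WASM module'
    );
  }
  return new Error(`${context}: ${String(thrown)}`);
}

// Run a synchronous call into the binding and rethrow failures as Errors.
function callNative(context, fn) {
  try {
    return fn();
  } catch (thrown) {
    throw toError(thrown, context);
  }
}
```

With the model from the original report, `callNative('predict', () => model.predict('deploy'))` would then reject with an Error that names the failing call instead of printing only `Uncaught 5679920`. This does not fix the underlying failure; it only makes it easier to see which call failed. The helper only covers synchronous calls, so an async binding method would need an `await` inside the `try` block.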