takuyaa / kuromoji.js

JavaScript implementation of Japanese morphological analyzer

Add webpack support #27

Closed: you06 closed this 6 years ago

you06 commented 6 years ago

Usage

webpack config

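{
  // Match kuromoji's gzipped dictionary files, e.g. base.dat.gz
  test: /\.dat\.gz$/,
  use: {
    loader: 'url-loader',
    options: {
      // Byte threshold for inlining as a data URL; at 1 byte, every
      // dictionary file is handed to file-loader and imported as a URL
      limit: 1
    }
  }
}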

demo

import kuromoji from "kuromoji";
import dict from 'somepath/dict';

kuromoji.builder({ dicPath: dict }).build(function (err, tokenizer) {
    if (err) {
        console.error(err);
        return;
    }
    // tokenizer is ready
    var path = tokenizer.tokenize("すもももももももものうち");
    console.log(path);
});
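kuromoji's build() reports its result through a Node-style callback. If the surrounding code uses Promises or async/await, a small wrapper like the sketch below may be convenient. buildTokenizer is a hypothetical helper, not part of kuromoji; it accepts any object with a build(callback) method, such as the builder returned by kuromoji.builder({ dicPath: dict }).

```javascript
// Hypothetical helper (not part of kuromoji): wrap a callback-style
// builder in a Promise that resolves to the tokenizer.
function buildTokenizer(builder) {
  return new Promise(function (resolve, reject) {
    builder.build(function (err, tokenizer) {
      if (err) {
        reject(err); // e.g. a dictionary file failed to load
      } else {
        resolve(tokenizer);
      }
    });
  });
}
```

Inside an async function, the demo would then read: const tokenizer = await buildTokenizer(kuromoji.builder({ dicPath: dict })); In Node alone, util.promisify(builder.build.bind(builder)) would do the same, but the hand-written version avoids relying on Node's util module in a browser bundle.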

The dict object imported from dict/index.js will look like this, mapping each dictionary file name to the hashed file name emitted for it:

{
  "base.dat.gz": "7c8bbced46e88cdb77c9c66c9ca9fbcb.gz",
  "cc.dat.gz": "05321caff24f87d1bed64fe1d44576fc.gz",
  "check.dat.gz": "dcbeea0429520f5e669a75ff504241a7.gz",
  "tid_map.dat.gz": "ab259890529abb432a5c20aff4efb021.gz",
  "tid_pos.dat.gz": "6b89472ae7b079cc8cb6d5758356ff37.gz",
  "tid.dat.gz": "48d8e87b50f900b4795e55e9a70c2696.gz",
  "unk_char.dat.gz": "557c5cc25a480e1946150625face4c91.gz",
  "unk_compat.dat.gz": "da69ebce7400cc6ba01f5ce19d3108f1.gz",
  "unk_invoke.dat.gz": "6b5a7c42a945cbba596148fecc2d56b4.gz",
  "unk_map.dat.gz": "eda6e0354662ee169e817f5848ab56d4.gz",
  "unk_pos.dat.gz": "5986e78e268fa51e3e119511ec914dd9.gz",
  "unk.dat.gz": "9229f1b8c742cd15ff3229ed3700112a.gz"
}
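With this setup, dicPath is no longer the directory string kuromoji normally expects (e.g. "path/to/dictionary/dir/") but an object like the one above, so the loader has to resolve each file through the map. The sketch below shows one way to support both forms; resolveDictUrl is a hypothetical helper for illustration, not the PR's actual code.

```javascript
// Hypothetical helper (not part of kuromoji): resolve the URL of one
// dictionary file from either a directory string or a name -> URL map.
function resolveDictUrl(dicPath, fileName) {
  if (typeof dicPath === "string") {
    // Directory form: append the file name, inserting "/" if needed
    return dicPath.endsWith("/") ? dicPath + fileName : dicPath + "/" + fileName;
  }
  if (dicPath !== null && typeof dicPath === "object") {
    // Map form: look the file up by its original name
    if (!Object.prototype.hasOwnProperty.call(dicPath, fileName)) {
      throw new Error("No entry for dictionary file: " + fileName);
    }
    return dicPath[fileName];
  }
  throw new TypeError("dicPath must be a string or an object");
}
```

For example, resolveDictUrl(dict, "base.dat.gz") would return "7c8bbced46e88cdb77c9c66c9ca9fbcb.gz" for the map shown above.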

For webpack support, I use BrowserDictionaryLoader.js and add an environment check (it seems only Node was supported before):

var NodeDictionaryLoader = require("./loader/NodeDictionaryLoader");
var BrowserDictionaryLoader = require("./loader/BrowserDictionaryLoader");

// Choose the loader at runtime: Node has no global `window`, a browser does.
var DictionaryLoader = (typeof window === 'undefined')
    ? NodeDictionaryLoader
    : BrowserDictionaryLoader;
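The typeof window check treats any environment without a global window as Node. A Web Worker is one case where that may not hold: it has no window, but it also has no Node fs module. The sketch below is an alternative, an assumption rather than what the PR does: it detects Node through process.versions.node and falls back to the browser loader otherwise. selectDictionaryLoader and its globalObj parameter are hypothetical, added so the check can be exercised with stand-in objects.

```javascript
// Alternative environment check (an assumption, not the PR's code):
// treat the runtime as Node only when process.versions.node is a string,
// and use the browser loader everywhere else, including Web Workers.
function selectDictionaryLoader(globalObj, NodeLoader, BrowserLoader) {
  var proc = globalObj.process;
  var isNode =
    proc != null &&
    proc.versions != null &&
    typeof proc.versions.node === "string";
  return isNode ? NodeLoader : BrowserLoader;
}
```

In the module that assigns DictionaryLoader, the call would be selectDictionaryLoader(globalThis, NodeDictionaryLoader, BrowserDictionaryLoader); globalThis requires a reasonably modern runtime (Node 12+ or a current browser).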

you06 commented 6 years ago

Oh, I should not have committed yarn.lock: its presence automatically enables yarn on Travis CI, and Node "4.4.7" is too old for yarn.

you06 commented 6 years ago

The dictionary files should go in webpack's static directory, where they are not processed by any loader. So I am closing this PR.