scriptin / jmdict-simplified

JMdict, JMnedict, Kanjidic, KRADFILE/RADKFILE in JSON format
Creative Commons Attribution Share Alike 4.0 International

jmdict-simplified

JMdict, JMnedict, Kanjidic, and Kradfile/Radkfile in JSON format
with more comprehensible structure and beginner-friendly documentation

Download JSON files | Format docs

NPM package: @scriptin/jmdict-simplified-types
NPM package: @scriptin/jmdict-simplified-loader


Why?

The original XML files are less than ideal in terms of format. (This is only my opinion; the JMdict/JMnedict project in general is absolutely awesome!) This project provides the following changes and improvements:

  1. JSON instead of XML (or custom text format of RADKFILE/KRADFILE). Because the original format uses some "advanced" XML features, such as entities and DOCTYPE, it can be quite difficult to use in some tech stacks, e.g. when your programming language of choice has no libraries that support these features
  2. Regular structure for every item in every collection, no "same as in previous" implicit values. This is a problem with original XML files because users' code has to keep track of various parts of state while traversing collections. In this project, I tried to make every item of every collection "self-contained," with all the fields having all the values, without a need to refer to preceding items
  3. Avoiding null (with a few exceptions) and missing fields, preferring empty arrays. See http://thecodelesscode.com/case/6 for the inspiration for this
  4. Human-readable names for fields instead of cryptic abbreviations with no explanations
  5. Documentation in a single file instead of browsing obscure pages across multiple sites. In my opinion, the documentation is the weakest part of the JMdict/JMnedict project
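
As an illustration of points 2 and 3, here is a minimal sketch of what "self-contained" items mean in practice. The `RawItem` and `Item` shapes and the `normalize` function are hypothetical and are not taken from this project's actual formats: the input uses a "same as in previous" convention and nullable lists, while the output carries every value on every item.

```typescript
// Hypothetical input shape: a missing `category` means "same as in the
// previous item", and `tags` may be absent or null.
interface RawItem {
  text: string;
  category?: string;
  tags?: string[] | null;
}

// Hypothetical output shape: every field is always present.
interface Item {
  text: string;
  category: string;
  tags: string[];
}

function normalize(raw: RawItem[]): Item[] {
  // State that every consumer would otherwise have to track themselves.
  let lastCategory = "";
  return raw.map((r) => {
    if (r.category !== undefined) {
      lastCategory = r.category;
    }
    return {
      text: r.text,
      category: lastCategory, // implicit value made explicit
      tags: r.tags ?? [], // empty array instead of null or a missing field
    };
  });
}

// Made-up sample data, for illustration only.
const items = normalize([
  { text: "a", category: "noun", tags: ["x"] },
  { text: "b" },
  { text: "c", category: "verb", tags: null },
]);
```

After normalization, a consumer can read any single item without looking at its neighbours and can iterate over `tags` without a null check.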

Format

See the Format documentation or TypeScript types

Please also read the original documentation if you have more questions:

There are also Kotlin types, although they contain some methods and annotations you might not need.

Full, "common-only", and language-specific versions

There are two main types of JSON files for the JMdict dictionary: full, and "common-only", which contains only the entries considered common.

Also, JMdict and Kanjidic have language-specific versions with language codes (3-letter ISO 639-2 codes for JMdict, 2-letter ISO 639-1 codes for Kanjidic) in file names:
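
Because the two dictionaries use codes of different lengths, it is easy to mix them up when building file names. The following sketch checks only the shape of a code for a given dictionary. The `Dictionary` type and `hasExpectedCodeShape` function are hypothetical helpers, and the assumption that codes are written in lowercase ASCII letters is mine; the check does not verify that a code is actually registered in ISO 639.

```typescript
type Dictionary = "jmdict" | "kanjidic";

// Checks the shape of a language code only: lowercase ASCII letters (an
// assumption) with 3 letters for JMdict (ISO 639-2) or 2 letters for
// Kanjidic (ISO 639-1).
function hasExpectedCodeShape(dictionary: Dictionary, code: string): boolean {
  const expectedLength = dictionary === "jmdict" ? 3 : 2;
  return new RegExp(`^[a-z]{${expectedLength}}$`).test(code);
}

const jmdictOk = hasExpectedCodeShape("jmdict", "eng");
const kanjidicOk = hasExpectedCodeShape("kanjidic", "en");
const mixedUp = hasExpectedCodeShape("jmdict", "en");
```

For English, the JMdict code would be `eng` and the Kanjidic code `en`, so `mixedUp` is `false`.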

JMnedict has only one version, since it's (currently) English-only, and has no "common" indicators on entries.
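
To make the "common-only" idea concrete for JMdict, here is a rough sketch of filtering a full list of entries down to common ones. It assumes that each entry has `kanji` and `kana` arrays whose elements carry a boolean `common` flag, and that an entry counts as common if any of its writings is flagged. These field names and the rule itself are assumptions made for illustration, so check the Format documentation before relying on them.

```typescript
// Assumed minimal shapes; the real types have more fields.
interface Writing {
  text: string;
  common: boolean;
}

interface Entry {
  id: string;
  kanji: Writing[];
  kana: Writing[];
}

// Assumed rule: an entry is common if any kanji or kana writing is flagged.
function isCommon(entry: Entry): boolean {
  return [...entry.kanji, ...entry.kana].some((w) => w.common);
}

function commonOnly(entries: Entry[]): Entry[] {
  return entries.filter(isCommon);
}

// Made-up sample data; ids and flags are for illustration only.
const sample: Entry[] = [
  { id: "1", kanji: [{ text: "猫", common: true }], kana: [{ text: "ねこ", common: true }] },
  { id: "2", kanji: [], kana: [{ text: "れいがい", common: false }] },
];

const filtered = commonOnly(sample);
```

Note that an entry with an empty `kanji` array needs no special handling, which is one practical benefit of preferring empty arrays over null.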

Requirements for running the conversion script

You don't need to install Gradle. Just use the Gradle wrapper provided in this repository: gradlew (for Linux/Mac) or gradlew.bat (for Windows).

Converting XML dictionaries

NOTE: You can grab the pre-built JSON files from the latest release.

Use the included scripts: gradlew (for Linux/Mac) or gradlew.bat (for Windows).

Tasks to convert dictionary files and create distribution archives:

Utility tasks (for CI/CD workflows):

For the full list of available tasks, run ./gradlew tasks (or gradlew.bat tasks on Windows).

Troubleshooting

License

JMdict and JMnedict

The original XML files - JMdict.xml, JMdict_e.xml, and JMnedict.xml - are the property of the Electronic Dictionary Research and Development Group, and are used in conformance with the Group's license. The project was started in 1991 by Jim Breen.

All derived files are distributed under the same license, as the original license requires it.

Kanjidic

The original kanjidic2.xml file is released under the Creative Commons Attribution-ShareAlike License v4.0. See the Copyright and Permissions section on the Kanjidic wiki for details.

All derived files are distributed under the same license, as the original license requires it.

RADKFILE/KRADFILE

The RADKFILE and KRADFILE files are copyrighted and available under the EDRDG Licence. The copyright of the RADKFILE2 and KRADFILE2 files is held by Jim Rose.

NPM packages

The NPM packages @scriptin/jmdict-simplified-types and @scriptin/jmdict-simplified-loader are available under the MIT license.

Other files

The source code and other files of this project, excluding the files and packages mentioned above, are available under the Creative Commons Attribution-ShareAlike License v4.0. See LICENSE.txt.