wooorm / dictionaries

Hunspell dictionaries in UTF-8
MIT License

ko: Word does not match #14

Closed heydojo closed 5 years ago

heydojo commented 5 years ago

I have been trying to convert these dictionaries to Qt WebEngine format using Qt's qwebengine_convert_dict tool.

I was unable to convert the file ko/index.bdic due to the following error:

Word does not match! - Index: 14081 - Expected: 김수한무거북이와두루미삼천갑자동방삭치치카포사리사리센타워리워리세브리캉무드셀라구름위허리케인에담벼락서생원에고양이고양이는바둑이바둑이는돌돌이 - Actual: 김수한무거북이와두루미삼천갑자동방� - ERROR converting, the dictionary does not check out OK.

Most other dictionaries did build with the tool, which leads me to believe the fault may lie with ko/index.bdic. But as someone unfamiliar with hunspell, I cannot tell whether the Expected value should replace the Actual one. If so, it seems more useful to report it here than to just fix it on my end.
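
For readers comparing the two strings in the error: Actual is a prefix of Expected that ends in U+FFFD, the Unicode replacement character, which is what a decoder emits when a multi-byte UTF-8 character is cut partway through. The sketch below reproduces that shape by truncating the Expected word at a byte count. The 52-byte limit is an assumption chosen only because it reproduces the output above; it is not taken from the qwebengine_convert_dict source, and the tool may fail for a different reason.

```python
# Expected word, copied from the error message above.
EXPECTED = (
    "김수한무거북이와두루미삼천갑자동방삭치치카포사리사리센타워리워리"
    "세브리캉무드셀라구름위허리케인에담벼락서생원에고양이고양이는"
    "바둑이바둑이는돌돌이"
)

# Assumed limit, for illustration only; not taken from the tool's source.
ASSUMED_BYTE_LIMIT = 52


def truncate_utf8(word: str, limit: int) -> str:
    """Cut the UTF-8 encoding of `word` at `limit` bytes and decode the rest.

    A cut that lands inside a multi-byte character leaves an incomplete
    sequence, which errors="replace" turns into U+FFFD.
    """
    return word.encode("utf-8")[:limit].decode("utf-8", errors="replace")


actual = truncate_utf8(EXPECTED, ASSUMED_BYTE_LIMIT)
print(len(EXPECTED.encode("utf-8")), "bytes in the full word")
print(actual)
```

If this reading is right, Expected is the intact dictionary entry and Actual is a damaged copy of it, so swapping one for the other in the dictionary would not help.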

wooorm commented 5 years ago

Could you check if the source dictionary, mentioned in the readme, has the same problem?

heydojo commented 5 years ago

I'm not set up to compile that dictionary here, unfortunately. I'd need a link to both index.aff and index.dic to test.

Google Translate also thinks the GitHub repo is already in English, so:

hunspell spelling dictionary

It is (currently the only) Korean spelling dictionary that works with hunspell, the spell-checking program widely used on open source desktops.

It currently works with several Linux distributions and open source applications.

    Spell checking at the word level
    Implemented to suit the characteristics of Korean, including irregular usage and verb conjugation
    If a program uses the hunspell spell checker, Korean spell checking works without modification
    Uses word data that can be distributed as open source

GitHub download (source and aff/dic files)

Using

To enable spell checking, you can install OS-specific packages, or use various application-specific extensions. Alternatively, you can use the dictionary file you built.

The dictionary consists of two files, ko.aff and ko.dic, which can be copied to the location where hunspell dictionaries are stored.

    Debian: install the hunspell-ko package, after 6.0 (squeeze), http://packages.debian.org/hunspell-ko
    Ubuntu: install the hunspell-ko package, after 9.10 (karmic), http://packages.ubuntu.com/hunspell-ko
    Fedora: install the hunspell-ko package, after Fedora 12 (Constantine), https://apps.fedoraproject.org/packages/hunspell-en/
    Mac OS X 10.5: BaramSpellChecker
    Mac OS X 10.6 or later: copy the aff/dic files to /Library/Spelling or to Library/Spelling under your home folder
    Firefox add-on, https://addons.mozilla.org/en/firefox/addon/korean-spellchecker/
    LibreOffice add-on, https://extensions.libreoffice.org/extensions/korean-spellchecker

The Firefox and LibreOffice builds included in a Linux distribution are usually built to use the hunspell that ships with that distribution. So if you install the distribution's packages, you do not need to install the browser or office extensions above separately.

hunspell version information

The ko.dic file contains the headword data and the ko.aff (affix) file contains the word usage information. The aff and dic file formats are also used by myspell, the predecessor of hunspell, but the Korean data works only in hunspell because it requires hunspell-specific functionality.

    All current features require hunspell version 1.3.1 or later.
    However, version 1.6.0 of hunspell has a serious problem in using Korean. Use 1.4.x or earlier or 1.6.1 or later.
    Most modern operating systems and applications use a version higher than 1.3.1, except that the hunspell built into Mac OS uses 1.2.8.
    You can also use the aff and dic files built for 1.2.8 in hunspell 1.2.8, but there are some parts that do not work.
    When using the hunspell command line, versions before 1.2.11 do not work properly because they cannot separate Korean words correctly.
    Versions earlier than 1.6.2 have no problems with regular spell checking, but when used from the hunspell command line, a root starting with a '+' sign is not translated properly.

Copyright information

The copyright of this software lies with the hunspell-dict-ko project developers and contributors, dictionary word contributors, the National Institute of Korean Language, and Urimalsaem dictionary contributors.

The generated code and some word data (dict-ko-galkwi-mplgpllgpl.json) are distributed under a triple license: the Mozilla Public License 1.1, the GNU General Public License 2.0 or later, or the GNU Lesser General Public License 2.1 or later (MPL 1.1 / GPL 2.0 / LGPL 2.1). Other word data is distributed under the Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0) license.

Reference

    Project Information
    Github Project
    Galkwi word information site (where you can add / update missing words)
    dialogue

Other information

    In 2017 the project was supported by the Open SW Developer Lab Global Open Frontier Program, which is backed by the Ministry of Science and ICT. https://kosslab.kr

It looks like they had some problems with certain hunspell versions.

wooorm commented 5 years ago

I'm not set up to compile that dictionary here, unfortunately. I'd need a link to both index.aff and index.dic to test.

You can get the files from their Releases page!

wooorm commented 5 years ago

@heydojo Ping!

wooorm commented 5 years ago

Seems to be an issue with the source dictionary or with qwebengine_convert_dict. Not sure what to do in this repo 🤷‍♂️

OctopusET commented 3 years ago

The maintainer said that word is just an Easter egg. Remove that line (line 14265) from ko.dic and try again.

Edit: This seems to be a problem with how the qwebengine_convert_dict tool handles word length, not a problem with the dictionary.
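
The workaround suggested in this comment (dropping the one very long entry from ko.dic before converting) could be scripted roughly as follows. This is a sketch, not part of either project: `drop_long_words` and the 100-byte threshold are hypothetical, the sample flags are made up, and it assumes the usual hunspell .dic layout of an entry count on the first line followed by one `word/FLAGS` entry per line.

```python
from typing import List


def drop_long_words(dic_lines: List[str], max_bytes: int) -> List[str]:
    """Return .dic lines without entries whose word exceeds max_bytes in UTF-8.

    Assumes the first line is the entry count and every other line is
    "word" or "word/FLAGS". Escaped slashes inside words are not handled.
    """
    entries = dic_lines[1:]
    kept = []
    for line in entries:
        # The word is everything before the first "/" (the flag separator).
        word = line.split("/", 1)[0].strip()
        if len(word.encode("utf-8")) <= max_bytes:
            kept.append(line)
    # Refresh the count on the first line to match the remaining entries.
    return [str(len(kept))] + kept


# Small in-memory sample; the flags (/AB, /C) are invented for the demo.
sample = ["3", "사과/AB", "가" * 60 + "/C", "나무"]
filtered = drop_long_words(sample, max_bytes=100)
print(filtered)
```

To apply it to a real file, read ko.dic with encoding="utf-8", pass the lines in without their trailing newlines, and write the result to a copy rather than overwriting the original.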

OctopusET commented 3 years ago

ping! @wooorm @heydojo