neologd / mecab-ipadic-neologd

Neologism dictionary based on the language resources on the Web for mecab-ipadic

Easier way to get a dictionary #80

Open omae-muds opened 3 years ago

omae-muds commented 3 years ago

I'm not familiar with this project, so there may be solutions I don't know about.

Motivation

A common way to use MeCab in Python these days is to simply pip install MeCab-Python3 and a dictionary package. I would like to install both MeCab-Python3 and mecab-ipadic-neologd from PyPI. So far, however, this dictionary is only available through a system package manager or the installer script.

Goal

I have two suggestions.

1. Make it possible to pip install mecab-ipadic-neologd

This would be an easy way to use MeCab with a good dictionary in Python. It is the ideal option, but harder to implement than the second one.

In Python, we often build virtual environments from a list of packages declared in a file such as requirements.txt, Pipfile or pyproject.toml. Currently, this dictionary cannot be included in such a list, and it can only be installed in a way that affects the system outside the virtual environment. A PyPI package would solve this problem.
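To make suggestion 1 more concrete, here is a minimal sketch of what such a package could expose. Everything in it is an assumption for illustration: the package layout, the `dicdir` folder name, and the helper names `dicdir` and `mecab_args` are hypothetical and do not exist in this project.

```python
# Hypothetical module, e.g. mecab_ipadic_neologd/__init__.py.
# Assumes the compiled dictionary files are shipped in a "dicdir"
# folder inside the installed package.
from pathlib import Path


def dicdir(package_root: Path) -> Path:
    """Return the folder that would hold the compiled dictionary."""
    return package_root / "dicdir"


def mecab_args(package_root: Path) -> str:
    """Build the argument string to pass to MeCab.Tagger.

    Follows the "-r /dev/null -d <dir>" form used later in this issue.
    The path is not quoted, so a path containing spaces is not handled.
    """
    return f"-r /dev/null -d {dicdir(package_root).as_posix()}"
```

With something like this, the dictionary could be listed in requirements.txt next to MeCab-Python3, and user code would only need to pass the result of `mecab_args(...)` to `MeCab.Tagger`.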

2. Releasing the latest dictionary zip via GitHub Actions

This is a simple way to satisfy people who want the dictionary data.

A GitHub Actions workflow would run the equivalent of the commands described in the README and publish the generated dictionary as a zip file. After downloading and extracting the zip, the dictionary can be used like tagger = MeCab.Tagger("-r /dev/null -d ./dic/mecab-ipadic-neologd").
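The download-and-extract step could look roughly like the sketch below. It assumes the zip has already been downloaded and that it contains a top-level `mecab-ipadic-neologd` folder; the archive layout and the helper names `extract_dictionary` and `make_tagger` are my assumptions, not something this repository provides.

```python
import zipfile
from pathlib import Path


def extract_dictionary(zip_path, dest="dic"):
    """Extract the released zip and return the dictionary folder.

    Assumes the archive contains a top-level "mecab-ipadic-neologd" folder.
    """
    dest = Path(dest)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    return dest / "mecab-ipadic-neologd"


def make_tagger(dic_dir):
    """Create a tagger that uses the extracted dictionary."""
    import MeCab  # imported lazily; requires MeCab-Python3 to be installed

    return MeCab.Tagger(f"-r /dev/null -d {Path(dic_dir).as_posix()}")
```

Because nothing is installed system-wide, the extracted folder can live inside the project or the virtual environment.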

However, this repository already publishes releases every few years. If there were, for example, two zip releases every week, they could be confused with the existing releases. Also, an automated release may contain problems that go unnoticed because nobody checks it manually (e.g. corrupted data or a failed compression).
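Part of that risk could be reduced with an automated sanity check before publishing. The following is only a sketch: the function name `check_dictionary_zip` is hypothetical, and the file list is the set of files a compiled MeCab dictionary normally contains, which I am assuming also applies to the generated zip.

```python
import zipfile

# Files a compiled MeCab dictionary folder is expected to contain.
REQUIRED_FILES = {"sys.dic", "matrix.bin", "char.bin", "unk.dic", "dicrc"}


def check_dictionary_zip(zip_path):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    try:
        with zipfile.ZipFile(zip_path) as zf:
            # testzip() reads every member and returns the first one
            # whose CRC check fails, or None if all members are intact.
            bad = zf.testzip()
            if bad is not None:
                problems.append(f"corrupt member: {bad}")
            # Compare base names so the check works regardless of the
            # folder the files are stored under inside the archive.
            names = {name.rsplit("/", 1)[-1] for name in zf.namelist()}
            for missing in sorted(REQUIRED_FILES - names):
                problems.append(f"missing file: {missing}")
    except zipfile.BadZipFile:
        problems.append("not a valid zip archive")
    return problems
```

A workflow could run this on the generated archive and skip the release whenever the returned list is not empty. It does not verify that the dictionary content itself is correct.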


At first, I tried to implement suggestion 2 in my own repository (partly because I wanted to learn GitHub Actions). However, I have little experience with this kind of thing, and I am unsure how to handle the license. Is it enough to put a copy of COPYING in the repository and in the zip, and to mention it in the README?

If this issue is outside the scope of this project, I would like to implement suggestion 2 on my own. In that case, would the approach above cause any license problems?

Alternatively, if you decide to adopt suggestion 2 or something similar in this repository, I could help, since I have already written a workflow YAML file in a private repository.