koehlma / pygtkspellcheck

A simple but quite powerful spellchecking library for GTK written in pure Python.
GNU General Public License v3.0
23 stars 9 forks source link

pylocales implementation #31

Closed jendrikseipp closed 2 years ago

jendrikseipp commented 8 years ago

Looking at the pylocales code, I wondered why it isn't in a separate repo. Also, it seems odd to use sql for the language queries. Wouldn't it be much easier to just use Python data structures?

koehlma commented 8 years ago

The original reason was just laziness, I think… So there is no well thought through reason for that…

However I would prefer shipping it bundled with pygtkspellcheck. Although I think using Git-Submodules and bundling it inside the gtkspellcheck namespace would make a lot of sense. Currently it somehow pollutes the global namespace, especially because it is undocumented.

What exactly do you mean by Python data structures? Maybe we could create a Python file containing all the data from the database as a dictionary?

jendrikseipp commented 8 years ago

I can understand that bundling the libraries makes the distribution easier. However, both libraries become harder to understand for users (and maintainers), since it is unclear how the two libraries interact and where the boundaries are. Also one can't reuse one library without also pulling the other library in. Additionally, a change in one library, might be totally unrelated to the other, which confuse users relying on stable interfaces.

FWIW, I don't think git submodules are the right solution here. If you think that pylocales could be useful for applications or libraries other than pygtkspellcheck, then pylocales should get its own github and pypi projects and pygtkspellcheck should get a dependency on pylocales. Otherwise, pylocales should become a subdirectory of gtkspellcheck.

Regarding locales.db, yes, I think having one or two Python dictionaries would simplify the code, the dependencies and the distribution. Also, currently it is quite hard to maintain the locales.db as it is a binary file. Turning it into text would make it possible to track it in git.

koehlma commented 8 years ago

I think it is useful on its own, but introducing an additional dependency merely for the purpose of better design and architecture to a mature project like Python GTK Spellcheck is not a good idea in my opinion. For example the package maintainers of Arch would have to add a new package, but currently no other package depends on Python GTK Spellcheck. They may decide that this is to much effort and remove the package completely from the repositories. Therefore pylocales should stay bundled with pygtkspellcheck, however by bundling I do not mean that there can be no pylocales package on its own. I just propose that pygtkspellcheck includes a copy of pylocales within its own namespace. Although I totally agree with you in terms of software architecture this is a complete mess. But merely design reasons are not enough to introduce new dependencies without other advantages to a four year old project IMHO.

As a temporary solution I suggest moving pylocales inside the gtkspellcheck namespace just to keep things clean but it will not become a part of the official API.

Regarding locales.db, actually this file is automatically generated using utils/locales/build.py. It should be fairly easy to change this script to generate a Python file containing a dictionary version of the database and adjust the remaining parts accordingly.

As a general remark, I am currently only maintaining this package and not actively developing anything any longer. Meaning that I myself will only handle pull requests and fix outstanding bugs. I will move the pylocales package inside the gtkspellcheck namespace when I have time but I would appreciate a pull request regarding locales.db.

jendrikseipp commented 8 years ago

Sounds good to me. My comments are just nice-to-haves, nothing we have to act on.

Now that I understand the functioning of locales.db a bit better, I think it's not good to have two copies of the same data in the repo, one in the xml files and one in locales.db. Ideally, I think the locales.db file should be built when the package is installed. This would allow to remove the binary locales.db from the repo.

Whether the codes are stored in locales.db or a Python dict is not so important after all, I guess.

koehlma commented 8 years ago

I had a look at the documentation again, pylocales appears there, therefore I can not simply move it without breaking the official (documented) API. I will postpone this for the 5.0 release.

jendrikseipp commented 7 years ago

I just found the pycountry project (https://pypi.python.org/pypi/pycountry). It seems pygtkspellcheck could use pycountry instead of copying the functionality. I'd recommend making this an optional dependency though: given a language code, if pycountry is present, lookup the country name, else display just the language code.

What do you think?

koehlma commented 2 years ago

The pylocales package has been removed with the upcoming 5.0 release.

It is still used internally. I would be happy to receive a PR switching to pycountry.