rojopolis / spellcheck-github-actions

Spell check action
MIT License

Foreign languages #35

Closed SebastianZug closed 3 years ago

SebastianZug commented 3 years ago

Hi rojopolis,

thanks for the cool implementation, very helpful to monitor a project! If I want to use another language, do I need to install the dictionary separately beforehand? A test for

aspell:
    lang: de_DE # or de 

failed with

ERROR: README.md: html>body>h1 -- Runtime Error: Error: No word lists can be found for the language "de_DE".

Many thanks!

Sebastian

facelessuser commented 3 years ago

Just an FYI, you probably want to use d: de_DE under Aspell. This is something I didn't fully understand when I first wrote pyspelling (the library under this action).

You have data files, for instance en.dat. These you can specify with lang: en. But when I want a variant of English, like en_US, I really need to use d: en_US.

aspell:
    lang: en
    d: en_US

So it is possible you need to specify:

aspell:
    lang: de
    d: de_DE

PySpelling needs lang so it can compile your wordlist properly in Aspell (not so much in Hunspell), but d is used to specify language variants.

I'm not sure if the image already has de data file and dictionaries installed, but maybe that clears things up a bit.
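
For reference, a fuller PySpelling configuration along these lines might look like the sketch below. Only the `aspell` keys come from this thread; the task name, the `.wordlist.txt` path, the pipeline, and the `README.md` source are illustrative assumptions. It will only work if the `de` data file and dictionaries are installed wherever Aspell runs.

```yaml
matrix:
- name: Markdown (German)        # hypothetical task name
  aspell:
    lang: de                     # data file; used to compile the custom wordlist
    d: de_DE                     # dictionary variant to check against
  dictionary:
    wordlists:
    - .wordlist.txt              # hypothetical path to project-specific words
    encoding: utf-8
  pipeline:
  - pyspelling.filters.markdown: # convert Markdown to HTML
  - pyspelling.filters.html:     # extract the text from that HTML
      comments: false
  sources:
  - 'README.md'                  # assumed file to check
  default_encoding: utf-8
```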

facelessuser commented 3 years ago

Another thing I should add is that you may need to normalize Unicode content. I investigated this very thing when I was asked about Czech, and it became clear that, to properly spell check and ignore words, the content fed to Aspell needed to be normalized. This can be done with existing PySpelling filters. More info can be found here: https://github.com/facelessuser/pyspelling/issues/134#issuecomment-770287042.
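
As a rough sketch of what that normalization could look like, the configuration below adds PySpelling's text filter with its `normalize` option in front of the other filters. The choice of `NFC`, the position of the filter in the pipeline, and the remaining task settings are assumptions for illustration; the linked comment describes what actually worked for Czech.

```yaml
matrix:
- name: Markdown (German)        # hypothetical task name
  aspell:
    lang: de
    d: de_DE
  pipeline:
  - pyspelling.filters.text:
      normalize: NFC             # assumption: pick the form that matches your wordlist
  - pyspelling.filters.markdown:
  - pyspelling.filters.html:
  sources:
  - 'README.md'                  # assumed file to check
```

If words in a custom wordlist are still flagged, it may be worth checking that the wordlist file itself is saved in the same normalization form.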

SebastianZug commented 3 years ago

Dear @facelessuser,

thanks for the fast response and your hints! I changed my config file according to the proposed format, but it fails again because the Dockerfile does not include the corresponding dictionaries.

https://github.com/rojopolis/spellcheck-github-actions/blob/8b4229b661caddca553f2ca3a3874f55ea88fa90/Dockerfile#L11-L13

I guess it is not necessary to install "all" dictionaries ... But how can I install additional packages such as aspell-de or aspell-ru inside the Docker container from the GitHub Actions yml file?

Many thanks!

facelessuser commented 3 years ago

@SebastianZug I am only the author of PySpelling, and mainly offer support for that portion of this action. I do not actually use this action and often just manually set up the action myself. @jonasbn can maybe answer your questions on how to get additional dictionaries into the image. Hopefully, there is an easy solution.

I mainly follow this repository to help people who are having issues with the PySpelling portion of this plugin as that is what I authored.


With that said, if you really need full control, and it is not something easy to do with this action, or you need a solution right away, you can try setting a spell check action with PySpelling manually, which I document here: https://facelessuser.github.io/pyspelling/#usage-in-ci. If you try this and run into issues, I can probably talk you through that.
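
A minimal sketch of such a manual setup is shown below, assuming an Ubuntu runner where the dictionaries are installed with `apt-get`. The workflow name, job name, action versions, and the `.spellcheck.yml` path are assumptions rather than anything prescribed by PySpelling or this action.

```yaml
name: spellcheck                 # hypothetical workflow name
on: [push, pull_request]
jobs:
  spellcheck:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v5
      with:
        python-version: '3.x'
    - name: Install Aspell and dictionaries
      run: |
        sudo apt-get update
        # add whichever aspell-* packages you need, e.g. aspell-ru
        sudo apt-get install -y aspell aspell-en aspell-de
    - name: Install PySpelling
      run: python -m pip install pyspelling
    - name: Run spell check
      run: python -m pyspelling -c .spellcheck.yml   # assumed config path
```

Because the packages are installed on the runner itself, only the dictionaries a project needs are pulled in, at the cost of an install step on every run.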


Just an FYI, I don't discourage the use of this action, as it is great for those unfamiliar with Python. I just don't personally use it.

I originally wrote PySpelling to use in actions, and once I got it working, I never considered creating a Docker image to do so. This action sprang up after the fact and has been a great help to people unfamiliar with Python, but since I am familiar with Python and already had a working solution, I never really saw a reason to change how I did things.

jonasbn commented 3 years ago

hi @SebastianZug

I will look into it shortly and get back to you. I was working on something else, so I was not able to jump into the discussion earlier.

jonasbn commented 3 years ago

@SebastianZug I am not aware of any way to dynamically install extra components into the Docker image. The main reason for having the action rely on a pre-built Docker image is time: rebuilding the Docker image on every run was not an effective approach in the long run.

The DockerHub image was introduced with release 0.5.0.

Prior to this, one of my projects took 1.28 seconds to complete; now it completes in 42 seconds.

With this strategy, increasing the build time of the Docker image is no longer a concern, since we do not build as part of running the action.

Which leads me to your issue. I will evaluate extending the language support to German (de_DE). This will not increase the run time, only the build time and the size of the Docker image. Well, at some point it might influence the run time, since a larger image might take longer to load, but I think we need quite a few more extensions before this becomes an issue.

SebastianZug commented 3 years ago

I see the point. The integration of many languages would increase loading time. Hence, it's more efficient to install the specific aspell-x package afterwards. Can I send an apt install aspell-x to the started Docker container?

jonasbn commented 3 years ago

Hi @SebastianZug

I do not believe it would be possible to do what you suggest, but I would have to research some more.

I added all the aspell dictionaries I could locate.

aspell-am
aspell-ar
aspell-bg
aspell-bn
aspell-br
aspell-ca
aspell-cs
aspell-cy
aspell-da
aspell-de
aspell-el
aspell-eo
aspell-es
aspell-et
aspell-eu
aspell-fa
aspell-fo
aspell-fr
aspell-ga
aspell-gl-minimos
aspell-gu
aspell-he
aspell-hi
aspell-hr
aspell-hsb
aspell-hu
aspell-hy
aspell-is
aspell-it
aspell-kk
aspell-kn
aspell-ku
aspell-lt
aspell-lv
aspell-ml
aspell-mr
aspell-nl
aspell-no
aspell-or
aspell-pa
aspell-pl
aspell-pt-br
aspell-pt-pt
aspell-ro
aspell-ru
aspell-sk
aspell-sl
aspell-ta
aspell-te
aspell-sv
aspell-tl
aspell-uk
aspell-uz

aspell-en was already present.

This increased the size of the Docker image significantly (395 MB):

jonasbn/github-action-spellcheck    latest               1b331977ffa7   3 hours ago     559MB
jonasbn/github-action-spellcheck    0.9.0                1d2315be5e30   2 weeks ago     164MB
jonasbn/github-action-spellcheck    0.9.1                1d2315be5e30   2 weeks ago     164MB

As stated previously, I do not mind the longer build time, as long as it does not increase the run time, since this is build once, run many.

But perhaps some of the languages could be excluded if need be.

jonasbn commented 3 years ago

Hi @SebastianZug

Finally got around to releasing 0.11.0, which includes support for German dictionaries. Let me know if you run into any issues. The support for German is experimental for now, since I am unsure whether the implementation is the right approach, but for now it will work. If any changes are implemented, I will do my best to continue support for German spell checking.
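
To try this, a workflow pinned to that release might look roughly like the sketch below. The workflow and job names and the checkout version are assumptions, and the spell check configuration is assumed to use `lang: de` and `d: de_DE` as discussed earlier in this thread.

```yaml
name: Spellcheck Action          # hypothetical workflow name
on: push
jobs:
  build:
    name: Spellcheck
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    # pin to the release that ships the additional dictionaries
    - uses: rojopolis/spellcheck-github-actions@0.11.0
      name: Spellcheck
```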