Closed SebastianZug closed 3 years ago
Just an FYI, you probably want to use d: de_DE under Aspell. This is something I didn't fully understand when I first wrote pyspelling (the library under this action).
You have data files, for instance en.dat. These you can specify with lang: en. But when I want a variant of English, like en_US, I really need to use d: en_US.
aspell:
  lang: en
  d: en_US
So it is possible you need to specify:
aspell:
  lang: de
  d: de_DE
PySpelling needs lang so it can compile your wordlist properly in Aspell (not so much in Hunspell), but d is used to specify language variants.
I'm not sure if the image already has the de data file and dictionaries installed, but maybe that clears things up a bit.
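One way to find out which dictionaries an environment actually has is to ask Aspell itself. The following is only a minimal sketch, assuming the aspell binary is on the PATH and that `aspell dump dicts` prints one dictionary name per line; the helper names (parse_dicts, installed_dicts, has_dict) are hypothetical and not part of PySpelling or this action.

```python
import subprocess


def parse_dicts(output):
    """Turn line-per-dictionary output into a set of names, skipping blanks."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def installed_dicts():
    """Ask the local Aspell install which dictionaries it can see.

    Assumption: `aspell dump dicts` is available and lists one name per line.
    """
    result = subprocess.run(
        ["aspell", "dump", "dicts"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_dicts(result.stdout)


def has_dict(name, available):
    """True if the dictionary or variant (for example 'de_DE') is listed."""
    return name in available
```

Calling has_dict("de_DE", installed_dicts()) inside the container would then tell you whether d: de_DE can work at all before you run the spell check.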
Another thing I should add is that you may need to normalize Unicode content. I investigated this very thing when I was asked about Czech, and it became clear that to properly spellcheck and ignore words, the content fed to Aspell needed to be normalized. This can be done with existing PySpelling filters. More info can be found here: https://github.com/facelessuser/pyspelling/issues/134#issuecomment-770287042.
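To illustrate why normalization matters, here is a small sketch in plain Python using the standard unicodedata module. It is not the PySpelling filter itself, and the German example word is my own choice; it only shows that two strings that render identically can differ at the code point level until they are brought to the same normalization form.

```python
import unicodedata

# The same visible word, "müde", encoded two different ways.
composed = "m\u00fcde"      # precomposed U+00FC (u with diaeresis)
decomposed = "mu\u0308de"   # plain "u" followed by U+0308 (combining diaeresis)


def normalize(text, form="NFC"):
    """Return text in the given Unicode normalization form."""
    return unicodedata.normalize(form, text)


print(composed == decomposed)                        # False
print(normalize(composed) == normalize(decomposed))  # True
```

If a wordlist entry and the checked text use different forms, a word you meant to ignore can still be flagged, which is why normalizing before the text reaches Aspell helps.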
Dear @facelessuser,
thanks for the fast response and your hints! I changed my config file according to the proposed format, but it fails again because the Docker image does not include the corresponding dictionaries.
I guess it is not necessary to install "all" dictionaries ... But how can I install additional packages such as aspell-de or aspell-ru inside a Docker instance from the GitHub Action yml file?
Many thanks!
@SebastianZug I am only the author of PySpelling, and mainly offer support for that portion of this action. I do not actually use this action and often just manually set up the action myself. @jonasbn can maybe answer your questions on how to get additional dictionaries into the image. Hopefully, there is an easy solution.
I mainly follow this repository to help people who are having issues with the PySpelling portion of this plugin as that is what I authored.
With that said, if you really need full control, and it is not something easy to do with this action, or you need a solution right away, you can try setting a spell check action with PySpelling manually, which I document here: https://facelessuser.github.io/pyspelling/#usage-in-ci. If you try this and run into issues, I can probably talk you through that.
Just an FYI, I don't discourage the use of this action as it is great for those unfamiliar with Python. I just don't personally use it.
I originally wrote PySpelling to use in actions, and once I got it working, I had never considered creating a docker image to do so. This action sprung up after the fact and has been a great help to people unfamiliar with Python, but since I am familiar with Python, and already had a working solution, I never really saw a reason to change how I did things.
hi @SebastianZug
I will look into it shortly and get back to you. I was working on something else, so I was not able to jump into the discussion earlier.
@SebastianZug I am not aware of any way to dynamically install extra components into the Docker image. The main reason for having the action rely on a pre-built Docker image is time: rebuilding the Docker image on every run was not an effective approach in the long run.
The DockerHub image was introduced with release 0.5.0.
Prior to this one of my projects took 1.28 seconds to complete, now it completes in 42 seconds.
With this strategy, increasing the build time of the Docker image is no longer a concern, since we do not build as part of running the action.
Which leads me to your issue. I will evaluate extending the language support to German (de_DE). This will not increase the run time, only the build time and the size of the Docker image. At some point it might influence the run time, since a larger image takes longer to load, but I think we need quite a few more extensions before this becomes an issue.
I see the point. The integration of many languages would increase loading time. Hence, it's more efficient to install the specific aspell-x package afterwards. Can I send an apt install aspell-x to the started Docker instance?
Hi @SebastianZug
I do not believe it would be possible to do what you suggest, but I would have to research some more.
I added all the aspell dictionaries I could locate:
aspell-am
aspell-ar
aspell-bg
aspell-bn
aspell-br
aspell-ca
aspell-cs
aspell-cy
aspell-da
aspell-de
aspell-el
aspell-eo
aspell-es
aspell-et
aspell-eu
aspell-fa
aspell-fo
aspell-fr
aspell-ga
aspell-gl-minimos
aspell-gu
aspell-he
aspell-hi
aspell-hr
aspell-hsb
aspell-hu
aspell-hy
aspell-is
aspell-it
aspell-kk
aspell-kn
aspell-ku
aspell-lt
aspell-lv
aspell-ml
aspell-mr
aspell-nl
aspell-no
aspell-or
aspell-pa
aspell-pl
aspell-pt-br
aspell-pt-pt
aspell-ro
aspell-ru
aspell-sk
aspell-sl
aspell-ta
aspell-te
aspell-sv
aspell-tl
aspell-uk
aspell-uz
aspell-en was already present.
This increased the size of the Docker image significantly (by 395 MB):
jonasbn/github-action-spellcheck latest 1b331977ffa7 3 hours ago 559MB
jonasbn/github-action-spellcheck 0.9.0 1d2315be5e30 2 weeks ago 164MB
jonasbn/github-action-spellcheck 0.9.1 1d2315be5e30 2 weeks ago 164MB
As stated previously, I do not mind the longer build time as long as it does not increase the run time, since this is built once and run many times.
But perhaps some of the languages could be excluded if need be.
Hi @SebastianZug
Finally got around to releasing 0.11.0, which includes support for German dictionaries. Let me know if you run into any issues. The support for German is experimental for now, since I am unsure whether the implementation is the right approach, but for now it will work. If any changes are implemented, I will do my best to continue supporting German spell checking.
Hi rojopolis,
thanks for the cool implementation, very helpful to monitor a project! If I want to use another language, do I need to install the dictionary separately beforehand? A test for
failed with
Many thanks!
Sebastian