Czech translation - Githubissues

smrky1 commented 9 months ago

Added Czech translation for the Base game, Intrigue, Dark Ages, Seaside. Required a couple more changes:

"cz" is not defined in ISO 639-1. Changing "cz" to "cs", including folders and db files rename.
ReportLab only supports ISO-8859-1 (Latin1) encoding. Czech language (and others) are not supported in Latin1. Added code in draw.py which loads TTF version of Times Roman font when a non Latin1 language is selected in options
Added Times Roman TTF requirement for non Latin1 languages to README.md
Pirate Ship (cz) contains a new +X coins element. Added corresponding icon to image resources.
Updated CardSorter to use locale based sorting (instead of the previous "strip_accents()" method). Fixes a bug in the previous version which resulted in e.g. Czech language cards to be sorted incorrectly. Correct CZ sorting ("sa" > "sc" > "šb > "ta") vs incorrect ("sa" > "šb" > "sc" > "ta"). Tested in other languages and works OK even in German (de).

sumpfork commented 9 months ago

Thanks! I'll try to get to testing and reviewing this weekend.

sumpfork commented 9 months ago

As a note right away, it does look like you need to pre-commit install and then pre-commit run --all-files.

Also, look at the failing card db compile tests?

smrky1 commented 9 months ago

I have run pre-commit run --all-files to lint and test & committed the updates. So far it seems that Compile Card DB / pytest -k "test_languages" runs fine on my machine as well...

sumpfork commented 9 months ago

I have run pre-commit run --all-files to lint and test & committed the updates. So far it seems that Compile Card DB / pytest -k "test_languages" runs fine on my machine as well...

Looks like the set_locale call isn't working in tests: https://github.com/sumpfork/dominiontabs/actions/runs/7857961555/job/21442564732?pr=499

sumpfork commented 9 months ago

I have run pre-commit run --all-files to lint and test & committed the updates. So far it seems that Compile Card DB / pytest -k "test_languages" runs fine on my machine as well...

Looks like the set_locale call isn't working in tests: https://github.com/sumpfork/dominiontabs/actions/runs/7857961555/job/21442564732?pr=499

Yeah, this needs at least the full locale name (language_REGION) like de_DE which we don't provide by default other than in en_US but even there not capitalized properly. It's maybe a good idea to change all the locales to be such actual valid locale strings?

smrky1 commented 9 months ago

Seems it is a bit more complicated than that :/ All tests pass OK on my Windows machine (running Python 3.8.18, so same as the test job). The root cause seems to be related to Docker Ubuntu image for the GitHub Action job not having the necessary locales available / set. Together with using incorrect language codes (as you pointed out). Surprisingly the default language codes work on Windows as I wrote above...

I will see if I can figure out some workaround, if not I will revert the locale based sorting back to at least have the rest of the PR working....

Some interesting reading: https://stackoverflow.com/questions/14547631/python-locale-error-unsupported-locale-setting https://stackoverflow.com/questions/28405902/how-to-set-the-locale-inside-a-debian-ubuntu-docker-container https://bobbyhadz.com/blog/locale-error-unsupported-locale-setting-in-python#solving-the-error-in-docker

smrky1 commented 9 months ago

I think I managed to figure it out. Required 3 fixes: 1) Adding all necessary locale languages to the Workflows (I added them to compile_card_db.yml and lint_and_test.yml) 2) Also OS dependent locale.setlocale switch in main.py. 3) Additionally I had to rename "nl_du" to "nl_nl" to match a correct language code

All tests should pass now (tested the Workflows on my local machine using act). Please take a look if it is acceptable. Maybe there is a better way, I have to admit this is my first time working with GitHub Workflows. If not, I suggest reverting back to the old non locale based sorting (which will however not sort the cards correctly in all languages) and only commit the Czech translations.

smrky1 commented 9 months ago

Added some error handling for when the locale cannot be set (mainly on Linux systems). In that case it will generate a warning and revert to default locale based warning. I think this together with my previous commit is finally a "safe" solution. I also added explanation to Installation section of README.md.

The locale-gen updates in compile_card_db.yml and lint_and_test.yml (from my previous commit) are strictly speaking no longer necessary to pass the tests. However I decided to keep them there to avoid the warnings.

sumpfork commented 9 months ago

This works for me locally now. I'm guessing you maybe need sudo?

smrky1 commented 9 months ago

Right - as you say, for whatever reason it works without sudo when using act locally, but not on GitHub servers... Added sudo to generate necessary locales...

sumpfork commented 9 months ago

So I'd like to land this, but I'm getting

dominion_dividers --language=cs
** Warning: --tab-side with 'alternate' implies 2 tabs. Setting --tab-number to 2 **
fish: Job 1, 'dominion_dividers --language=cs' terminated by signal SIGSEGV (Address boundary error)

on my mac. Same in one of the units tests. Will try to figure out what's going on.

smrky1 commented 9 months ago

Sorry, I cannot help with that much - do not have Mac :-/ Everything works OK on my Windows 10 and Ubuntu 22.04...

sumpfork commented 9 months ago

This seems to have something to do with not specifying an encoding in the setlocale() call. If I use locale.setlocale(locale.LC_COLLATE, (lang, "UTF-8")) this works on my machine. Otherwise it runs fine through a bunch of comparisons and then suddenly crashes on the second time trying to get a comparison string for Doctor in the cs locale. Default seems to be ISO8859-2for that - really not sure why that would lead to a crash, but can you confirm that this works for sorting for you and change it to UTF-8?

smrky1 commented 9 months ago

Thanks for the UTF-8 suggestion, after some reading, defaulting to UTF-8 seems to be a good idea. I implemented the changes & updated the documentation. On both Windows and Ubuntu everything works fine - but I cannot test on Mac...

To generate the correct locale on Ubuntu the following is the right command: sudo locale-gen xx_XX.UTF-8. But I have no idea how to generate the required UTF-8 locale on macOS - can you suggest? Maybe it is worth adding that to the documentation? I have been trying to run macOS in VMware since yesterday with no luck...

smrky1 commented 9 months ago

But I have no idea how to generate the required UTF-8 locale on macOS - can you suggest?

I have finally managed to run macOS in VM. Seems all the necessary UTF-8 locales are installed by default, so no need to run anything similar to sudo locale-gen xx_XX.UTF-8.

However I found a new problem - all languages work fine except for "es", which crashes with Fatal Python error: Segmentation fault. Can you try on your setup? Testing in VM is a bit cumbersome ☹️

dominion_dividers --expansions dominion1stEdition -l es crashes, while e.g. dominion_dividers --expansions seaside1stEdition -l es does not 😕

smrky1 commented 8 months ago

After some debugging, I narrowed the crash down again to the locale.setlocale(locale.LC_COLLATE, (lang, "UTF-8")) line. Works OK for all languages, but crashes with "es_ES.UTF-8" on macOS for whatever reason. Running the same line directly in Python shell works without any problems...

I think it might be related to the setlocale function not being thread safe or something. It relies on an underlying OS call which obviously behaves differently on every OS.

Can you test on your macOS setup to check the problem is not related to my macOS running in VMware VM?

sumpfork commented 8 months ago

Sadly, confirmed to crash on a real Mac with es language setting, too.

I haven't checked, but as I mentioned, before it was not crashing on that line, but rather at some point later during actual formation of comparison strings.

nickv2002 commented 8 months ago

I haven't tested all the steps and settings you all have been back and forth on but this all seems like a good idea to containerize the application. That way people can run with Docker and get the same results no matter their OS/Version. I can take that up as a little side project if you all think that is a good idea.

sumpfork commented 8 months ago

I haven't tested all the steps and settings you all have been back and forth on but this all seems like a good idea to containerize the application people can run with Docker and get the same results no matter their OS/Version. I can take that up as a little side project if you all think that is a good idea.

Yeah, or add this to CI, haven't checked github actions but other CI systems I'm familiar with support testing on different platforms...

smrky1 commented 8 months ago

I think way forward is to finally give up on the OS dependent setlocale() call...

This article is how I started the cards sorting implementation. Apart from setlocale(), the best solution seems to be pyICU, but sounds like it is quite difficult to set up (and potentially OS dependent again).

IMHO the second best way to sort alphabetically is to use pyuca library. It will not work 100% correct for languages like Czech or Polish (still better than the default sort), but is easily available and portable.

If you agree, I will drop the setlocale() and replace with pyuca based sorting...

sumpfork commented 8 months ago

I think way forward is to finally give up on the OS dependent setlocale() call...

This article is how I started the cards sorting implementation. Apart from setlocale(), the best solution seems to be pyICU, but sounds like it is quite difficult to set up (and potentially OS dependent again).

IMHO the second best way to sort alphabetically is to use pyuca library. It will not work 100% correct for languages like Czech or Polish (still better than the default sort), but is easily available and portable.

If you agree, I will drop the setlocale() and replace with pyuca based sorting...

Sounds great to me if you got the time!

smrky1 commented 8 months ago

Good news :) I went through the locale documentation more carefully and followed the provided example. Found out that setting the locale permanently probably affects some other functionality which eventually leads to crashes. The example shows a nice workaround: storing the current locale setting and restoring it back after you utilize the new, temporary locale setting for sorting. Not ideal, but after I restoring the original locale back, it works!

Currently the tests pass on all Windows 10, Ubuntu LTS 22.04 and macOS 14.0 😉

Screenshot 2024-02-20 22 56 11

smrky1 commented 8 months ago

I did some more research and found out the locale based sort does not work correctly on macOS anyway (even for languages that do not crash). For example, Czech language is sorted incorrectly even if locale based sorting is used.

Seems to be a known issue on macOS (see here or here). The Czech collation table is symlinked to another collation table (Latin?), creating a language insensitive sort in the end: ls -al /usr/share/locale/cs_CZ/LC_COLLATE -> ../la_LN.US-ASCII/LC_COLLATE.

After spending many hours trying to figure the correct language aware cards sorting out, the options as I see them now:

Keep setlocale() for Windows and Ubuntu, but disable this feature on macOS since it does not work correctly anyway (and causes random crashes). The advantage here is that on Win and Ubuntu the users would get correct sort order without any additional dependencies.
Switch to pyuca: easy to implement, works OK for some languages, but results in incorrect sort order in languages with many accents (Czech, Polish, ...). In particular since now we have Czech translation, the Dominion dividers are generated in incorrect order using pyuca based sort. Will potentially cause similar issues for other languages if they are ever added to dominiontabs...
Switch to pyuca + add czech_sort library specifically for Czech language sorting. Again, quite easy to implement and it immediately fixes the Czech language issues. On the other hand it is not future proof if other languages are added later...
Last resort: add pyICU dependency. This will produce the best sort results, but binaries need to be compiled and other dependencies downloaded on macOS and Windows, making the dominiontabs setup much more complicated (on Ubuntu it is easy to get pyICU with apt-get install python3-icu).

@sumpfork please let me know your thoughts, I personally would vote for option 1 or 3...

smrky1 commented 8 months ago

Just for the reference, the potentially best / safest option 4 (pyICU), would require the following setup on each respective platform.

These are the required extra steps I obviously want to avoid (they would have to be added to Documentation as prerequisites) 😕

macOS

# instal Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# add Homebrew to PATH
(echo; echo 'eval "$(/usr/local/bin/brew shellenv)"') >> ~/.zprofile
eval "$(/usr/local/bin/brew shellenv)"

# install libicu (keg-only)
brew install pkg-config icu4c

# let setup.py discover keg-only icu4c via pkg-config
export PATH="/usr/local/opt/icu4c/bin:/usr/local/opt/icu4c/sbin:$PATH"
export PKG_CONFIG_PATH="$PKG_CONFIG_PATH:/usr/local/opt/icu4c/lib/pkgconfig"

# EITHER - when using a gcc-built CPython (e.g. from Homebrew)
export CC="$(which gcc)" CXX="$(which g++)"
# OR - when using system CPython or another clang-based CPython, ensure system clang is used (for proper libstdc++ https://gitlab.pyicu.org/main/pyicu/issues/5#issuecomment-291631507):
unset CC CXX

# avoid wheels from previous runs or PyPI
pip install --no-binary=:pyicu: pyicu

Ubuntu

sudo apt-get update
sudo apt-get install python3-icu

Windows

Download latest pyICU wheel file from https://github.com/cgohlke/pyicu-build/releases:

cp312 for Python 3.12, cp39 for Python 3.9, etc.
win_amd64 for Windows 64-bit

Install the wheel on the command line, for example for Python 3.12 64-bit py.exe -3.12 -m pip install PyICU-2.12-cp312-cp312-win_amd64.whl

nickv2002 commented 8 months ago

Thanks for the detailed write up of those options @smrky1

Since we have a dockerfile now, IMO the best option would be 4. We can easily install python-icu inside docker since it's Linux and then have an easy and consistent experience for Mac/Windows (and Linux) users by just directing them to use the containerized version of the CLI.

sumpfork commented 8 months ago

Thanks for the detailed write up of those options @smrky1

Since we have a dockerfile now, IMO the best option would be 4. We can easily install python-icu inside docker since it's Linux and then have an easy and consistent experience for Mac/Windows (and Linux) users by just directing them to use the containerized version of the CLI.

Remember we need to make sure that all of this also works for the online version running in AWS Lambda, which is the access point for most (almost all) users. It's probably not too hard to get that working, but may mean additional work on my side.

I'm leaning towards just making pyicu an optional dependency that we use if present, current behaviour if not.

To note, I'm also travelling for work the next week or so so may be slow to respond.

nickv2002 commented 8 months ago

Remember we need to make sure that all of this also works for the online version running in AWS Lambda, which is the access point for most (almost all) users. It's probably not too hard to get that working, but may mean additional work on my side.

Makes sense: you can have Lambdas run arbitrary images, so some configuration changes needed but should work fine. Might be able to install the pyicu at runtime also but it's probably better to pre-build it.

I'm leaning towards just making pyicu an optional dependency that we use if present, current behaviour if not.

That seems reasonable, we can bake it in to the docker image for easy access if folks need it (and avoid all the install hassles that @smrky1 documented above). I'll make the dockerfile change in this branch.

To note, I'm also traveling for work the next week or so so may be slow to respond.

Safe travels!

sumpfork commented 8 months ago

Remember we need to make sure that all of this also works for the online version running in AWS Lambda, which is the access point for most (almost all) users. It's probably not too hard to get that working, but may mean additional work on my side.

Makes sense: you can have Lambdas run arbitrary images, so some configuration changes needed but should work fine. Might be able to install the pyicu at runtime also but it's probably better to pre-build it.

Well, you can't run arbitrary images, just ones based on supported lambda runtimes or close matches. Plus, the function I use is currently an automatically generated one via CDK's python lambda support, not a custom docker lambda. I can switch it, but again, a bit of work :).

So yeah, I'd say 1) support pyicu if we find it's installed (probably just a from icu import Locale in a try/except with a warning that it isn't/can't be imported if that fails) 2) support pyicu locally via docker 3) I'll update the web version to have pyicu installed when I find the time

smrky1 commented 8 months ago

OK, agreed! I will take care of 1) in the next few days.

@nickv2002 While you're at it... I checked you dockerfile (being completely new to Docker), and it is not clear to me what line 4 does: COPY --from=pacodrokad/fonts /fonts /fonts

Is pacodrokad/fonts correct? Seems to me like a path specific to your setup?

nickv2002 commented 8 months ago

Well, you can't run arbitrary images, just ones based on supported lambda runtimes or close matches. Plus, the function I use is currently an automatically generated one via CDK's python lambda support, not a custom docker lambda. I can switch it, but again, a bit of work :).

Yeah lots of complexity details there, I was being overly general and you know the implementation details much better. (Just for ref, here the feature I was thinking of and a python example.)

So yeah, I'd say

support pyicu if we find it's installed (probably just a from icu import Locale in a try/except with a warning that it isn't/can't be imported if that fails)

support pyicu locally via docker

I'll update the web version to have pyicu installed when I find the time

Sounds good to me.

@nickv2002 While you're at it... I checked you dockerfile (being completely new to Docker), and it is not clear to me what line 4 does: COPY --from=pacodrokad/fonts /fonts /fonts

Is pacodrokad/fonts correct? Seems to me like a path specific to your setup?

That's a way to get the fonts into the image by copying them from somewhere else (in this case another image) rather than hosting fonts in this repo.

smrky1 commented 8 months ago

That's a way to get the fonts into the image by copying them from somewhere else

Might be worth adding to the documentation information on how to set it up correctly then.

When I run docker build . -t dominiontabs according to the instructions, it fails:

 => [internal] load build definition from dockerfile                                                                                                                                                                                                                               0.0s
 => => transferring dockerfile: 806B                                                                                                                                                                                                                                               0.0s
 => ERROR [internal] load metadata for docker.io/pacodrokad/fonts:latest                                                                                                                                                                                                           1.3s
 => [internal] load metadata for docker.io/library/python:3.9-slim                                                                                                                                                                                                                 1.2s
------
 > [internal] load metadata for docker.io/pacodrokad/fonts:latest:
------
dockerfile:4
--------------------
   2 |
   3 |     # get fonts
   4 | >>> COPY --from=pacodrokad/fonts /fonts /fonts
   5 |
   6 |     # Add git for hooks
--------------------
ERROR: failed to solve: pacodrokad/fonts: no match for platform in manifest: not found

nickv2002 commented 8 months ago

@smrky1 thanks for that note, it was an arch mismatch (that line is just pulling files, but it still looks for the image of the same arch, I fixed by telling it the specific arch to use). Fixed and merged over here: https://github.com/sumpfork/dominiontabs/pull/503 along with adding python3-icu

Fix Arch Mismatch identified here

Adding python3-icu as discussed here

smrky1 commented 8 months ago

Added optional PyICU based sorting + removed the locale related stuff. If PyICU is not found, the script will revert to the default method (using strip_accents()).
I also updated the README.md documentation (+ created a new page on Wiki with instructions on how to install PyICU)
To be able to test the PyICU integration in CI, the workflows will use PyICU on Ubuntu runner (but not on Win or macOS)
I notice that Czech expansions description text had some wrong formatting which caused --expansion_dividers switch to fail with error, this is now fixed

smrky1 commented 8 months ago

FYI, I completed the translation of all expansions available in Czech language translation). In the process I found out that the original English set has wrong Peddler description: "+2 Coin" -> should be "+1 Coin". Will rename the pull request title to reflect this.

sumpfork commented 8 months ago

Just getting back to this, will review before I go on my next trip next week.

smrky1 commented 7 months ago

Only today I found some time to review your feedback. I rerun linting + replaced the implicit sys.modules call to detect icu availability with a helper "have_icu" variable...

sumpfork commented 6 months ago

I just deployed this to the online generator (https://github.com/sumpfork/bgtools_web/pull/40) with instances of all the needed fonts and pyicu for sorting. Took a bunch of work, but I think it's mostly correct now.

sumpfork / dominiontabs

Czech translation #499

macOS

Ubuntu

Windows