Closed smrky1 closed 7 months ago
Thanks! I'll try to get to testing and reviewing this weekend.
As a note right away, it does look like you need to pre-commit install
and then pre-commit run --all-files
.
Also, look at the failing card db compile tests?
I have run pre-commit run --all-files
to lint and test & committed the updates. So far it seems that Compile Card DB / pytest -k "test_languages"
runs fine on my machine as well...
I have run
pre-commit run --all-files
to lint and test & committed the updates. So far it seems that Compile Card DB /pytest -k "test_languages"
runs fine on my machine as well...
Looks like the set_locale
call isn't working in tests: https://github.com/sumpfork/dominiontabs/actions/runs/7857961555/job/21442564732?pr=499
I have run
pre-commit run --all-files
to lint and test & committed the updates. So far it seems that Compile Card DB /pytest -k "test_languages"
runs fine on my machine as well...Looks like the
set_locale
call isn't working in tests: https://github.com/sumpfork/dominiontabs/actions/runs/7857961555/job/21442564732?pr=499
Yeah, this needs at least the full locale name (language_REGION) like de_DE
which we don't provide by default other than in en_US
but even there not capitalized properly. It's maybe a good idea to change all the locales to be such actual valid locale strings?
Seems it is a bit more complicated than that :/ All tests pass OK on my Windows machine (running Python 3.8.18, so same as the test job). The root cause seems to be related to Docker Ubuntu image for the GitHub Action job not having the necessary locales available / set. Together with using incorrect language codes (as you pointed out). Surprisingly the default language codes work on Windows as I wrote above...
I will see if I can figure out some workaround, if not I will revert the locale based sorting back to at least have the rest of the PR working....
Some interesting reading: https://stackoverflow.com/questions/14547631/python-locale-error-unsupported-locale-setting https://stackoverflow.com/questions/28405902/how-to-set-the-locale-inside-a-debian-ubuntu-docker-container https://bobbyhadz.com/blog/locale-error-unsupported-locale-setting-in-python#solving-the-error-in-docker
I think I managed to figure it out. Required 3 fixes:
1) Adding all necessary locale languages to the Workflows (I added them to compile_card_db.yml
and lint_and_test.yml
)
2) Also OS dependent locale.setlocale switch in main.py.
3) Additionally I had to rename "nl_du" to "nl_nl" to match a correct language code
All tests should pass now (tested the Workflows on my local machine using act
). Please take a look if it is acceptable. Maybe there is a better way, I have to admit this is my first time working with GitHub Workflows. If not, I suggest reverting back to the old non locale based sorting (which will however not sort the cards correctly in all languages) and only commit the Czech translations.
Added some error handling for when the locale cannot be set (mainly on Linux systems). In that case it will generate a warning and revert to default locale based warning. I think this together with my previous commit is finally a "safe" solution. I also added explanation to Installation section of README.md
.
The locale-gen
updates in compile_card_db.yml
and lint_and_test.yml
(from my previous commit) are strictly speaking no longer necessary to pass the tests. However I decided to keep them there to avoid the warnings.
This works for me locally now. I'm guessing you maybe need sudo
?
Right - as you say, for whatever reason it works without sudo
when using act locally, but not on GitHub servers... Added sudo
to generate necessary locales...
So I'd like to land this, but I'm getting
dominion_dividers --language=cs
** Warning: --tab-side with 'alternate' implies 2 tabs. Setting --tab-number to 2 **
fish: Job 1, 'dominion_dividers --language=cs' terminated by signal SIGSEGV (Address boundary error)
on my mac. Same in one of the units tests. Will try to figure out what's going on.
Sorry, I cannot help with that much - do not have Mac :-/ Everything works OK on my Windows 10 and Ubuntu 22.04...
This seems to have something to do with not specifying an encoding in the setlocale()
call. If I use locale.setlocale(locale.LC_COLLATE, (lang, "UTF-8"))
this works on my machine. Otherwise it runs fine through a bunch of comparisons and then suddenly crashes on the second time trying to get a comparison string for Doctor
in the cs
locale. Default seems to be ISO8859-2
for that - really not sure why that would lead to a crash, but can you confirm that this works for sorting for you and change it to UTF-8?
Thanks for the UTF-8 suggestion, after some reading, defaulting to UTF-8 seems to be a good idea. I implemented the changes & updated the documentation. On both Windows and Ubuntu everything works fine - but I cannot test on Mac...
To generate the correct locale on Ubuntu the following is the right command: sudo locale-gen xx_XX.UTF-8
. But I have no idea how to generate the required UTF-8 locale on macOS - can you suggest? Maybe it is worth adding that to the documentation? I have been trying to run macOS in VMware since yesterday with no luck...
But I have no idea how to generate the required UTF-8 locale on macOS - can you suggest?
I have finally managed to run macOS in VM. Seems all the necessary UTF-8 locales are installed by default, so no need to run anything similar to sudo locale-gen xx_XX.UTF-8
.
However I found a new problem - all languages work fine except for "es", which crashes with Fatal Python error: Segmentation fault
. Can you try on your setup? Testing in VM is a bit cumbersome ☹️
dominion_dividers --expansions dominion1stEdition -l es
crashes, while e.g. dominion_dividers --expansions seaside1stEdition -l es
does not 😕
After some debugging, I narrowed the crash down again to the locale.setlocale(locale.LC_COLLATE, (lang, "UTF-8"))
line. Works OK for all languages, but crashes with "es_ES.UTF-8" on macOS for whatever reason. Running the same line directly in Python shell works without any problems...
I think it might be related to the setlocale
function not being thread safe or something. It relies on an underlying OS call which obviously behaves differently on every OS.
Can you test on your macOS setup to check the problem is not related to my macOS running in VMware VM?
Sadly, confirmed to crash on a real Mac with es
language setting, too.
I haven't checked, but as I mentioned, before it was not crashing on that line, but rather at some point later during actual formation of comparison strings.
I haven't tested all the steps and settings you all have been back and forth on but this all seems like a good idea to containerize the application. That way people can run with Docker and get the same results no matter their OS/Version. I can take that up as a little side project if you all think that is a good idea.
I haven't tested all the steps and settings you all have been back and forth on but this all seems like a good idea to containerize the application people can run with Docker and get the same results no matter their OS/Version. I can take that up as a little side project if you all think that is a good idea.
Yeah, or add this to CI, haven't checked github actions but other CI systems I'm familiar with support testing on different platforms...
I think way forward is to finally give up on the OS dependent setlocale() call...
This article is how I started the cards sorting implementation. Apart from setlocale(), the best solution seems to be pyICU, but sounds like it is quite difficult to set up (and potentially OS dependent again).
IMHO the second best way to sort alphabetically is to use pyuca
library. It will not work 100% correct for languages like Czech or Polish (still better than the default sort), but is easily available and portable.
If you agree, I will drop the setlocale() and replace with pyuca
based sorting...
I think way forward is to finally give up on the OS dependent setlocale() call...
This article is how I started the cards sorting implementation. Apart from setlocale(), the best solution seems to be pyICU, but sounds like it is quite difficult to set up (and potentially OS dependent again).
IMHO the second best way to sort alphabetically is to use
pyuca
library. It will not work 100% correct for languages like Czech or Polish (still better than the default sort), but is easily available and portable.If you agree, I will drop the setlocale() and replace with
pyuca
based sorting...
Sounds great to me if you got the time!
Good news :) I went through the locale documentation more carefully and followed the provided example. Found out that setting the locale permanently probably affects some other functionality which eventually leads to crashes. The example shows a nice workaround: storing the current locale setting and restoring it back after you utilize the new, temporary locale setting for sorting. Not ideal, but after I restoring the original locale back, it works!
Currently the tests pass on all Windows 10, Ubuntu LTS 22.04 and macOS 14.0 😉
I did some more research and found out the locale based sort does not work correctly on macOS anyway (even for languages that do not crash). For example, Czech language is sorted incorrectly even if locale based sorting is used.
Seems to be a known issue on macOS (see here or here). The Czech collation table is symlinked to another collation table (Latin?), creating a language insensitive sort in the end: ls -al /usr/share/locale/cs_CZ/LC_COLLATE -> ../la_LN.US-ASCII/LC_COLLATE
.
After spending many hours trying to figure the correct language aware cards sorting out, the options as I see them now:
setlocale()
for Windows and Ubuntu, but disable this feature on macOS since it does not work correctly anyway (and causes random crashes). The advantage here is that on Win and Ubuntu the users would get correct sort order without any additional dependencies.pyuca
: easy to implement, works OK for some languages, but results in incorrect sort order in languages with many accents (Czech, Polish, ...). In particular since now we have Czech translation, the Dominion dividers are generated in incorrect order using pyuca
based sort. Will potentially cause similar issues for other languages if they are ever added to dominiontabs
...pyuca
+ add czech_sort
library specifically for Czech language sorting. Again, quite easy to implement and it immediately fixes the Czech language issues. On the other hand it is not future proof if other languages are added later...pyICU
dependency. This will produce the best sort results, but binaries need to be compiled and other dependencies downloaded on macOS and Windows, making the dominiontabs
setup much more complicated (on Ubuntu it is easy to get pyICU with apt-get install python3-icu
).@sumpfork please let me know your thoughts, I personally would vote for option 1 or 3...
Just for the reference, the potentially best / safest option 4 (pyICU
), would require the following setup on each respective platform.
These are the required extra steps I obviously want to avoid (they would have to be added to Documentation as prerequisites) 😕
# instal Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# add Homebrew to PATH
(echo; echo 'eval "$(/usr/local/bin/brew shellenv)"') >> ~/.zprofile
eval "$(/usr/local/bin/brew shellenv)"
# install libicu (keg-only)
brew install pkg-config icu4c
# let setup.py discover keg-only icu4c via pkg-config
export PATH="/usr/local/opt/icu4c/bin:/usr/local/opt/icu4c/sbin:$PATH"
export PKG_CONFIG_PATH="$PKG_CONFIG_PATH:/usr/local/opt/icu4c/lib/pkgconfig"
# EITHER - when using a gcc-built CPython (e.g. from Homebrew)
export CC="$(which gcc)" CXX="$(which g++)"
# OR - when using system CPython or another clang-based CPython, ensure system clang is used (for proper libstdc++ https://gitlab.pyicu.org/main/pyicu/issues/5#issuecomment-291631507):
unset CC CXX
# avoid wheels from previous runs or PyPI
pip install --no-binary=:pyicu: pyicu
sudo apt-get update
sudo apt-get install python3-icu
Download latest pyICU wheel file from https://github.com/cgohlke/pyicu-build/releases:
cp312
for Python 3.12, cp39
for Python 3.9, etc.Install the wheel on the command line, for example for Python 3.12 64-bit
py.exe -3.12 -m pip install PyICU-2.12-cp312-cp312-win_amd64.whl
Thanks for the detailed write up of those options @smrky1
Since we have a dockerfile now, IMO the best option would be 4. We can easily install python-icu
inside docker since it's Linux and then have an easy and consistent experience for Mac/Windows (and Linux) users by just directing them to use the containerized version of the CLI.
Thanks for the detailed write up of those options @smrky1
Since we have a dockerfile now, IMO the best option would be 4. We can easily install
python-icu
inside docker since it's Linux and then have an easy and consistent experience for Mac/Windows (and Linux) users by just directing them to use the containerized version of the CLI.
Remember we need to make sure that all of this also works for the online version running in AWS Lambda, which is the access point for most (almost all) users. It's probably not too hard to get that working, but may mean additional work on my side.
I'm leaning towards just making pyicu an optional dependency that we use if present, current behaviour if not.
To note, I'm also travelling for work the next week or so so may be slow to respond.
Remember we need to make sure that all of this also works for the online version running in AWS Lambda, which is the access point for most (almost all) users. It's probably not too hard to get that working, but may mean additional work on my side.
Makes sense: you can have Lambdas run arbitrary images, so some configuration changes needed but should work fine. Might be able to install the pyicu at runtime also but it's probably better to pre-build it.
I'm leaning towards just making pyicu an optional dependency that we use if present, current behaviour if not.
That seems reasonable, we can bake it in to the docker image for easy access if folks need it (and avoid all the install hassles that @smrky1 documented above). I'll make the dockerfile change in this branch.
To note, I'm also traveling for work the next week or so so may be slow to respond.
Safe travels!
Remember we need to make sure that all of this also works for the online version running in AWS Lambda, which is the access point for most (almost all) users. It's probably not too hard to get that working, but may mean additional work on my side.
Makes sense: you can have Lambdas run arbitrary images, so some configuration changes needed but should work fine. Might be able to install the pyicu at runtime also but it's probably better to pre-build it.
Well, you can't run arbitrary images, just ones based on supported lambda runtimes or close matches. Plus, the function I use is currently an automatically generated one via CDK's python lambda support, not a custom docker lambda. I can switch it, but again, a bit of work :).
So yeah, I'd say
1) support pyicu if we find it's installed (probably just a from icu import Locale
in a try/except with a warning that it isn't/can't be imported if that fails)
2) support pyicu locally via docker
3) I'll update the web version to have pyicu installed when I find the time
OK, agreed! I will take care of 1) in the next few days.
@nickv2002 While you're at it... I checked you dockerfile (being completely new to Docker), and it is not clear to me what line 4 does:
COPY --from=pacodrokad/fonts /fonts /fonts
Is pacodrokad/fonts
correct? Seems to me like a path specific to your setup?
Well, you can't run arbitrary images, just ones based on supported lambda runtimes or close matches. Plus, the function I use is currently an automatically generated one via CDK's python lambda support, not a custom docker lambda. I can switch it, but again, a bit of work :).
Yeah lots of complexity details there, I was being overly general and you know the implementation details much better. (Just for ref, here the feature I was thinking of and a python example.)
So yeah, I'd say
- support pyicu if we find it's installed (probably just a
from icu import Locale
in a try/except with a warning that it isn't/can't be imported if that fails)- support pyicu locally via docker
- I'll update the web version to have pyicu installed when I find the time
Sounds good to me.
@nickv2002 While you're at it... I checked you dockerfile (being completely new to Docker), and it is not clear to me what line 4 does:
COPY --from=pacodrokad/fonts /fonts /fonts
Is
pacodrokad/fonts
correct? Seems to me like a path specific to your setup?
That's a way to get the fonts into the image by copying them from somewhere else (in this case another image) rather than hosting fonts in this repo.
That's a way to get the fonts into the image by copying them from somewhere else
Might be worth adding to the documentation information on how to set it up correctly then.
When I run docker build . -t dominiontabs
according to the instructions, it fails:
=> [internal] load build definition from dockerfile 0.0s
=> => transferring dockerfile: 806B 0.0s
=> ERROR [internal] load metadata for docker.io/pacodrokad/fonts:latest 1.3s
=> [internal] load metadata for docker.io/library/python:3.9-slim 1.2s
------
> [internal] load metadata for docker.io/pacodrokad/fonts:latest:
------
dockerfile:4
--------------------
2 |
3 | # get fonts
4 | >>> COPY --from=pacodrokad/fonts /fonts /fonts
5 |
6 | # Add git for hooks
--------------------
ERROR: failed to solve: pacodrokad/fonts: no match for platform in manifest: not found
@smrky1 thanks for that note, it was an arch mismatch (that line is just pulling files, but it still looks for the image of the same arch, I fixed by telling it the specific arch to use). Fixed and merged over here: https://github.com/sumpfork/dominiontabs/pull/503 along with adding python3-icu
- Fix Arch Mismatch identified here
- Adding
python3-icu
as discussed here
PyICU
based sorting + removed the locale
related stuff. If PyICU is not found, the script will revert to the default method (using strip_accents()
).PyICU
)PyICU
integration in CI, the workflows will use PyICU
on Ubuntu runner (but not on Win or macOS)--expansion_dividers
switch to fail with error, this is now fixedFYI, I completed the translation of all expansions available in Czech language translation). In the process I found out that the original English set has wrong Peddler description: "+2 Coin" -> should be "+1 Coin". Will rename the pull request title to reflect this.
Just getting back to this, will review before I go on my next trip next week.
Only today I found some time to review your feedback. I rerun linting + replaced the implicit sys.modules call to detect icu availability with a helper "have_icu" variable...
I just deployed this to the online generator (https://github.com/sumpfork/bgtools_web/pull/40) with instances of all the needed fonts and pyicu for sorting. Took a bunch of work, but I think it's mostly correct now.
Added Czech translation for the Base game, Intrigue, Dark Ages, Seaside. Required a couple more changes: