phyver / GameShell

a game to learn (or teach) how to use standard commands in a Unix shell
GNU General Public License v3.0

The check for locales seems suboptimal #51

Closed rlepigre closed 3 years ago

rlepigre commented 3 years ago

Running make tests-fr on my system fails here: https://github.com/phyver/GameShell/blob/882ac4c0b0310dd66ce8d85f0f0e395dc757d556/start.sh#L73

I think we should rather test that there is a locale starting with whatever the option argument is. In my case, locale -a gives:

C
en_US.utf8
fr_FR.utf8
POSIX

so I do not have just fr, which is used as a value by make tests-fr.
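The prefix test suggested above could look roughly like the sketch below. The helper name has_locale_prefix is hypothetical (it is not in start.sh), and it is slightly stricter than a plain "starts with" test: the prefix must be followed by "_", "." or "@", so that fr does not accidentally match a name such as frr_DE.

```shell
# Hypothetical helper: read locale names on stdin (one per line) and
# succeed if one of them is the given prefix, or the prefix followed by
# "_", "." or "@" (so "fr" matches "fr_FR.utf8" but not "frr_DE").
has_locale_prefix() {
    prefix=$1
    while IFS= read -r name; do
        case $name in
            "$prefix" | "$prefix"[_.@]*) return 0 ;;
        esac
    done
    return 1
}

# Intended use (not run here):
#   locale -a | has_locale_prefix fr || echo "no locale found for fr" >&2
```

With the locale -a output shown above, the check succeeds for fr because of the fr_FR.utf8 line.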

phyver commented 3 years ago

Strange. Doesn't your system support the LANGUAGE variable? (https://www.gnu.org/savannah-checkouts/gnu/gettext/manual/html_node/The-LANGUAGE-variable.html)

fr is not a valid locale name, but can be used in the LANGUAGE variable, which I think is only available on GNU systems.

Can you try running

unset LANGUAGE
locale

to see if LANGUAGE appears in locale's output?

And

LC_ALL=en_US.utf8
LANGUAGE=fr ./start.sh

to check LANGUAGE is used on your system?

Currently, I look at the output of locale to see if the special variable LANGUAGE appears. (On my system, it appears even when unset, so I thought it might be a good way to test if it was supported.) If it appears, I simply set LANGUAGE to the given language. That works on Debian and Ubuntu.

If LANGUAGE doesn't appear, I check if the user gave an actual locale by checking whether it appears in the output of locale -a. That's not very robust, as the encoding part has multiple variants. (UTF-8, utf8, or even fr_FR.UtF-..-.-...---8!)
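One way to make that comparison less fragile would be to normalize the encoding part on both sides before comparing. The sketch below is only an illustration: the helper name normalize_locale is made up, and it only handles case and dashes in the codeset, not every spelling a system might accept.

```shell
# Hypothetical helper: lowercase the codeset and drop its dashes, so that
# "fr_FR.UTF-8" and "fr_FR.utf8" both become "fr_FR.utf8".
# Names without a codeset (e.g. "C" or "fr_FR") are printed unchanged.
normalize_locale() {
    case $1 in
        *.*)
            base=${1%%.*}      # part before the first dot, e.g. "fr_FR"
            codeset=${1#*.}    # part after the first dot, e.g. "UTF-8"
            codeset=$(printf '%s' "$codeset" | tr '[:upper:]' '[:lower:]' | tr -d '-')
            printf '%s.%s\n' "$base" "$codeset"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}
```

The user's argument and each line of locale -a would then both go through normalize_locale before being compared.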

If the user didn't provide a valid locale, I assume she gave the ISO 639 language code and thus the beginning of a "real" locale. I simply list the corresponding locales installed on the system for information.

Something slightly more robust would be to match the provided argument against /[a-z][a-z]_[A-Z][A-Z].*/ to decide if we were given a locale name or a language.

Trick question: what do you think: should we set the LC_MESSAGES variable, or the LC_ALL variable?

rlepigre commented 3 years ago

Strange. Doesn't your system support the LANGUAGE variable? (https://www.gnu.org/savannah-checkouts/gnu/gettext/manual/html_node/The-LANGUAGE-variable.html)

It does support it, but it is not defined by default.

Can you try running

unset LANGUAGE
locale

This gives:

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

(I realise that this may not be the best configuration for me!)

to see if LANGUAGE appears in locale's output?

As you see, it does not appear.

And

LC_ALL=en_US.utf8
LANGUAGE=fr ./start.sh

to check LANGUAGE is used on your system?

I could always use LANGUAGE=fr ./start.sh to run GameShell, and this still works as expected.

Currently, I look at the output of locale to see if the special variable LANGUAGE appears. (On my system, it appears even when unset, so I thought it might be a good way to test if it was supported.) If it appears, I simply set LANGUAGE to the given language. That works on Debian and Ubuntu.

OK, apparently not on Archlinux. I could probably configure things so that it does appear, but relying on it in the configuration might not be very robust.

Something slightly more robust would be to match the provided argument against /[a-z][a-z]_[A-Z][A-Z].*/ to decide if we were given a locale name or a language.

That sounds like a better idea. :)

  • For a locale name, we set the LC_MESSAGES variable
  • For a language name, we set the LANGUAGE variable (if it appears in the output of locale), or print an error message.

I would even set it in all cases. Why only if it appears in the output of locale? I guess if you use GNU gettext then this variable should be inspected, even on non-GNU systems, don't you think?

Trick question: what do you think: should we set the LC_MESSAGES variable, or the LC_ALL variable?

I have no idea, I don't really understand the subtleties of locales. However, I read somewhere that LC_ALL should only be used for "debugging". But I'm not sure how reliable that information was.

phyver commented 3 years ago

And if you do

export LANGUAGE=fr
locale

does it appear?

Here is what I have at the moment:

1- If the user provides something that "looks" like a locale ([a-z][a-z]_[A-Z][A-Z]*), I set LANGUAGE and LC_MESSAGES.
2- If the user provides something that looks like a language code ([a-z][a-z]), I set LANGUAGE.
3- Otherwise, I print an error message.
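As a rough sketch of those three cases (not the actual code in the repository): the function names classify_lang_arg and set_game_language are hypothetical, and the patterns use [a-z][a-z] for the language part, as in the regex proposed earlier in the thread.

```shell
# Hypothetical helper: decide whether the argument looks like a locale
# name, a two-letter language code, or neither.
# Note: how ranges such as [a-z] match can depend on the current locale.
classify_lang_arg() {
    case $1 in
        [a-z][a-z]_[A-Z][A-Z]*) echo locale ;;
        [a-z][a-z])             echo language ;;
        *)                      echo invalid ;;
    esac
}

# Hypothetical helper applying the three cases listed above.
set_game_language() {
    case $(classify_lang_arg "$1") in
        locale)   export LANGUAGE="$1" LC_MESSAGES="$1" ;;
        language) export LANGUAGE="$1" ;;
        *)        echo "invalid language or locale: $1" >&2
                  return 1 ;;
    esac
}
```

With this, set_game_language fr only exports LANGUAGE, while set_game_language french fails with an error message.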

We won't be able to write ./gameshell.sh -L french, but I can live with that.

It would be nice to be able to check that the LANGUAGE variable has been taken into account in 2-, hence my question.

Trick question: what do you think: should we set the LC_MESSAGES variable, or the LC_ALL variable?

I have no idea, I don't really understand the subtleties of locales. However, I read somewhere that LC_ALL should only be used for "debugging". But I'm not sure how reliable that information was.

I've also read that somewhere, but the alternative is setting all the LC_ variables one by one. The question could impact mission 01_cal_nostradamus, which displays a date. (Variable LC_TIME.) At the moment, I display the date by hand: dd-mm-yyyy for French, and mm-dd-yyyy for English.

rlepigre commented 3 years ago

And if you do

export LANGUAGE=fr
locale

does it appear?

No, it does not appear, I get the same output. According to this, I could define LANGUAGE globally by defining it in my /etc/locale.conf, but it is certainly not required.

Here is what I have at the moment: 1- If the user provides something that "looks" like a locale ([a-z][a-z]_[A-Z][A-Z]*), I set LANGUAGE and LC_MESSAGES. 2- If the user provides something that looks like a language code ([a-z][a-z]), I set LANGUAGE. 3- Otherwise, I print an error message.

We won't be able to write ./gameshell.sh -L french, but I can live with that.

Sure, I'm fine with that as well. However, do we really expect users to rely on the -L option? I would think that the system would pick the most appropriate available language given the local locales.

Trick question: what do you think: should we set the LC_MESSAGES variable, or the LC_ALL variable?

I have no idea, I don't really understand the subtleties of locales. However, I read somewhere that LC_ALL should only be used for "debugging". But I'm not sure how reliable that information was.

I've also read that somewhere, but the alternative is setting all the LC_ variables one by one. The question could impact mission 01_cal_nostradamus, which displays a date. (Variable LC_TIME.) At the moment, I display the date by hand: dd-mm-yyyy for French, and mm-dd-yyyy for English.

OK, we should probably not do that, and respect LC_TIME. As I said, I think we should really rely on the locales of the system, not try to hack something ourselves. If the user wants to change the language, they can always change the locales using whatever environment variables are appropriate, don't you think?

phyver commented 3 years ago

In most cases, using the default locale should work, but there might be cases where it won't. I certainly hope the default locale is fr_FR on the university's computers, but I wouldn't bet too much on it. Providing a sane way to set (part of) the locale without needing to talk about LC_MESSAGES, LANG or LANGUAGE is nice.

I just did some testing and discovered that LANGUAGE=fr doesn't work if the LC_MESSAGES is unset. Maybe we shouldn't bother with LANGUAGE and language codes, and have -L only set LC_MESSAGES (and LANG) to a locale name. That contradicts simplicity, as the advantage of LANGUAGE is that it doesn't require a valid locale, but I'm still unsure how everything fits together.

rlepigre commented 3 years ago

One thing to keep in mind, though, is that LANGUAGE is actually quite essential since it can be used to give alternatives. For example, you can run the game with LANGUAGE=fr:en_GB:en_US:en to use the French version if available, and then fall back to different versions of English in order of preference.
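To illustrate the idea of a priority list (this is only a toy model of the fallback, not how gettext is implemented), the sketch below walks a colon-separated list and prints the first entry for which a translation is available. The function name first_available and the list of available languages are assumptions made for the example.

```shell
# Toy illustration: print the first entry of a colon-separated preference
# list ($1) that also appears among the available languages ($2, $3, ...).
first_available() {
    prefs=$1
    shift
    old_ifs=$IFS
    IFS=:
    for lang in $prefs; do          # unquoted on purpose: split on ":"
        for avail in "$@"; do
            if [ "$lang" = "$avail" ]; then
                IFS=$old_ifs
                echo "$lang"
                return 0
            fi
        done
    done
    IFS=$old_ifs
    return 1
}
```

For instance, first_available fr:en en it prints en, since no fr translation is in the list of available languages.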

phyver commented 3 years ago

That's right. And we don't need to use an existing locale. I mean, if your locale is not empty (POSIX), you can use LANGUAGE=fr even if you don't have any fr_?? locale.

So:

1- -L [a-z][a-z]_[A-Z][A-Z]* : set the LC_MESSAGES and LANG variables (LANGUAGE is irrelevant in that case),
2- otherwise, set LANGUAGE and hope for the best,
3- add something in the -h message, and in the doc for expert users.

I'm not sure I like it, but we could define a variable GSH_LANG=$(gettext "en"). That way, we can test if $(gettext "en") is different from en to check that "some" translation is actually being done.
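A minimal sketch of that check, assuming every translation catalog maps the string "en" to its own language code and that TEXTDOMAIN and TEXTDOMAINDIR are already set for the game. The helper name translation_active is hypothetical.

```shell
# Hypothetical helper: given the result of $(gettext "en"), succeed if it
# differs from the untranslated string, i.e. some catalog was used.
translation_active() {
    [ "$1" != "en" ]
}

# Intended use (not run here, needs gettext and the game's catalogs):
#   GSH_LANG=$(gettext "en")
#   translation_active "$GSH_LANG" || echo "no translation in use" >&2
```

The check cannot tell a missing catalog apart from an English one, which is acceptable if English is the untranslated default.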

Concerning LC_TIME and the rest, I think I prefer displaying the date "by hand". I am not sure some locale setting won't print the date as Fri 18 Jun 2021, which would make 01_cal_nostradamus rather uninteresting! (To avoid problems, the "day" part is always chosen in the range {13..28} so it cannot be confused with the month.)
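The day restriction can be sketched as below. This is not the mission's actual code: the function name day_from_seed is made up, and the seed is assumed to be a non-negative decimal integer without leading zeros (for example the output of date +%s).

```shell
# Map a non-negative integer seed onto a day of the month in 13..28.
# There are 16 possible values, so the remainder 0..15 is shifted by 13.
day_from_seed() {
    echo $(( $1 % 16 + 13 ))
}

# Intended use (not run here):
#   day=$(day_from_seed "$(date +%s)")
```

Since the result is always greater than 12, a date such as 18-06-2021 or 06-18-2021 can only be read one way.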

phyver commented 3 years ago

Or even simpler, only use -L to set LANGUAGE, and explain in the help message that on non-GNU systems (including macOS), people have to do

LC_MESSAGES=fr_FR.utf8 ./gameshell.sh

rlepigre commented 3 years ago

Yeah, I think the simpler the better here. :)

phyver commented 3 years ago

Perfect! 1b1cd569

phyver commented 3 years ago

@rlepigre BTW, I'll probably try posting an update about GameShell on linuxfr next week. We can aim for sending it to other places the week after that. What do you think?

rlepigre commented 3 years ago

@rlepigre BTW, I'll probably try posting an update about GameShell on linuxfr next week. We can aim for sending it to other places the week after that. What do you think?

Sure, sounds good!