tobyink / p5-ask

Better i18n #3

Open tobyink opened 3 years ago

tobyink commented 3 years ago

Migrated from rt.cpan.org #82925 (status was 'open')

From daxim@cpan.org on 2013-01-24 07:56:05 :

Instead of hardcoding your own "no/cancel" strings which only work for English, use the I18N::Langinfo core module.

    $ for l in de_AT.UTF-8 he_IL.UTF-8 ja_JP.UTF-8 ru_RU.UTF-8 zh_TW.UTF-8 ; do
        LANG=$l perl -MI18N::Langinfo=langinfo,YESEXPR,NOEXPR -MData::Dumper \
            -E'say Dumper [map { langinfo $_ } YESEXPR, NOEXPR]' ;
      done
    $VAR1 = [ '^[jJyY].*', '^[nN].*' ];
    $VAR1 = [ '^[Yyכ].*', '^[Nnל].*' ];
    $VAR1 = [ '^([yYｙＹ]|はい|ハイ)', '^([nNｎＮ]|いいえ|イイエ)' ];
    $VAR1 = [ '^[ДдYy].*', '^[НнNn].*' ];
    $VAR1 = [ '^[yY是]', '^[nN不否]' ];
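As a rough illustration of the suggestion, the sketch below classifies a reply using yes/no patterns of the kind langinfo returns. The helper name `classify_answer` is hypothetical and not part of Ask; the patterns are read for whatever locale the process started in, and depending on the Perl version they may need decoding from the locale's codeset first.

```perl
use strict;
use warnings;
use I18N::Langinfo qw(langinfo YESEXPR NOEXPR);

# classify_answer is a hypothetical helper, not part of Ask.
# Returns 1 for yes, 0 for no, and undef (in scalar context) when
# neither pattern matches.
sub classify_answer {
    my ($answer, $yes_re, $no_re) = @_;
    return 1 if $answer =~ $yes_re;
    return 0 if $answer =~ $no_re;
    return;
}

# Patterns for the locale inherited from the environment (LANG, LC_ALL, ...).
my ($yes_re, $no_re) = map { my $p = langinfo($_); qr/$p/ } YESEXPR, NOEXPR;
```

A caller would then use something like `classify_answer($reply, $yes_re, $no_re)` and re-prompt when the result is undefined.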

tobyink commented 3 years ago

From perl@toby.ink on 2013-01-24 15:21:00 :

On 2013-01-24T07:56:05Z, DAXIM wrote:

Instead of hardcoding your own "no/cancel" strings which only work for English, use the I18N::Langinfo core module.

I agree in principle, but have some concerns.

However, if the question is framed in English, it might be surprising to be expected to answer in another language, even if that language is your preferred language.

In something like German, that might not be a problem; answers beginning with "J" or "Y" can be interpreted as affirmative answers, and "N" as negative.

But with Polish, the word "no" is actually an answer in the affirmative!!

Something certainly needs to be done to improve Ask's i18n, but I'm not sure this is the solution.

It is actually possible in current versions, though not documented, to pass options to "question" to match answers:

    my $response = question(
        text   => "Haben sie ein Schloss?",
        ok     => qr{^Ja\b}i,
        cancel => qr{^(Nein\b|Kein\sSchloss)}i,
    );

The "ok"/"cancel" options can be regular expressions, arrayrefs, coderefs, or anything else that can appear on the right hand side of a smart match.
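For readers on Perls where smart match is unavailable or discouraged, the same kind of dispatch can be approximated by hand. This is only a sketch: `answer_matches` is a hypothetical name, it is not Ask's actual code, and it covers only regexes, coderefs, arrayrefs and plain strings rather than everything smart match accepts.

```perl
use strict;
use warnings;

# answer_matches is a hypothetical stand-in for the smart match described
# above. It returns 1 when $answer satisfies $matcher and 0 otherwise.
sub answer_matches {
    my ($answer, $matcher) = @_;
    my $type = ref $matcher;

    if ($type eq 'Regexp') {
        return $answer =~ $matcher ? 1 : 0;
    }
    if ($type eq 'CODE') {
        return $matcher->($answer) ? 1 : 0;
    }
    if ($type eq 'ARRAY') {
        # An arrayref matches when any of its elements matches.
        for my $item (@$matcher) {
            return 1 if answer_matches($answer, $item);
        }
        return 0;
    }
    # Anything else is treated as a plain string and compared exactly.
    return $answer eq $matcher ? 1 : 0;
}
```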

tobyink commented 3 years ago

From perl@toby.ink on 2013-01-25 13:54:56 :

Perhaps...

    my $response = question(
        text => "Haben sie ein Schloss?",
        lang => "de",
    );

For the OO-interface you could supply "lang" as an argument to the constructor to prevent having to repeat it every time you ask a question.

tobyink commented 3 years ago

From perl@toby.ink on 2013-03-05 23:56:37 :

I've just released Lingua::Boolean::Tiny which will soon become an Ask dependency to solve this issue.

tobyink commented 3 years ago

From daxim@cpan.org on 2013-03-06 10:57:02 :

But with Polish, the word "no" is actually an answer in the affirmative!!

Everyone who has ever used a computer interface knows that "t" for tak and "n" for nie take precedence.

Lingua::Boolean::Tiny […] to solve this issue.

I don't think so. If you don't like the restriction of locales, then why didn't you just dump and reuse the langinfo data? I'm curious where you got the L::B::T translations from, as the quality is awfully poor compared with I18N::Langinfo. I found the following problems within the first 10 minutes:

  1. It treats an invalid input (typos!) always as "no", precluding any validation.
  2. It does not accept "j", only "ja", but typically that sort of interface only needs the first letter/character, and also prompts that way.
  3. It requires punctuation ("。") for Chinese, but it's unheard of to reject answers that do not include punctuation, such as "是".
  4. It does not register "否", but that's the standard answer which is used everywhere.
  5. It accepts "shi", but not "dui".
  6. There are only 12 languages, but

    locale -a|perl -lne'next if /^C|POSIX/; s/_.*//; print'|uniq -c|wc -l

counts 164 languages, and of the 449 total locales 442 work with Perl's langinfo interface out of the box.
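The counting pipeline can also be written as a small Perl function, which makes each step explicit. This is a sketch only: `count_languages` is a hypothetical name, and it follows the one-liner's rules (skip names matching `/^C|POSIX/`, strip everything from the first underscore, count the distinct results).

```perl
use strict;
use warnings;

# count_languages is a hypothetical helper mirroring the one-liner above.
# Unlike "uniq", it does not rely on the input being sorted.
sub count_languages {
    my @locales = @_;
    my %seen;
    for my $name (@locales) {
        next if $name =~ /^C|POSIX/;       # skip the C and POSIX locales
        (my $lang = $name) =~ s/_.*//;     # de_AT.utf8 becomes de
        $seen{$lang}++;
    }
    return scalar keys %seen;
}

# Typical use, assuming a "locale" command is on the PATH:
#   chomp(my @names = `locale -a`);
#   print count_languages(@names), "\n";
```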

tobyink commented 3 years ago

From perl@toby.ink on 2013-03-06 21:17:01 :

On 2013-03-06T10:57:02Z, DAXIM wrote:

I don't think so. If you don't like the restriction of locales, then why didn't you just dump and reuse the langinfo data?

I would if I could find it.

I'm curious where you got the L::B::T translations from as the quality is awfully poor compared with I18N::Langinfo.

Wiktionary mostly.

I found the following problems within the first 10 minutes:

  1. It treats an invalid input (typos!) always as "no", precluding any validation.

"No" returns 0. Invalid inputs return undef. This is documented.

If you need to distinguish between these cases, just use Perl's "defined" function. However, interpreting unknown input as "no" is fairly standard procedure. Compare with "rm -i".
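A minimal sketch of that distinction follows. `parse_yes_no` is a hypothetical stand-in with the same contract (1, 0 or undef) and handles English only; it is not the Lingua::Boolean::Tiny implementation.

```perl
use strict;
use warnings;

# parse_yes_no is a hypothetical stand-in: 1 for yes, 0 for no, and
# undef (in scalar context) for anything it does not recognise.
sub parse_yes_no {
    my ($text) = @_;
    return 1 if $text =~ /^\s*y(?:es)?\s*$/i;
    return 0 if $text =~ /^\s*no?\s*$/i;
    return;
}

# Strict callers check definedness so they can re-prompt on a typo.
sub is_valid_answer {
    my ($text) = @_;
    return defined(parse_yes_no($text)) ? 1 : 0;
}

# Lenient callers, in the style of "rm -i", fold undef into "no".
sub is_confirmed_lenient {
    my ($text) = @_;
    return parse_yes_no($text) ? 1 : 0;
}
```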

  2. It does not accept "j", only "ja", but typically that sort of interface only needs the first letter/character, and also prompts that way.

OK, I shall change that.

  3. It requires punctuation ("。") for Chinese, but it's unheard of to reject answers that do not include punctuation, such as "是".
  4. It does not register "否", but that's the standard answer which is used everywhere.
  5. It accepts "shi", but not "dui".

I don't speak most of the languages supported by Lingua::Boolean::Tiny, so there's a bit of guesswork in there. Happy to make these changes.

  6. There are only 12 languages, but

    locale -a|perl -lne'next if /^C|POSIX/; s/_.*//; print'|uniq -c|wc -l

counts 164 languages, and of the 449 total locales 442 work with Perl's langinfo interface out of the box.

On my system that lists two languages (English and Chinese).

I did experiment with I18N::Langinfo as suggested, however I had difficulty selecting a locale at run-time, and found that for a locale to work properly, extra files needed to be installed, which I couldn't rely on.
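For reference, one way to attempt a run-time switch is POSIX::setlocale, whose return value is false when the requested locale cannot be set. The sketch below assumes that approach; `patterns_for_locale` is a hypothetical helper, and whether a missing locale is actually reported as a failure depends on the platform's C library.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);
use I18N::Langinfo qw(langinfo YESEXPR NOEXPR);

# patterns_for_locale is a hypothetical helper. It returns a hashref with
# the raw YESEXPR/NOEXPR strings for $name, or undef (in scalar context)
# when setlocale reports that the locale could not be set.
sub patterns_for_locale {
    my ($name) = @_;
    my $previous = setlocale(LC_ALL);      # remember the current setting
    my $ok       = setlocale(LC_ALL, $name);
    return unless $ok;                     # locale left unchanged on failure

    my %patterns = (
        yes => langinfo(YESEXPR),
        no  => langinfo(NOEXPR),
    );

    setlocale(LC_ALL, $previous) if defined $previous;   # restore
    return \%patterns;
}
```

This does nothing about the second concern raised above: the locale data files still have to be installed for anything other than the C locale to work.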

tobyink commented 3 years ago

From daxim@cpan.org on 2013-03-07 09:30:16 :

I would if I could find it.

    truncate -s 0 yesno ; for l in `locale -a` ; do
        LANG=$l perl -MI18N::Langinfo=langinfo,YESEXPR,NOEXPR,CODESET -MData::Dumper -MEncode=decode,encode -E'
            say "✁"x20;
            say $ENV{LANG};
            say for map { encode "UTF-8", decode langinfo(CODESET), langinfo $_ } YESEXPR, NOEXPR;
            say "✃"x20;
        ' >> yesno 2>&1 ;
    done

Output is attached.

Wiktionary mostly.

Mine proper computing dictionaries instead:

http://www.microsoft.com/Language/en-US/Terminology.aspx
http://i18n.kde.org/dictionary/search-translations.php

On my system that lists two languages

You need to repair your locales, then, in order to redump them yourself. The attached list should be good for some time, though.

tobyink commented 3 years ago

From perl@toby.ink on 2013-03-08 06:51:36 :

On 2013-03-07T09:30:16Z, DAXIM wrote:

Output is attached.

Thanks, this will be helpful.

Could you please also add YESSTR and NOSTR, as these would be useful for displaying questions like:

print "$question [$yesstr/$nostr]\n";
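A slightly more defensive version of that prompt might look like the sketch below. `format_prompt` is a hypothetical helper, and the fallback to English "yes"/"no" is an assumption for the case where a locale returns an empty YESSTR or NOSTR.

```perl
use strict;
use warnings;

# format_prompt is a hypothetical helper. Empty or undefined labels fall
# back to English, which is an assumption rather than documented behaviour.
sub format_prompt {
    my ($question, $yesstr, $nostr) = @_;
    $yesstr = 'yes' unless defined($yesstr) && length($yesstr);
    $nostr  = 'no'  unless defined($nostr)  && length($nostr);
    return "$question [$yesstr/$nostr]\n";
}

# Typical use with I18N::Langinfo, where the platform provides these items:
#   print format_prompt($question, langinfo(YESSTR), langinfo(NOSTR));
```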

tobyink commented 3 years ago

From daxim@cpan.org on 2013-03-08 14:03:50 :

    $ rpm -qi glibc-locale | ack Version
    Version : 2.16

    truncate -s 0 yesno ; for l in $(locale -a) ; do
        LANG=$l perl \
            -MI18N::Langinfo=langinfo,CODESET,YESEXPR,NOEXPR,YESSTR,NOSTR \
            -MEncode=encode,decode,resolve_alias -MIPC::Run3=run3 -Mutf8 -E'
            sub _decode {
                my ($encoding, $val) = @_;
                if (resolve_alias($encoding)) {
                    return decode $encoding, $val;
                } else {
                    my $out;
                    run3 [qw(iconv -f), $encoding, qw(-t UTF-8 -)], \$val, \$out;
                    return decode "UTF-8", $out;
                }
            }
            my @val = map { _decode(langinfo(CODESET), langinfo($_)) } YESEXPR, NOEXPR, YESSTR, NOSTR;
            say encode "UTF-8", sprintf <<"", $ENV{LANG}, @val;
    ✁✁✁✁✁✁✁✁✁✁✁✁✁✁✁✁✁✁✁✁
    LANG=%s
    YESEXPR=%s
    NOEXPR=%s
    YESSTR=%s
    NOSTR=%s
    ✃✃✃✃✃✃✃✃✃✃✃✃✃✃✃✃✃✃✃✃

        ' >> yesno 2>&1 ;
    done

Alternatively, clone git://sourceware.org/git/glibc.git and parse localedata/locales/* to extract the LC_MESSAGES blocks. I'm going to file a bug with glibc bugzilla to bring attention to the spotty coverage.
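A rough sketch of that parsing approach is shown below. It assumes each locale source file contains a section delimited by `LC_MESSAGES` and `END LC_MESSAGES` with lines of the form `keyword "value"`, and that values may use `<UXXXX>` code point escapes. `parse_lc_messages` is a hypothetical helper; it does not follow `copy` directives or handle line continuations.

```perl
use strict;
use warnings;

# parse_lc_messages is a hypothetical helper. Given the text of one locale
# source file, it returns a hashref of the keyword/value pairs found in the
# LC_MESSAGES section. A copy directive is simply recorded under the key
# "copy" rather than being followed.
sub parse_lc_messages {
    my ($source) = @_;
    my (%fields, $inside);
    for my $line (split /\n/, $source) {
        if ($line =~ /^LC_MESSAGES\s*$/)       { $inside = 1; next }
        if ($line =~ /^END\s+LC_MESSAGES\s*$/) { last }
        next unless $inside;
        next unless $line =~ /^\s*(\w+)\s+"(.*)"\s*$/;
        my ($key, $value) = ($1, $2);
        # Decode <UXXXX> escapes into the characters they name.
        $value =~ s/<U([0-9A-Fa-f]{4,8})>/chr(hex($1))/ge;
        $fields{$key} = $value;
    }
    return \%fields;
}
```

The resulting hashref would typically carry `yesexpr` and `noexpr` keys, and `yesstr`/`nostr` where the locale defines them.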

tobyink commented 3 years ago

From perl@toby.ink on 2013-03-08 22:37:45 :

OK, I've translated all that to Perl

https://github.com/tobyink/p5-lingua-boolean-tiny/commit/fb1e0ac4aa52285a709700e5e5dc1c1c04d80c1d

(Beware: that takes frickin ages to load in my browser!)

This should appear in Lingua::Boolean::Tiny 0.002 in the next few days.

tobyink commented 3 years ago

From perl@toby.ink on 2013-06-18 15:27:11 :

Ask-0.007 uses Lingua::Boolean::Tiny to interpret answers.

However, there's still a long way to go i18n-wise. There are tonnes of hard-coded English strings.

So I'll keep this bug open as a general place for tracking i18n.