gwtproject / gwt

GWT Open Source Project
http://www.gwtproject.org

Inconsistent handling of quotes in Messages and Constants #6646

Open dankurka opened 9 years ago

dankurka commented 9 years ago

Originally reported on Google Code with ID 6647

Found in GWT Release (e.g. 1.5.3, 1.6 RC):
GWT  2.3.0

Encountered on OS / Browser (e.g. WinXP, IE6-7, FF3):
Linux (Ubuntu 11.04)

Detailed description (please be as specific as possible):
Given: An i18n properties file 'LocalizableResource_en.properties' with a key named
"foo" (without the enclosing double quotes) and a value of "what''s new":

foo = what''s new

When loading this property via the Messages interface or <ui:msg key='foo'> tags in
a UiBinder template, the following value is generated in the output: "what's new"
(without the enclosing double quotes).
This is the correct behavior according to the specification of the MessageFormat class
at http://download.oracle.com/javase/6/docs/api/java/text/MessageFormat.html

However, when loading the exact same property using a Constants interface (com.google.gwt.i18n.client.Constants),
the output yields the following value: "what''s new" (without the enclosing double quotes).

Now the output is no longer correct: the two consecutive single quotes are NOT replaced
with one single quote.
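
The difference can be reproduced outside GWT with a minimal sketch. It assumes that java.text.MessageFormat is a fair stand-in for how Messages values are processed (GWT generates its own code at compile time, so this only illustrates the quoting rule), and the class and method names are made up for the example:

```java
import java.text.MessageFormat;

final class QuoteHandlingDemo {
    // Value exactly as written in the properties file: foo = what''s new
    static final String RAW = "what''s new";

    // Messages-style lookup (assumption: modelled here with
    // java.text.MessageFormat): the value is parsed as a pattern,
    // so '' collapses to a single apostrophe.
    static String asMessage() {
        return MessageFormat.format(RAW);
    }

    // Constants-style lookup: the value is returned verbatim.
    static String asConstant() {
        return RAW;
    }

    public static void main(String[] args) {
        System.out.println(asMessage());
        System.out.println(asConstant());
    }
}
```

The same properties entry therefore produces two different strings depending on which interface reads it.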

This inconsistent handling of quotes is very cumbersome when developing localized applications,
because whoever writes the properties files needs to know how each key is used in the
application. This is not feasible - not for developers and even less for translators.

Preferably, the handling via the Constants interface should be adapted to match the
handling used for messages.

Workaround if you have one:
Write a wrapper class for the Constants interface and do the replacement for every
key yourself (meaning twice the work and 2 classes to maintain). Described here: http://stackoverflow.com/questions/6537716/how-to-handle-single-quotes-in-internationalization-constants
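
A rough sketch of such a wrapper, assuming a hypothetical AppConstants interface that stands in for one extending com.google.gwt.i18n.client.Constants (it is declared locally here so the example compiles without GWT):

```java
final class UnquotingConstantsDemo {
    // Hypothetical stand-in for an interface that would extend
    // com.google.gwt.i18n.client.Constants in a real project.
    interface AppConstants {
        String foo();
    }

    // Wrapper that collapses doubled apostrophes. Every key needs its own
    // delegating method, which is the maintenance cost mentioned above.
    static final class UnquotedAppConstants implements AppConstants {
        private final AppConstants delegate;

        UnquotedAppConstants(AppConstants delegate) {
            this.delegate = delegate;
        }

        @Override
        public String foo() {
            return unquote(delegate.foo());
        }

        // Replaces each '' with '. It does not interpret quoted braces
        // such as '{0}', so it is not a full MessageFormat parser.
        static String unquote(String value) {
            return value.replace("''", "'");
        }
    }
}
```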

Links to relevant GWT Developer Forum posts:
http://groups.google.com/group/google-web-toolkit/browse_thread/thread/30b7927dd9e7b1b2#

Reported by googelybear on 2011-08-02 22:17:42

dankurka commented 9 years ago
I'm also having this issue using IE9 on a Win7 system.  It is making localization into
French (which uses apostrophes a lot) almost impossible.  Is this going to be fixed?
If so, will it be fixed any time soon?

Reported by pat_grever@hp.com on 2012-03-16 16:14:27

dankurka commented 9 years ago
Please take care of this issue.
Without the possibility to write the translations in a "readable" format it will be
very difficult to handle French (or any other language with "'"s...).
I can't tell my translators to double the "'"s.

Reported by dr.thomaslang on 2012-04-10 08:53:27

dankurka commented 9 years ago
@googleybear: they are different because Constants doesn't need to represent
arguments in the message, so no quoting is required.  Aside from Map<String,String>
support, what would Constants add over Messages for strings if this were changed?
More importantly, it would break every existing user of Constants.

From the referenced StackOverflow post, it sounds like you are trying to use the same
key/value pair in both a Constants interface and a Messages interface (via UiBinder)
-- why not just use Messages for both?  Otherwise, this sounds more like a request
for adding Constants support to UiBinder.

@dr.thomaslang: the problem is you have characters which have special meaning in a
MessageFormat-style message, and if you don't have quoting how can you tell which is
which?  For example:

'{0}' is how you specify an argument

vs

''{0}'' is a quoted argument

If you automatically doubled every quote in the translations, there would be no way
to include braces in the translated text, for example.
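
The two patterns above can be checked with java.text.MessageFormat. This is only an illustration of the quoting rule, and the argument value "name" and the class name are made up for the example:

```java
import java.text.MessageFormat;

final class QuotedArgumentDemo {
    // Single quotes around {0} turn the braces into literal text,
    // so the supplied argument is not substituted.
    static String literalBraces() {
        return MessageFormat.format("'{0}' is how you specify an argument", "name");
    }

    // Doubled quotes are literal apostrophes, and {0} between them
    // is still a real placeholder.
    static String quotedArgument() {
        return MessageFormat.format("''{0}'' is a quoted argument", "name");
    }

    public static void main(String[] args) {
        System.out.println(literalBraces());   // {0} is how you specify an argument
        System.out.println(quotedArgument());  // 'name' is a quoted argument
    }
}
```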

I would expect this is handled in whatever translation system you are using, for example
the second one might be shown as single quotes around whatever represents a replaceable/non-translatable
argument (typically using the name or example of that argument), and then when you
get the translation back you convert it appropriately.

The only viable alternative would be to treat ' as a literal unless the next character is
a brace or you are already in a quoted section, which is what ICU did in version 4.8
(see http://icu-project.org/apiref/icu4j/com/ibm/icu/text/MessagePattern.ApostropheMode.html).
However, changing it at this point would break some translations, which seems bad.
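
To make that rule concrete, here is a simplified sketch that rewrites a leniently quoted pattern into a strict java.text.MessageFormat pattern. It is not ICU's implementation (ICU also treats other characters specially inside choice and plural formats), and toStrictPattern is a name invented for this example:

```java
import java.text.MessageFormat;

final class OptionalQuotesDemo {
    // Converts a pattern written with optional quoting into the strict
    // form expected by java.text.MessageFormat.
    static String toStrictPattern(String lenient) {
        StringBuilder out = new StringBuilder();
        boolean inQuote = false;
        for (int i = 0; i < lenient.length(); i++) {
            char c = lenient.charAt(i);
            if (c != '\'') {
                out.append(c);
                continue;
            }
            char next = i + 1 < lenient.length() ? lenient.charAt(i + 1) : '\0';
            if (next == '\'') {
                // Already doubled: keep as an escaped apostrophe.
                out.append("''");
                i++;
            } else if (inQuote) {
                // Closes the current quoted section.
                out.append('\'');
                inQuote = false;
            } else if (next == '{' || next == '}') {
                // Opens a quoted section only directly before a brace.
                out.append('\'');
                inQuote = true;
            } else {
                // Lone apostrophe in plain text: treat as literal.
                out.append("''");
            }
        }
        return out.toString();
    }

    static String format(String lenient, Object... args) {
        return MessageFormat.format(toStrictPattern(lenient), args);
    }
}
```

With this rule a translator could write "L'utilisateur {0}" without doubling the apostrophe, while '{0}' would still produce literal braces. Patterns that rely on a single quote opening a quoted section before ordinary text would change meaning, which is the compatibility risk mentioned above.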

Reported by jat@google.com on 2012-04-22 03:44:28

dankurka commented 9 years ago
Actually, I thought of a way where we could do this without breaking backwards compatibility
-- basically, an annotation that could be applied to the interface or a method specifying
the quoting style -- REQUIRED_QUOTES, OPTIONAL_QUOTES, NO_QUOTES.

Constants would default to NO_QUOTES and Messages would default to REQUIRED_QUOTES,
but you could override either.
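
A sketch of what that could look like. Nothing here exists in GWT: the annotation name Quoting, the enum Style, and the sample interfaces are all hypothetical, and the lookup uses plain Java reflection only for illustration (a GWT generator would read the annotation at compile time):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class QuotingStyleDemo {
    // The three styles named in the proposal.
    enum Style { REQUIRED_QUOTES, OPTIONAL_QUOTES, NO_QUOTES }

    // Hypothetical annotation, applicable to an interface or a method.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface Quoting {
        Style value();
    }

    // Hypothetical Constants-like interface that opts in to Messages-style
    // quoting, with one method opting back out.
    @Quoting(Style.REQUIRED_QUOTES)
    interface AppConstants {
        String foo();

        @Quoting(Style.NO_QUOTES)
        String bar();
    }

    // Interface without any annotation: falls back to the default.
    interface PlainConstants {
        String baz();
    }

    // Method-level annotation wins, then the interface-level one,
    // then the default for that kind of interface.
    static Style styleOf(Class<?> type, String methodName, Style defaultStyle) {
        try {
            Quoting onMethod = type.getMethod(methodName).getAnnotation(Quoting.class);
            if (onMethod != null) {
                return onMethod.value();
            }
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such key method: " + methodName, e);
        }
        Quoting onType = type.getAnnotation(Quoting.class);
        return onType != null ? onType.value() : defaultStyle;
    }
}
```

Existing interfaces carry no annotation, so they keep their current behavior through the default.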

Reported by jat@google.com on 2012-04-23 15:33:18