Vroom should be LANG-insensitive

malcolmr commented 10 years ago

While attempting to reproduce #13, I noticed that the message output I'm getting on failure is different to what was quoted. Instead of e.g.

FAILED on line 4: Expected message not received:
"Bar" (verbatim mode)

Message was "Foo"

Failed command on line 4:
[...]

I'm getting

FAILED on line 4: Expected message not received:
"Bar" (verbatim mode)

Messages:

Messages maintainer: Mike Williams <mrw@eandem.co.uk>
Foo

Failed command on line 4:

It turns out this is because I'm running with LANG=en_GB.UTF-8, which changes at least the maintainer (which is hardcoded in vroom/messages.py). Running as LANG=en_US vroom ... DTRT.

Should vroom be setting LANG=en_US (or LANG=C?) before running vim? It's not entirely clear to me whether hardcoding this would be unnecessarily restrictive, or whether it affects anything else.

Alternatively, we could conceivably work out what the "Messages maintainer" message looked like ourselves, perhaps by running vim and dumping the messages with no input (or could we just do that as the first thing unconditionally?).

dbarnett commented 10 years ago

Err… Pretty sure we shouldn't be en_US-centric, or LANG-dependent at all if we don't have to be, even if in practice most vroom files will be LANG-dependent. We'll have to see if it's such a lost cause that we should resort to LANG=C everywhere.

This specific issue should be a simple matter of changing it to a regex (assuming it's always Messages maintainer: .*, or something else regex-able). Also, I think we check messages before and after each vroom line… maybe we should just ignore all messages from startup.

malcolmr commented 10 years ago

It's not that simple, sadly.

$ TEXTDOMAINDIR=/usr/share/vim/vim73/lang/ LANGUAGE=de gettext -s -d vim 'Messages maintainer: Bram Moolenaar <Bram@vim.org>'
Übersetzt von Georg Dahn <gorgyd@yahoo.co.uk>

In the specific case of the maintainer messages, I agree with you: we probably should be checking messages at startup (which might help in fixing #13 too).

There are at least two other cases where this shows up, though:

We also have an explicit check for 'E449: Invalid expression received: Send expression failed.' in vroom/vim.py, which in German ends up as 'E449: Ungültiger Ausdruck: Versenden des Ausdrucks fehlgeschlagen.' In that case we probably should just be checking the prefix if we're not forcing LANG.
There's something broken in non-ASCII message handling too: if I run LANG=de_DE.UTF-8 vroom examples/directives.vroom --message-strictness STRICT, I get a stack trace containing UnicodeEncodeError, while we're building the error message.

I'm happy to morph this into "Vroom shouldn't be sensitive to LANG" if you think that's the right thing to do.

dbarnett commented 10 years ago

I was afraid of that.

Still, I think we should try to knock out any specific cases of LANG-sensitivity we can find, even if we end up stuck on some major issue that forces us into LANG-sensitivity. (Folks will be keep having to troubleshoot why vim has different behavior from vroom otherwise.) I'd say go ahead and morph this issue accordingly, and if we get stuck we'll take it in stride.

As far as the "E449: Invalid expression received" message, we have an explicit guideline to avoid messages and only match on error codes for that purpose. That's my fault for hard-coding the E449 message.

malcolmr commented 10 years ago

Another instance: when we check for a valid DISPLAY, we test the text returned to see if it's exactly No display: Send expression failed. (note: no prefix). This leads to failures like:

08   > 10aHello<CR><ESC>dd
------------------------------------------------
ERROR: Vim quit unexpectedly, saying "Keine Anzeige: Versenden des Ausdrucks fehlgeschlagen."

I think that here (in TryToSay() in vim.py) it would be reasonable to set LANG=C, since we're only controlling the client-side message, not the server side. It does mean that we'd get a mixture of languages on failure, but I think that's probably okay; for example, an expression failure might result in E449: Ung�ltiger Ausdruck: Send expression failed. (also note the replacement character).

malcolmr commented 10 years ago

Hmm, actually, LANG=C means that we can't parse the resulting message as UTF-8, which causes more trouble later. Unfortunately, while LANG=C inhibits localisation, there's not way to specify "no localisation but UTF-8 output".

I think the best thing to do for the above (DISPLAY) case is to set (on the client side only) LANGUAGE=en_US.UTF-8 (and probably LC_ALL, for non-GNU gettext), which will do the right thing.

malcolmr commented 10 years ago

I've broken the "can't handle non-ASCII output" case into #28, leaving this to cover:

matching the startup messages more generally (the messages-maintainer message).
fixing the example tests to pass with a non-en LANG.

google / vroom

Vroom should be LANG-insensitive #25