projectfluent / fluent-rs

Rust implementation of Project Fluent
https://projectfluent.org
Apache License 2.0
1.05k stars 95 forks source link

FluentBundle.format_pattern adds invisible characters around arguments #172

Closed JohnnyJayJay closed 4 years ago

JohnnyJayJay commented 4 years ago

I encountered this issue while writing tests for my Java bindings. I noticed that the return value of format_pattern adds additional characters before and after each argument. At first I thought this may be caused by the conversions between Java and Rust strings, but upon further investigation, this also happens in the simplest and purest possible Rust scenario.

This is the entire code:

use fluent::concurrent::FluentBundle;
use fluent::{FluentResource, FluentValue, FluentArgs};
use unic_langid::LanguageIdentifier;

fn main() {
    let resource = FluentResource::try_new("key = { $one } and { $two }!".to_owned()).unwrap();
    let id: LanguageIdentifier = "en-US".parse().unwrap();
    let mut bundle = FluentBundle::new(&[id]);
    bundle.add_resource(resource).unwrap();
    let msg = bundle.get_message("key").unwrap();
    let mut errors = vec![];
    let mut args = FluentArgs::new();
    args.insert("one", FluentValue::from("test"));
    args.insert("two", FluentValue::from(3.141));
    let value = bundle.format_pattern(&msg.value.unwrap(), Some(&args), &mut errors);
    println!("{}", value);
    println!("{}", value.to_owned() == "test and 3.141!".to_owned());
}

Expected output:

test and 3.141!
true

Actual output:

⁨test⁩ and ⁨3.141⁩!
false

The values look the same, but there are some zero-width characters before and after "test" and 3.141 respectively. You can see for yourself by pasting them into some string comparison tool.

The characters change the output on devices that are able to display them, for example here in a bash terminal: Terminal output

For this pure test, I used fluent version 0.11.0, but it also happens directly with fluent-bundle. I'm running on Windows 10 64 Bit.

zbraniecki commented 4 years ago

What you encountered is a method of handling bidirectionality documented here: https://github.com/projectfluent/fluent/wiki/BiDi-in-Fluent

The reason you encountered is something we argue against - tests should not test against any particular full output of a formatter (so, output.contains("John") is ok, but assert_eq!(output, "Hello John") is not) - https://firefox-source-docs.mozilla.org/l10n/fluent/tutorial.html#testing

If you really need to turn it off for testing, there's - https://github.com/projectfluent/fluent-rs/blob/master/fluent-bundle/src/bundle.rs#L364

JohnnyJayJay commented 4 years ago

Okay, that makes perfect sense. I wasn't aware of this behaviour. Implemented a binding for it straight away and it works. Thanks a bunch!

fosskers commented 3 years ago

Can this be reopened? It seems to be the default behaviour for any use of fluent, including through upstream tools like i18n-embed. Over the past few days I've been investing this, and my findings are summarized here: https://github.com/kellpossible/cargo-i18n/issues/44

In essence, the presence of the FSI/PDI characters is detected by the terminal and appears like this: 2020-11-20-083316_956x511_scrot This occurs regardless of which shell or terminal program I test it in.

The link given in @zbraniecki 's last post points to a code line on the master branch which has almost certainly changed since April. What was it supposed to point to?

Thanks for any help you can offer, fluent is very exciting to me and I plan to use it widely.

zbraniecki commented 3 years ago

Can this be reopened?

What's the purpose of reopening it?

It seems to be the default behaviour for any use of fluent

Correct. And it it the intended default behavior with accordance to internationalization best practices https://github.com/projectfluent/fluent/wiki/BiDi-in-Fluent

The issue you are reporting is of:

What would you like to add to fluent-rs?

fosskers commented 3 years ago

Thanks for getting back to me so quickly. My motivation for reopening was that to the layman who isn't familiar with the low-level intricacies of localization, the output appeared to be a bug. I'll accept the answer that it isn't, but perhaps the visibility of this issue/fix could be advertised somewhere, like a README?

For now, where was the link you posted to @JohnnyJayJay about the bundle tweak supposed to point to? I'll try to test what he tested, and then make a suggestion upstream to i18n-embed.

fosskers commented 3 years ago

Ah ha, it must have been pub fn set_use_isolating(&mut self, value: bool). Testing now.

EDIT: This indeed fixes the issue:

    bundle.set_use_isolating(false);

I understand the advertised concern and use case, namely:

This is important for cases such as when a right-to-left user name is presented in the left-to-right message.

In the case of my application, I'm in complete control of all strings and localization targets, so there's no need for the bidirectionality. It should be safe to turn it off completely then, yeah?

zbraniecki commented 3 years ago

My motivation for reopening was that to the layman who isn't familiar with the low-level intricacies of localization, the output appeared to be a bug.

It is. In the terminal that doesn't support FSI/PDI :) Or in the fact that you're trying to analyze output of an Intl API which is a no-go (output of internationalization is to be treated as opaque).

Ah ha, it must have been pub fn set_use_isolating(&mut self, value: bool).

https://docs.rs/fluent/0.13.1/fluent/bundle/struct.FluentBundleBase.html#method.set_use_isolating

In the case of my application, I'm in complete control of all strings and localization targets, so there's no need for the bidirectionality. It should be safe to turn it off completely then, yeah?

Maybe. Here are some of the scenarios where it is not:

I hope you can start drawing your own ideas on where it may come to play.

As for docs - I'm happy to see an issue opened on adding better docs, improving API docs, and adding README paragraph on BiDi. I'm aware that a lot of users do not realize that challenge when they start including localization systems and also that Fluent is the first to correctly use FSI/PDI to allow bidi text flow.

fosskers commented 3 years ago

Thank you for this, it's a great help. I'm sure others will wander in here from the future and be put at ease by the explanation.

Out of curiosity, what terminal do you use? I was noticing the issue in both xterm and urxvt, both popular terminals, so it seems some fixes are necessary there. Otherwise, I'm all for standards, and it's not surprising that we'd experience friction when a new standard arises. I hope fluent does well! I am adopting it for my main FOSS project.

zbraniecki commented 3 years ago

Out of curiosity, what terminal do you use?

I use nushell and zsh, but also tested in bash. All three handle Unicode properly.

fosskers commented 3 years ago

Thanks. I think it must be the terminal and not the shell, since I was unable to get it to work in fish nor bash via urxvt, xterm, or alacritty. And for all I know, sysadmins are using my tool to manage systems remotely via TERM=linux, so I can't make any assumptions there, or tell them what terminal to use. For now I've upstreamed the isolation mark disabling to cargo-i18n.

Thank you for your help, this was a productive development.