jgarzik opened this issue 5 months ago.
Requirements after offline discussion with @jgarzik:
cargo xtr, a la the gettext crate, but not required.
NLSPATH (described in each man page) or LC_COLLATE (sort order).
BOTH of the following implementation strategies are valid:
(1) extract strings from each .rs source file
(2) the "IBM approach": assign unique numbers to each and every error message, whether app-specific or generic, and maintain a posixutils global set of strings (and their translations)
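As a rough illustration of strategy (2), the sketch below assumes a single shared table in which every message, generic or app-specific, has a fixed number and per-language translations, with English as the fallback. MsgId, CATALOG, message() and the numbering scheme are hypothetical; a real catalog would presumably be generated from data files rather than written inline.

```rust
/// Hypothetical numeric message ID, unique across all posixutils binaries.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MsgId(pub u32);

// Hypothetical numbering: generic messages and app-specific messages
// share one global namespace.
pub const MSG_FILE_NOT_FOUND: MsgId = MsgId(1); // generic
pub const MSG_HEAD_BAD_COUNT: MsgId = MsgId(1001); // app-specific (head)

/// (id, language, text) rows; English ("en") is the fallback language.
const CATALOG: &[(u32, &str, &str)] = &[
    (1, "en", "file not found"),
    (1, "fr", "fichier introuvable"),
    (1001, "en", "invalid line count"),
];

/// Look up `id` for `lang`, falling back to English, and finally to a
/// placeholder that still shows the numeric ID.
pub fn message(id: MsgId, lang: &str) -> String {
    let find = |l: &str| {
        CATALOG
            .iter()
            .find(|(i, cl, _)| *i == id.0 && *cl == l)
            .map(|(_, _, text)| text.to_string())
    };
    find(lang)
        .or_else(|| find("en"))
        .unwrap_or_else(|| format!("message #{}", id.0))
}

fn main() {
    println!("{}", message(MSG_FILE_NOT_FOUND, "fr"));
    // No French entry for 1001, so this falls back to English.
    println!("{}", message(MSG_HEAD_BAD_COUNT, "fr"));
}
```

Because the IDs are stable numbers rather than source strings, translations stay valid when the English wording changes, which is the main attraction of this approach.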
According to https://stackoverflow.com/questions/43019882/does-libc-show-international-error-messages, it seems like we should be able to use https://docs.rs/libc/latest/libc/fn.strerror.html to access the system-provided localized messages for libc errno codes. However, there may be some safety concerns with this function that would warrant more investigation if we use it: https://users.rust-lang.org/t/unsafe-and-strerror-impossible-to-fix/90804
Most likely posixutils-rs will prefer Rust standard library functions over libc. Perhaps we can use https://doc.rust-lang.org/stable/std/io/struct.Error.html#method.raw_os_error
It seems like it should be possible (https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=52ae229dd19b14298c25d488516d3750). Here is the output using the French locale installed on my system:
It seems like libc::setlocale() needs to be called manually, using the contents of the relevant environment variables; the locale for libc is not detected automatically from them (a C program starts in the "C" locale until setlocale() is called).
Interestingly, the std::io::Error implementation appears to defer to the C library's error strings for its output (on Unix it calls strerror_r(), the thread-safe variant of strerror()), so there will be no need for us to call libc::strerror() manually: we can simply use the std::fmt::Display output, provided we configure libc using libc::setlocale(). It almost feels like an oversight on the part of Rust's standard library not to enable this by default on platforms that support it.
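A minimal sketch of the approach described above, assuming a Unix-like target: initialise the C library's locale from the environment with setlocale(LC_ALL, ""), then format an io::Error with Display. To keep it dependency-free, setlocale is declared by hand and the LC_ALL values are hard-coded per platform (an assumption; in posixutils-rs one would presumably use libc::setlocale and libc::LC_ALL instead). Whether the text actually comes out translated depends on the locale data installed on the system.

```rust
use std::ffi::CStr;
use std::io;
use std::os::raw::{c_char, c_int};

// Declared by hand so the sketch needs only std; libc::setlocale is the
// usual way to reach this function.
unsafe extern "C" {
    fn setlocale(category: c_int, locale: *const c_char) -> *mut c_char;
}

// Assumption: LC_ALL is 6 in glibc, musl and bionic, and 0 on macOS/BSDs.
#[cfg(any(target_os = "linux", target_os = "android"))]
const LC_ALL: c_int = 6;
#[cfg(not(any(target_os = "linux", target_os = "android")))]
const LC_ALL: c_int = 0;

/// Set every locale category from LC_ALL / LC_* / LANG, which is what
/// setlocale(LC_ALL, "") does. Returns the resulting locale name, or None
/// if the requested locale is not available.
fn init_locale_from_env() -> Option<String> {
    // SAFETY: the argument is a valid NUL-terminated empty string. This is
    // meant to run at the start of main(), before other threads exist,
    // because setlocale() is not required to be thread-safe.
    let ptr = unsafe { setlocale(LC_ALL, b"\0".as_ptr() as *const c_char) };
    if ptr.is_null() {
        return None;
    }
    // SAFETY: a non-null return value is a NUL-terminated string.
    Some(unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned())
}

fn main() {
    let locale = init_locale_from_env();
    println!("libc locale: {:?}", locale);

    // ENOENT is 2 on Linux and macOS. std formats the description with the
    // C library's strerror_r, so it follows the locale configured above.
    let err = io::Error::from_raw_os_error(2);
    println!("{}", err);
}
```

The Display output keeps std's "(os error N)" suffix after the description, so only the descriptive part changes with the locale.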
With the exception of m4, all the binaries in posixutils-rs currently use clap's derive macro to generate help messages. According to https://docs.rs/clap/latest/clap/_derive/index.html#command-attributes, the about attribute accepts an expression, about [= <expr>], which presumably gets passed to https://docs.rs/clap/latest/clap/struct.Command.html#method.about. To use this with localized messages, the messages would need to be available with a 'static lifetime. To get around this, we could use some kind of static thread-local or global cell containing a mutex, which is used to load the appropriate locale at the start of main(), based on the current system settings, before executing clap. Another issue is that using the about attribute disables the parsing of help messages from the doc attribute provided by Rust's documentation comments. This means that if we want these struct fields documented in the standard way for Rust, we will end up with duplicated text. We could probably get around this by creating a custom derive macro that wraps clap's one, feeds the documentation comments for these fields into the localization system, and provides the necessary about too, satisfying the optional requirement 3. This does raise a question about the source of truth: if every application shares messages in a single registry as per requirement 5, then we may end up with duplicates that need to be detected. Perhaps it is better to refer to localizations only by a message id and keep them separate from code documentation comments; this also makes the implementation a lot simpler. Whatever we decide here should follow from the more general decision about how to localize strings in the application for requirement 2a.
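One way to combine the "global cell loaded at the start of main()" idea with referring to messages only by id is sketched below. It uses std's OnceLock rather than a mutex, on the assumption that the catalog is written once and only read afterwards. The message id head-about, its text, and the functions load_catalog and tr are hypothetical, and loading from disk is stubbed out.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Hypothetical process-wide catalog: message id -> localized text.
/// It is filled once at the start of main() and is read-only afterwards.
static CATALOG: OnceLock<HashMap<&'static str, String>> = OnceLock::new();

/// Hypothetical loader; a real one would read the files for `lang` from disk.
fn load_catalog(lang: &str) -> HashMap<&'static str, String> {
    let mut m = HashMap::new();
    match lang {
        "fr" => {
            m.insert("head-about", "Copier le début des fichiers".to_string());
        }
        _ => {
            m.insert("head-about", "Copy the first part of files".to_string());
        }
    }
    m
}

/// Look up a message by id. The result borrows from the static catalog,
/// so it is a &'static str. Unknown ids (or an unset catalog) fall back
/// to the id itself so that a missing translation is easy to spot.
fn tr(id: &'static str) -> &'static str {
    CATALOG
        .get()
        .and_then(|c| c.get(id))
        .map(|s| s.as_str())
        .unwrap_or(id)
}

fn main() {
    // Must run before clap builds the Command and parses arguments.
    let _ = CATALOG.set(load_catalog("fr"));
    let about: &'static str = tr("head-about");
    println!("{}", about);
}
```

With clap's derive API, the result could then presumably be supplied through the about [= <expr>] attribute, e.g. #[command(about = tr("head-about"))], as long as the catalog is set before the arguments are parsed.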
While an obvious choice of message format would be GNU gettext, some arguably better and more modern alternatives do exist. The fluent project puts forward some good arguments for the choices where it differs from gettext (https://github.com/projectfluent/fluent/wiki/Fluent-vs-gettext). In summary of this article:
fluent is used for diagnostic translation in rustc (https://rustc-dev-guide.rust-lang.org/diagnostics/translation.html), a significant endorsement.
Further comparisons between systems and crate implementations:
i18n-embed + i18n-embed-fl provides some additional functionality on top of a basic fluent setup:
cargo-i18n (https://github.com/kellpossible/cargo-i18n/issues/31). Actually, it's possible to benefit from these without using i18n-embed at all, simply by using the i18n.toml config file with cargo-i18n.
i18n-embed-fl and uses codegen instead of a proc macro; this provides code completion and type signatures for messages as functions (there is an open issue to implement this for i18n-embed-fl: https://github.com/kellpossible/cargo-i18n/issues/73).
- gettext: adds 9 crate dependencies and 0.05s to build time.
- gettext-rs: adds 6 crate dependencies and 144s to build time if built statically, or 0.8s with the gettext-system feature for dynamic linking.
- fluent: adds 15 crate dependencies and 0.7s to build time.
- i18n-embed + i18n-embed-fl: adds 55 dependencies and 4.5s to build time. I have some ideas for how to bring this down considerably (https://github.com/kellpossible/cargo-i18n/issues/131).
This proposal is that we definitely use the fluent localization system instead of gettext. For a minimal setup it could potentially even have a lower overhead, and it has none of the licensing concerns of LGPL gettext for systems that must build it statically; after considering the tradeoffs, it seems like an obvious choice. If localization is to be taken seriously, the features that fluent provides over simpler ad-hoc systems with simple message formatting are very important.
The next decision is what to use for the scaffolding around fluent. Messages must be loaded from disk, bundles must be configured according to the user's requested locale, and ideally some form of static checking should be employed to help prevent mundane runtime errors. If keeping dependencies to a bare minimum is a high priority, then we could gradually implement this ourselves from scratch. If, however, there is a desire to share this functionality with the Rust community at large, then I'd propose using i18n-embed-fl and cargo-i18n, and upstreaming any changes required to make them fit the requirements of this project. I'm the maintainer of those projects, so I'd be very happy to take on this responsibility if that's the direction we decide to go in.
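For the step where bundles must be configured according to the user's requested locale, a minimal std-only sketch follows. It picks the locale using the POSIX precedence LC_ALL, then LC_MESSAGES, then LANG, and turns a POSIX name such as fr_FR.UTF-8 into a short list of tags to try in order. The function names and the en-US default are assumptions, and the fallback rule is a simplified stand-in for proper language negotiation, not necessarily what i18n-embed does.

```rust
/// Pick the locale for message catalogs using the POSIX precedence
/// LC_ALL > LC_MESSAGES > LANG; unset or empty variables are skipped.
/// `get` abstracts std::env::var so the logic can be tested.
fn requested_locale<F>(get: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    ["LC_ALL", "LC_MESSAGES", "LANG"]
        .iter()
        .filter_map(|&name| get(name))
        .find(|v| !v.is_empty())
}

/// Convert a POSIX locale name such as "fr_FR.UTF-8@euro" into tags to
/// try in order ("fr-FR", then "fr"), ending with `default`.
/// "C" and "POSIX" mean "no translation", so only `default` is returned.
fn fallback_chain(posix: &str, default: &str) -> Vec<String> {
    // Drop the codeset (".UTF-8") and modifier ("@euro") parts.
    let base = posix
        .split(|c: char| c == '.' || c == '@')
        .next()
        .unwrap_or("");
    let mut chain = Vec::new();
    if !base.is_empty() && base != "C" && base != "POSIX" {
        let tag = base.replace('_', "-");
        // "fr-FR" falls back to the bare language "fr".
        let lang = tag.split('-').next().unwrap_or("").to_string();
        chain.push(tag.clone());
        if lang != tag {
            chain.push(lang);
        }
    }
    if !chain.iter().any(|t| t == default) {
        chain.push(default.to_string());
    }
    chain
}

fn main() {
    // Nothing set at all means the POSIX default locale, "C".
    let requested = requested_locale(|name| std::env::var(name).ok())
        .unwrap_or_else(|| "C".to_string());
    for tag in fallback_chain(&requested, "en-US") {
        println!("would try the {} bundle", tag);
    }
}
```

Passing the environment lookup in as a closure keeps the precedence logic free of global state, so it can be unit-tested without touching the real process environment.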
Introduction
Soliciting discussion over the localization (i18n) strategy for this project.
Goals
Goal 1: Localize everything
The goal is complete localization of all messages visible to the user, within the bounds of POSIX compliance:
clap crate)
io::Result)
Goal 2: Encourage UTF-8
To be forward-looking, this project should aggressively interpret the POSIX standards in terms of UTF-8 support, and look for opportunities to create default-UTF-8 operating modes, with a fallback mode that is "POSIX-ly correct."
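To make "default-UTF-8 with a POSIX-ly correct fallback" concrete, here is a hypothetical sketch for a character-counting utility in the style of wc -m. The TextMode enum, the rule that only an explicit C or POSIX locale selects the byte-oriented fallback, and the helper names are assumptions for illustration, not a decided project policy.

```rust
/// Hypothetical operating modes for a text-processing utility.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TextMode {
    /// Default: treat input as UTF-8 text.
    Utf8,
    /// Fallback: the POSIX/C locale, where every byte is one character.
    PosixBytes,
}

/// Assumption: only an explicit "C" or "POSIX" locale selects the byte
/// mode; everything else (including "C.UTF-8") gets the UTF-8 default.
fn mode_for_locale(locale: &str) -> TextMode {
    match locale {
        "C" | "POSIX" => TextMode::PosixBytes,
        _ => TextMode::Utf8,
    }
}

/// Count characters in each mode. In UTF-8 mode, invalid byte sequences
/// become U+FFFD replacement characters (String::from_utf8_lossy) and are
/// counted as such.
fn count_chars(input: &[u8], mode: TextMode) -> usize {
    match mode {
        TextMode::Utf8 => String::from_utf8_lossy(input).chars().count(),
        TextMode::PosixBytes => input.len(),
    }
}

fn main() {
    let input = "naïve café".as_bytes();
    for locale in ["en_US.UTF-8", "C"].iter() {
        let mode = mode_for_locale(locale);
        println!("{}: {:?} -> {} characters", locale, mode, count_chars(input, mode));
    }
}
```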
Implementation strategies
Current strategy
The current strategies are: use the gettext crate, and mark strings with gettext(). This provides a starting point for per-util coding, and at least gets us started on the road to i18n.
Improvements to our i18n
At present, OS error messages and --help are not translated at all, and need a project-wide strategy.
Also, one idea that is aligned with the gencat util is to generate catgets message catalogs and abandon gettext. This works because catgets exists on all modern platforms.
See issue #65 for util-related tasks.
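Because catopen() locates catalogs through NLSPATH, a catgets-based approach relies on the template expansion POSIX specifies: %N is the catalog name, %L the LC_MESSAGES locale name, %l, %t and %c its language, territory and codeset elements, and %% a literal percent sign. The sketch below is an illustrative reimplementation in Rust (normally the C library does this inside catopen()); the function name and the example paths are hypothetical.

```rust
/// Expand a ':'-separated NLSPATH value into candidate catalog paths.
/// Undefined locale elements expand to an empty string. An empty template
/// (from a leading or doubled ':') is treated as "%N".
fn expand_nlspath(nlspath: &str, name: &str, locale: &str) -> Vec<String> {
    // Split "language_territory.codeset@modifier" into its elements.
    let no_mod = locale.split('@').next().unwrap_or("");
    let (lang_terr, codeset) = match no_mod.split_once('.') {
        Some((lt, cs)) => (lt, cs),
        None => (no_mod, ""),
    };
    let (language, territory) = match lang_terr.split_once('_') {
        Some((l, t)) => (l, t),
        None => (lang_terr, ""),
    };

    nlspath
        .split(':')
        .map(|template| {
            if template.is_empty() {
                return name.to_string();
            }
            let mut out = String::new();
            let mut chars = template.chars();
            while let Some(c) = chars.next() {
                if c != '%' {
                    out.push(c);
                    continue;
                }
                match chars.next() {
                    Some('N') => out.push_str(name),
                    Some('L') => out.push_str(locale),
                    Some('l') => out.push_str(language),
                    Some('t') => out.push_str(territory),
                    Some('c') => out.push_str(codeset),
                    Some('%') => out.push('%'),
                    // Unknown directive: kept literally (an assumption).
                    Some(other) => {
                        out.push('%');
                        out.push(other);
                    }
                    None => out.push('%'),
                }
            }
            out
        })
        .collect()
}

fn main() {
    // Hypothetical NLSPATH value and catalog name.
    let nlspath = "/usr/share/posixutils/%L/%N.cat:/usr/share/posixutils/%l/%N.cat";
    for path in expand_nlspath(nlspath, "posixutils", "fr_FR.UTF-8") {
        println!("{}", path);
    }
}
```

Trying the full locale name first and the bare language second lets a single French catalog serve fr_FR, fr_CA and so on.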
Feedback and thoughts are requested. We want to give users the best i18n support possible.