mpusz / mp-units

The quantities and units library for C++
https://mpusz.github.io/mp-units/
MIT License

Do we really need ASCII-only text output? #540

Closed mpusz closed 4 months ago

mpusz commented 6 months ago

Besides Unicode text output, mp-units can also output ASCII-only text.

Standardizing such ASCII-only text output will be hard, as the ISO and SI standards do not specify alternative ASCII characters for the affected symbols. This means we would have to guess and pick arbitrary spellings. Moreover, this complicates the design (e.g., it requires an additional unit_symbol class template that stores two fixed_string objects).
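To make the design cost concrete, here is a minimal sketch of a symbol type that carries two spellings. It is not the mp-units implementation: the names `unit_symbol_sketch`, `is_ascii`, and the three example symbols are hypothetical, `std::string_view` stands in for `fixed_string`, and the ASCII spellings are arbitrary picks.

```cpp
#include <string_view>

// Hypothetical sketch, not the mp-units API: a unit symbol that carries both
// a Unicode spelling and an ASCII-only alternative.
struct unit_symbol_sketch {
  std::string_view unicode;  // preferred spelling, may contain non-ASCII characters
  std::string_view ascii;    // alternative spelling, ASCII characters only

  // A symbol that is already ASCII needs only one spelling.
  constexpr unit_symbol_sketch(std::string_view both) : unicode(both), ascii(both) {}
  constexpr unit_symbol_sketch(std::string_view u, std::string_view a) : unicode(u), ascii(a) {}
};

// True if every code unit of the string is in the 7-bit ASCII range.
constexpr bool is_ascii(std::string_view s)
{
  for (char c : s)
    if (static_cast<unsigned char>(c) > 0x7F) return false;
  return true;
}

// Illustrative symbols. The ASCII spellings ("ohm", "us") are assumptions made
// for this sketch; no ISO or SI document prescribes them.
inline constexpr unit_symbol_sketch metre_symbol{"m"};
inline constexpr unit_symbol_sketch ohm_symbol{"\u03a9", "ohm"};         // U+03A9 GREEK CAPITAL LETTER OMEGA
inline constexpr unit_symbol_sketch microsecond_symbol{"\u00b5s", "us"};  // U+00B5 MICRO SIGN

static_assert(is_ascii(ohm_symbol.ascii));
static_assert(!is_ascii(ohm_symbol.unicode));
```

Every unit whose symbol is not plain ASCII needs the second argument, and each of those second arguments is a choice someone has to make and document.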

Please let us know if you have issues with removing support for ASCII-only output and what is the rationale for keeping it.

JohelEGP commented 6 months ago

We can follow [time.duration.io]:

(1.5) Otherwise, if Period::type is micro, it is implementation-defined whether units-suffix is "μs" ("\u00b5\u0073") or "us".

mpusz commented 6 months ago

Yes, we could, but I do not think that is a good idea. For chrono it was one exception case. For our library there are plenty of cases like that.

mpusz commented 6 months ago

See: https://github.com/sg16-unicode/sg16-meetings#november-29th-2023.

JohelEGP commented 6 months ago

Our support for ASCII can be one exception case in the specification. Rather than specifying how each string representing a dimension, unit, and eventually quantity, maps to ASCII, just specify that the format specifier for ASCII does an implementation-defined mapping of the Unicode equivalent.

mpusz commented 6 months ago

I think that is not an option. The alternative symbol for each Unicode sign has to be explicitly provided so that text logs from one application can then be read as input by another (see #541).

JohelEGP commented 6 months ago

Does it? What do scnlib or WG21 say about round-tripping the one case in std::chrono?

tahonermann commented 5 months ago

From a standardization perspective, symbols that utilize only characters from the basic literal character set are required since the complete set of Unicode characters is not supported by all character encodings allowed by the C++ standard. I think the question posed in this issue is therefore misguided.

I believe the desired design is for a unit specification to have a preferred symbol selected from all of the characters available in the Unicode standard as well as a fallback symbol selected from the basic literal character set ([lex.charset]p7). By default, the preferred symbol would be used if the target encoding supports the full range of Unicode characters and the fallback symbol used otherwise. For those that wish to restrict output to ASCII-only, an option should be provided to use the fallback symbol in cases where the preferred symbol could otherwise be used but is not desired.
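The selection rule described above could look roughly like the sketch below. All names (`symbol_pair`, `symbol_request`, `select_symbol`, `ohm_pair`) are hypothetical and do not come from mp-units or from any proposal text. How the target encoding is detected is out of scope here, so it is passed in as a plain `bool`.

```cpp
#include <string_view>

// Hypothetical names throughout; this only illustrates the selection rule.
enum class symbol_request {
  preferred_if_possible,  // default: use the preferred symbol when the encoding allows it
  fallback_only           // explicit opt-out: always use the fallback symbol
};

struct symbol_pair {
  std::string_view preferred;  // may use any Unicode character
  std::string_view fallback;   // restricted to the basic literal character set
};

constexpr std::string_view select_symbol(const symbol_pair& s, bool encoding_supports_unicode,
                                         symbol_request request = symbol_request::preferred_if_possible)
{
  // The fallback is mandatory when the preferred symbol may not be representable.
  if (!encoding_supports_unicode) return s.fallback;
  // The fallback is optional, on request, when the preferred symbol would work.
  if (request == symbol_request::fallback_only) return s.fallback;
  return s.preferred;
}

// "ohm" as the fallback spelling is an assumption made for this sketch.
inline constexpr symbol_pair ohm_pair{"\u03a9", "ohm"};

static_assert(select_symbol(ohm_pair, true) == "\u03a9");
static_assert(select_symbol(ohm_pair, false) == "ohm");
static_assert(select_symbol(ohm_pair, true, symbol_request::fallback_only) == "ohm");
```

The first branch is the part required regardless of user preference; the second branch is the optional knob the issue is asking about.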

mpusz commented 5 months ago

Exactly! I tried to form a question so that most C++ developers would understand it. I believe that most have heard about ASCII but may have no clue what "basic literal character set" means 😉

Anyway, the main question remains. Do we want to limit the implementation to Unicode characters only, or do we also want to provide a fallback option? Having both complicates the design and potential support for text input in the future, but may be required by some users, and I would love to hear about such cases.

mpusz commented 5 months ago

@ChrisRyan98008 stated on LinkedIn:

... from a general engineering opinion I would like to keep the ascii version. I could foresee uses for it. It is just sometimes too hard to type special unicode characters so I presume it would maintain symmetry with that input method.

mpusz commented 5 months ago

@ChrisRyan98008 also suggested:

Maybe you could just do the unicode output but with a units translations output utility layer to ascii. Maybe this would open up the translation output option for other formats like LaTeX.

For now, we do not plan to provide a translation layer for text output, but a user could probably do something on their own to implement it. Please let us know in case someone has a good idea of how to incorporate such a feature into the framework.
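As a rough idea of what a user-side translation step might look like, the sketch below post-processes already formatted UTF-8 text and replaces a few known characters with ASCII spellings. Nothing here is part of mp-units: `to_ascii_sketch`, `replacement`, the table contents, and the chosen ASCII spellings are all assumptions, and the code assumes the ordinary literal encoding is UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical user-side helper, not an mp-units facility.
struct replacement {
  std::string_view from;  // UTF-8 encoded character to look for
  std::string_view to;    // ASCII spelling to substitute (arbitrary choice)
};

inline std::string to_ascii_sketch(std::string text)
{
  static constexpr replacement table[] = {
    {"\u00b5", "u"},    // MICRO SIGN
    {"\u03a9", "ohm"},  // GREEK CAPITAL LETTER OMEGA
    {"\u00b2", "^2"},   // SUPERSCRIPT TWO
  };
  for (const replacement& r : table) {
    std::size_t pos = text.find(r.from);
    while (pos != std::string::npos) {
      text.replace(pos, r.from.size(), r.to);
      // Continue searching after the text that was just inserted.
      pos = text.find(r.from, pos + r.to.size());
    }
  }
  return text;
}

// Usage: translate text that was formatted with Unicode symbols.
inline void to_ascii_sketch_example()
{
  assert(to_ascii_sketch("10 \u00b5s") == "10 us");
  assert(to_ascii_sketch("4.7 k\u03a9") == "4.7 kohm");
  assert(to_ascii_sketch("9.81 m/s\u00b2") == "9.81 m/s^2");
}
```

A table like this could be swapped for a LaTeX one, but a plain string replacement has no knowledge of which characters are unit symbols and which are ordinary text, so it is only safe on output the user fully controls.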

tahonermann commented 5 months ago

Anyway, the main question remains. Do we want to limit the implementation to Unicode characters only, or do we also want to provide a fallback option?

A fallback symbol is required for standardization since there is no guarantee that characters outside the basic literal character set are representable at all. That fallback symbol is needed regardless of whether the proposed std::format grammar includes an option to explicitly opt-out of use of symbols that potentially include characters from outside the basic literal character set.

The question to pose is whether the units-text-encoding grammar option currently present in the D3045R0 draft is needed, or whether it suffices for the implementation to determine on its own when to use the fallback symbol. The responses so far suggest that the grammar option would be used and appreciated. I don't see a reason not to provide that option.

kwikius commented 4 months ago

Exactly! I tried to form a question so that most C++ developers would understand it. I believe that most have heard about ASCII but may have no clue what "basic literal character set" means 😉

Anyway, the main question remains. Do we want to limit the implementation to Unicode characters only, or do we also want to provide a fallback option? Having both complicates the design and potential support for text input in the future, but may be required by some users, and I would love to hear about such cases.

Use case: I use my quantities library on an 8-bit MCU (ATmega328); see https://github.com/kwikius/ultrasonic_wind_sensor/blob/master/libraries/UltrasonicWindSensor/wind_sensor_impl.cpp for an example. For that type of use, the serial port is often used for output with ASCII text.

mpusz commented 4 months ago

Based on the feedback we got, we decided to keep the ASCII-only text output.