robotology / idyntree

Multibody Dynamics Library designed for Free Floating Robots
BSD 3-Clause "New" or "Revised" License
168 stars 66 forks source link

URDF Parsers' code is implicitly assuming that "C" locale is set #288

Closed traversaro closed 7 years ago

traversaro commented 7 years ago

Problem statement

In the code of the URDF parsers, we use a number of standard C/C++ functions for conversion from string to double/integer, in particular the std::atof function.

This function works fine, but according to the C/C++ standard its behavior depends on the locale currently set in the process, using the std::setlocale function. The problem is that (even if this choice is not actually documented anywhere in the URDF specification or documentation) URDF assume that all the numeric attributes are encoded according to the "C" locale, i.e. the default locale setting for C/C++ . The problem is that some libraries such as Qt on some platforms *actually change the process-wide locale to be "", i.e. the user-preferred locale, see [1] .

I noticed this issue because the iDynTree URDF parser was failing to work properly when used in an application using Qt.

Rejected solutions

To solve this issue, we need to make sure that each conversion from string to double uses the correct locale, i.e. "C" for URDF files. In SDFormat a similar issue [2] was fixed by introducing a call to std::setlocale inside the library [3], but I feel that this solution is a prone to error because the on some platforms it modifies the process-wide locale setting, and so it is possible that another thread in the same process sets calls std::setlocale("") before the conversion is actually performed.

Preferred solution

The solution that I prefer at the moment is to use an instance of std::stringstream for actually performing this numeric conversions for the URDF parser, and to manually set the locale for just that object (that is a local variable of the function) to "C" (or more specifical the classic() locale) using the imbue(const std::locale& loc) method (see [4]):

std::string textToConvert;
double convertedDouble;
std::stringstream ss(textToConvert);
ss.imbue(std::locale::classic());
ss >> convertedDouble;

I wonder if there could be some performance by using std::stringstream and always setting the desired locale, but given that the parser code is typically called only once at configuration time, I will ignore the performance at the moment (even if I would be really interested in seeing some benchmarks).

Additional valid solutions

std::from/to_chars

In C++20, the "fast" conversion functions std::from_chars and std::to_chars will be available, and they will always be using the C locale, regardless of any other locale setting. However, note that std::from_chars just accepts a limited subset of all the string that are considered floating point numbers in other string conversion functions: for example, it does not support a trailing +, so it cannot be used as it is (without string preprocessing) for parsing XML floating point strings (thanks @francesco-romano for correctly spotting this).

Custom functions or libraries

If you are using C, you want to optimize for efficiency or for any reson you cannot use the standard C++ library, there are several custom string <---> double libraries, that may just support the C locale regardless of any other locale setting. See the following lists for some pointers regarding this libraries:

An interesting (but C++ only) library is google/double-conversion.

Refs

[1] http://stackoverflow.com/questions/25661295/why-does-qcoreapplication-call-setlocalelc-all-by-default-on-unix-linux [2] https://bitbucket.org/osrf/sdformat/issues/60 [3] https://bitbucket.org/osrf/sdformat/pull-requests/147/fix-problems-with-latin-locales-and/diff#Lsrc/Param.ccT145 [4] http://stackoverflow.com/questions/1333451/locale-independent-atof

traversaro commented 7 years ago

Note that actually checking if the parsing of the number is successful and halting the parser if this is not the case it would make much easier to spot this kind of locale-related problems. Related link : http://www.cplusplus.com/articles/D9j2Nwbp/ .

traversaro commented 7 years ago

Note that for some of the numerical attributes of URDF, the xsd file available at [1] specifies that they are of type xsd::double [2] . I don't know if format double accepted by C++ functions with the "C" locale is totally compatible with the xsd::double, but it would be worth checking.

[1] https://github.com/ros/urdfdom/blob/master/xsd/urdf.xsd#L47 [2] https://www.w3.org/TR/xmlschema-2/#double .

traversaro commented 7 years ago

cc @diegoferigo that is also working on XML parsers.

traversaro commented 7 years ago

The error in iDynTree classes and functions should be solved by https://github.com/robotology/idyntree/issues/289 . The urdfdom library is subject as well to the problem, and it is used in the legacy KDL-based classes. For that we opened an upstream issue in https://github.com/ros/urdfdom/issues/98 .

diegoferigo commented 7 years ago

Oww this is really nasty, thank for this debugging, it'll bear in mind for the XML parser (https://github.com/robotology-playground/xsens-mvn/pull/12) we'll use for mvnx files.

traversaro commented 7 years ago

Problem solved in the iDynTree's parser, and the urdfdom-parser are in the process of being deprecated, and the upstream issue have been opened. Closing.

traversaro commented 6 years ago

Related (old) YARP issue: https://sourceforge.net/p/yarp0/bugs/52/ .

traversaro commented 6 years ago

Related stackoverflow question: https://stackoverflow.com/questions/13919817/sscanf-and-locales-how-does-one-really-parse-things-like-3-14 .

traversaro commented 6 years ago

Related tinyxml2 issues:

traversaro commented 6 years ago

Related issue:

traversaro commented 6 years ago

Updated the original issue with some new alternative solutions.

traversaro commented 4 years ago

Related PR: https://github.com/fkanehiro/openhrp3/pull/144 .

traversaro commented 3 years ago

Related issue: https://github.com/ignitionrobotics/ign-rendering/issues/136 .

traversaro commented 3 years ago

Related comment: https://github.com/JuliaComputing/llvm-cbe/pull/83/files#r537553875 .

traversaro commented 3 years ago

An interesting comparison of string to double routines: https://github.com/shaovoon/floatbench .

traversaro commented 3 years ago

Related issue: https://github.com/robotology/gym-ignition/issues/301 .

traversaro commented 2 years ago

Another occurence of a similar bug: https://github.com/deepmind/mujoco/issues/131 .

traversaro commented 1 year ago

Related issue on rapidcsv: https://github.com/d99kris/rapidcsv/issues/102 .

traversaro commented 1 year ago

Some more issues:

traversaro commented 11 months ago

Some more issues for my collection:

traversaro commented 4 months ago

More issues for my collection:

traversaro commented 4 months ago

Even on Twitter:

traversaro commented 3 months ago

https://github.com/google-deepmind/mujoco/issues/845#issuecomment-2150004802