Closed alexbeattie42 closed 4 weeks ago
If I'm reading/understanding correctly, this corrects the number parsing to be locale aware, but is number writing also locale specific?
I think the best case for locale handling would be to follow the robustness principle aka "be conservative in what you send, be liberal in what you accept". In this context, I think that should look like locale independent parsing (i.e. can parse comma or period decimals), and optionally hard-coded locale writing (.trc/.mot/.sto files with period decimals only). That would ensure that users with any locale can read .trc/.mot/.sto files generated anywhere, and (with hard-coded period decimal writing) ensure maximum compatibility reading .trc/.mot/.sto files by any third-party software/code (which might be less flexible with parsing).
This is in general a good idea but not possible in this specific instance. The file (at least all the imu .sto
files in the examples are this way) looks like this:
time torso_imu pelvis_imu tibia_r_imu femur_r_imu tibia_l_imu femur_l_imu calcn_r_imu calcn_l_imu
0 0.01518343195907421,0.7812957439677948,-0.04770728972005249,0.622149852012617 0.1207747690878542,0.8053315183184858,-0.142819722762797,0.5625452226662099 0.3951845214451953,-0.6053549348947597,-0.3937077025151748,-0.5677753444707826 0.3753907991577196,-0.5658507979303541,-0.3468703522649773,-0.646974173448122 0.5835647133912788,0.4063956405762357,-0.6152428563326652,0.3402514310576951 0.5854606233169395,0.3832895330622857,-0.5480699907177834,0.458196767409859 0.2178287352520044,-0.9667284914115302,-0.1327179739292387,-0.01930298909021503 0.1759460998977403,0.3471632588712789,-0.9102461481857205,-0.1413244187453365
0.01 0.0153210137178647,0.781247219174652,-0.04788784790226691,0.6221935415076403 0.1208796815319564,0.8052840031748407,-0.1430221742909909,0.5625392737263002 0.395100589522564,-0.6054134414094215,-0.3937159987716847,-0.5677656219171235 0.3753901734774389,-0.5658535433909461,-0.3469560968783313,-0.6469261564718519 0.5836601509355528,0.4062299483085272,-0.6153208699774565,0.3401445049935765 0.5856153629154232,0.3831113700639101,-0.5481436844014433,0.4580598499061032 0.2178083403178592,-0.9667443847139103,-0.1326299339363174,-0.01934223710304965 0.1759173816325171,0.3471787034472968,-0.9102653245939488,-0.1411986598877146
This parses as time as a decimal number seperated by a tab, followed by 4 values for each quaternion (separated by commas), then a tab, and repeat. So if it were acceptable to write or read numerics with commas you wouldn't know if the comma was a seperator between quaternions or a decimal place indicator. So in order to be able to accept numerics that use commas the file format would have to be changed. Per this comment it seems reasonable to not allow reading or writing commas in numerics.
Updated to remove File IO changes and only include the minimal changes tostod
which is the root of the problem
These are great changes, and I'm glad someone's giving it a whack, but:
I don't see a reason to not use the fast library
If I had a euro for every time a developer comes along and writes some version of this, I'd have ezc3d
, docopt
, spdlog
, eigen
, colpack
, adolc
, ipopt
, metis
, mumps
, casadi
, fastfloat
... etc. euros. Everybody thinks this until they have to maintain a build long-term and deal with issues that have opening paragraphs like "I'm trying to build OpenSim on EsotericLinux with a PowerPC chip and it's failing because tropter isn't fetching from sourceforge".
Imo, drop the fastfloat
dependency. The file loaders aren't currently performance limiters for end-users (as opposed to, say, simulation performance). The data files OpenSim uses tend to be very small, and IO is probably a bigger factor when parsing them (even if your OS's page cache is hot). Your benchmark shows this much. Regardless of how many stars it has, or its use in chromium, you should've thought at that point "not worth it, the prio is to fix the locale issues" and dropped it altogether.
This seems like a reasonable enough refactor!
The only thing I'd change is the static initialization of locale_
, but this can be handled post-merge
Nice work @alexbeattie42 👍
Fixes issue #3924
Brief summary of changes
String to Decimal Conversion
The above fixed the reading/writing problems but the code was still failing. Even with the imbuing of the input and output file stream, the parser converting strings to decimals is still using the system locale. After enough digging and exploration I found that
std::stod
is locale dependent. It is a thinly veiled wrapper ofstd::strtod
which does mention the locale dependance. To fix this issue I did the following:std::stod
in theIO.h
classstd::stod
with the newIO::stod
methodisstream
method (standard c++):Testing I've completed
Looking for feedback on...
CHANGELOG.md (choose one)
This change is