opensim-org / opensim-core

SimTK OpenSim C++ libraries and command-line applications, and Java/Python wrapping.
https://opensim.stanford.edu
Apache License 2.0
800 stars 323 forks source link

Improved String to Decimal parsing #3943

Closed alexbeattie42 closed 4 weeks ago

alexbeattie42 commented 1 month ago

Fixes issue #3924

Brief summary of changes

String to Decimal Conversion

The above fixed the reading/writing problems but the code was still failing. Even with the imbuing of the input and output file stream, the parser converting strings to decimals is still using the system locale. After enough digging and exploration I found that std::stod is locale dependent. It is a thinly veiled wrapper of std::strtod which does mention the locale dependance. To fix this issue I did the following:

Testing I've completed

Looking for feedback on...

CHANGELOG.md (choose one)


This change is Reviewable

alexbeattie42 commented 1 month ago

If I'm reading/understanding correctly, this corrects the number parsing to be locale aware, but is number writing also locale specific?

I think the best case for locale handling would be to follow the robustness principle aka "be conservative in what you send, be liberal in what you accept". In this context, I think that should look like locale independent parsing (i.e. can parse comma or period decimals), and optionally hard-coded locale writing (.trc/.mot/.sto files with period decimals only). That would ensure that users with any locale can read .trc/.mot/.sto files generated anywhere, and (with hard-coded period decimal writing) ensure maximum compatibility reading .trc/.mot/.sto files by any third-party software/code (which might be less flexible with parsing).

This is in general a good idea but not possible in this specific instance. The file (at least all the imu .sto files in the examples are this way) looks like this:

time    torso_imu   pelvis_imu  tibia_r_imu femur_r_imu tibia_l_imu femur_l_imu calcn_r_imu calcn_l_imu
0   0.01518343195907421,0.7812957439677948,-0.04770728972005249,0.622149852012617   0.1207747690878542,0.8053315183184858,-0.142819722762797,0.5625452226662099 0.3951845214451953,-0.6053549348947597,-0.3937077025151748,-0.5677753444707826  0.3753907991577196,-0.5658507979303541,-0.3468703522649773,-0.646974173448122   0.5835647133912788,0.4063956405762357,-0.6152428563326652,0.3402514310576951    0.5854606233169395,0.3832895330622857,-0.5480699907177834,0.458196767409859 0.2178287352520044,-0.9667284914115302,-0.1327179739292387,-0.01930298909021503 0.1759460998977403,0.3471632588712789,-0.9102461481857205,-0.1413244187453365
0.01    0.0153210137178647,0.781247219174652,-0.04788784790226691,0.6221935415076403    0.1208796815319564,0.8052840031748407,-0.1430221742909909,0.5625392737263002    0.395100589522564,-0.6054134414094215,-0.3937159987716847,-0.5677656219171235   0.3753901734774389,-0.5658535433909461,-0.3469560968783313,-0.6469261564718519  0.5836601509355528,0.4062299483085272,-0.6153208699774565,0.3401445049935765    0.5856153629154232,0.3831113700639101,-0.5481436844014433,0.4580598499061032    0.2178083403178592,-0.9667443847139103,-0.1326299339363174,-0.01934223710304965 0.1759173816325171,0.3471787034472968,-0.9102653245939488,-0.1411986598877146

This parses as time as a decimal number seperated by a tab, followed by 4 values for each quaternion (separated by commas), then a tab, and repeat. So if it were acceptable to write or read numerics with commas you wouldn't know if the comma was a seperator between quaternions or a decimal place indicator. So in order to be able to accept numerics that use commas the file format would have to be changed. Per this comment it seems reasonable to not allow reading or writing commas in numerics.

alexbeattie42 commented 1 month ago

Updated to remove File IO changes and only include the minimal changes tostod which is the root of the problem

adamkewley commented 1 month ago

These are great changes, and I'm glad someone's giving it a whack, but:

I don't see a reason to not use the fast library

If I had a euro for every time a developer comes along and writes some version of this, I'd have ezc3d, docopt, spdlog, eigen, colpack, adolc, ipopt, metis, mumps, casadi, fastfloat... etc. euros. Everybody thinks this until they have to maintain a build long-term and deal with issues that have opening paragraphs like "I'm trying to build OpenSim on EsotericLinux with a PowerPC chip and it's failing because tropter isn't fetching from sourceforge".

Imo, drop the fastfloat dependency. The file loaders aren't currently performance limiters for end-users (as opposed to, say, simulation performance). The data files OpenSim uses tend to be very small, and IO is probably a bigger factor when parsing them (even if your OS's page cache is hot). Your benchmark shows this much. Regardless of how many stars it has, or its use in chromium, you should've thought at that point "not worth it, the prio is to fix the locale issues" and dropped it altogether.

adamkewley commented 4 weeks ago

This seems like a reasonable enough refactor!

The only thing I'd change is the static initialization of locale_, but this can be handled post-merge

adamkewley commented 4 weeks ago

Nice work @alexbeattie42 👍