modelica / ModelicaStandardLibrary

Free (standard conforming) library to model mechanical (1D/3D), electrical (analog, digital, machines), magnetic, thermal, fluid, control systems and hierarchical state machines. Also numerical functions and functions for strings, files and streams are included.
https://doc.modelica.org
BSD 3-Clause "New" or "Revised" License
452 stars 165 forks source link

Locale-specific reading of files in new tables #1151

Closed modelica-trac-importer closed 7 years ago

modelica-trac-importer commented 7 years ago

Reported by hansolsson on 31 May 2013 09:36 UTC Two minor issues found when looking at ModelicaTables source code:

  1. It uses sscanf (also with lf) for reading textual data-files, which means that parsing of those files will be locale-dependent.

This breaks data-compatibility. (And, no, we cannot always run the simulator in C-locale.)

  1. It uses strtrok during parsing - which means that any other code using strtok will get different result if the table-routines are called. I don't view that as a problem in practice, but prefer code without strtok.

Migrated-From: https://trac.modelica.org/Modelica/ticket/1151

modelica-trac-importer commented 7 years ago

Comment by beutlich on 31 May 2013 09:54 UTC It is not solely a tables issue since there is a sscanf (with lg) in ModelicaStrings source code, too.

modelica-trac-importer commented 7 years ago

Comment by hansolsson on 31 May 2013 10:58 UTC Replying to [comment:1 beutlich]:

It is not solely a tables issue since there is a sscanf (with lg) in ModelicaStrings source code, too.

True, have adapted that file in Dymola to avoid those problems.

modelica-trac-importer commented 7 years ago

Comment by hansolsson on 31 May 2013 13:34 UTC To clarifiy: As I see it there are two possibilities for handling this:

Factor out the parsing of numbers to one/a few routines, have a default implementation based on scanf - but make it optional based on a flag. That way it will be possible to build the library if you have your own routine for handling locale-independent parsing of numbers.

However, it implies that we will have vendor-specific variants of the libray, which is not ideal.

The second possibility would be actually handle this in a good way, but I don't see such a good simple possibility.

In C++ you can have multiple locales in a simple way (imbue-function of streams), but in standard C (89/90, 99, 11) it seems you have to call setlocale back and forth which does not work in multi-threaded programs.

There are specific solutions for different compilers, e.g. Visual Studio 2005 and later have *scanf_l. http://msdn.microsoft.com/en-us/library/zkx076cy(v=vs.80).aspx I assume there is something similar for gcc, clang, etc.

You can, of course, write your own parser for numbers - but ...

modelica-trac-importer commented 7 years ago

Comment by beutlich on 31 May 2013 21:02 UTC Usage of _sscanf_l or _strtod_l for Visual C++ >= 2005 would be the easiest fix.

There is a locale independet implementation of strtod in http://www.netlib.org/fp/dtoa.c. I tested it on the test.txt data file and it worked (on Visual C++, if renamed and line 303 is commented out).

What I could do is to replace the one call to sscanf with "%lf" by _sscanf_l (if applicable) and use otherwise strtod with reference to Netlib in case there are locale problems. See attachment:T1151.patch.

modelica-trac-importer commented 7 years ago

Modified by beutlich on 31 May 2013 21:05 UTC

modelica-trac-importer commented 7 years ago

Comment by beutlich on 3 Jun 2013 18:03 UTC I checked several forum discussions for a save, locale-independent, native C implementation of atof, strtod or sscanf and I am afraid there is nothing I can do in just one line of code. Most forum threads point to a custom parser which then very soon goes very complex for every kind of floating-point numbers one can imagine. The Netlib one is just one offer.

Please comment on this patch suggestion.

modelica-trac-importer commented 7 years ago

Comment by sjoelund.se on 3 Jun 2013 21:48 UTC Using strtod.c seems like a shorter dependency. The file the comment in the patch above suggests to use is 4371 lines of code since it actually knows how to make a string out of a double (not the thing you need here, right?). The strtod sources is only 271 LOC including comments.

modelica-trac-importer commented 7 years ago

Comment by otter on 4 Jun 2013 06:59 UTC Under

https://svn.modelica.org/projects/Modelica/branches/development/NewTables/Modelica_NewTables

a scanner for text files is present based only on getc (under Resources\Include), together with a Modelica interface (under Modelica_NewTables.ScanTextFile; note a full documentation with all details of all functions is provided as Modelica documentation under package ScanTextFile).

My preference would be to use this scanner, since we get at the same time (missing) functionality in Modelica. This scanner is used since several years intensively for the DLR FlexibleBodies library to read large definitions of flexible bodies. I therefore expect that there are no essential bugs anymore.

This scanner has the following benefits:

From previous experience, I am a bit skeptical to get as good scanner somewhere from the internet.

Since I am really interested to have a fast text file scanner in Modelica for many years, I would also help with the implementation.

modelica-trac-importer commented 7 years ago

Comment by beutlich on 4 Jun 2013 07:00 UTC What I read about this strtod from ruby is that it gives wrong results in certain cases, see http://stackoverflow.com/questions/1994658/locale-independent-strtod-implementation.

modelica-trac-importer commented 7 years ago

Comment by beutlich on 4 Jun 2013 07:04 UTC Replying to [comment:8 otter]:

From previous experience, I am a bit skeptical to get as good scanner somewhere from the internet.

The one mentioned from Netlib does really look approved though.

modelica-trac-importer commented 7 years ago

Comment by otter on 4 Jun 2013 07:19 UTC Replying to [comment:10 beutlich]:

Replying to [comment:8 otter]:

From previous experience, I am a bit skeptical to get as good scanner somewhere from the internet.

The one mentioned from Netlib does really look approved though.

I quickly inspected. You are right, this code looks very advanced. For me it looks too advanced because everything seems to be performed with bit-operations and there are many #ifdefs to provide code for particular machines. If there is a bug in this code, it will be very difficult to find and to correct it.

modelica-trac-importer commented 7 years ago

Comment by sjoelund.se on 4 Jun 2013 07:27 UTC Replying to [comment:11 otter]:

Replying to [comment:10 beutlich]:

Replying to [comment:8 otter]:

From previous experience, I am a bit skeptical to get as good scanner somewhere from the internet.

The one mentioned from Netlib does really look approved though.

I quickly inspected. You are right, this code looks very advanced. For me it looks too advanced because everything seems to be performed with bit-operations and there are many #ifdefs to provide code for particular machines. If there is a bug in this code, it will be very difficult to find and to correct it.

Well. If you think for a second parsing a double is easy... Think again. It would most likely be done in a wrong and/or slow way if you did it only in Modelica since you would end up with rounding errors in bad places.

As for the linked netlib version. I have used it in the past for [om:source:trunk/SimulationRuntime/c/meta/dtoa.c OpenModelica]. It worked fine on Linux and OSX, but gave the wrong result on Windows. Note that this was for dtoa, and not strtod.

So in a sense, otter is correct that the code is advanced and if it does not support the OS, you are screwed ;)

There is always the following option on Linux though:

#include <stdlib.h>
#include <stdio.h>
#include <locale.h>
int main()
{
  setlocale(LC_NUMERIC, ""); /* initialized from environment ... done somewhere; usually not by us */
  const char *previous = setlocale(LC_NUMERIC, NULL); /* get old locale for backup */
  fprintf(stderr, "previous: %s\n", previous);
  double d1 = strtod("1.23", NULL);
  setlocale(LC_NUMERIC, "C");
  double d2 = strtod("1.23", NULL);
  fprintf(stderr, "d1: %g d2: %g\n", d1, d2);
  setlocale(LC_NUMERIC, previous); /* restore locale */
} 
modelica-trac-importer commented 7 years ago

Comment by beutlich on 4 Jun 2013 07:36 UTC Yes, I know about that option to change locale but it is not thread-safe as global variables are changed. Is there anything like _sscanf_l or _strtod_l in Linux or MinGW?

modelica-trac-importer commented 7 years ago

Comment by otter on 4 Jun 2013 08:13 UTC After discussion with Hans, it seems best at the moment to apply your proposed patch (use special function for Visual C++ and the C-standard implementation otherwise). Tool vendors are then responsible to provide something better for Linux, if needed.

modelica-trac-importer commented 7 years ago

Comment by beutlich on 4 Jun 2013 08:23 UTC sjoelund: I can see a _sscanf_l in stdio.h of mingw-w64 (see http://gitorious.org/w64/w64/blobs/92c6540c65a72c696d1906b20001e20a4e171298/mingw-w64-headers/crt/stdio.h). But it is not in my default MinGW installation.

modelica-trac-importer commented 7 years ago

Comment by beutlich on 4 Jun 2013 08:27 UTC Replying to [comment:14 otter]:

After discussion with Hans, it seems best at the moment to apply your proposed patch (use special function for Visual C++ and the C-standard implementation otherwise). Tool vendors are then responsible to provide something better for Linux, if needed.

Should it also be applied to ModelicaStrings.c?

modelica-trac-importer commented 7 years ago

Comment by sjoelund.se on 4 Jun 2013 08:28 UTC You have http://www.idiap.ch/~formaz/doc/glibdocs/glib-string-utility-functions.html#G-STRTOD in Linux, but usually not under OSX.

There is strtod_l in Linux. But...

Attention: even though several *_l interfaces are part of POSIX:2008, these are not.

And I did not yet figure out how to make the header actually include it (I copy-pasted the prototype, and it does work). I'll get back if I get it to work...

modelica-trac-importer commented 7 years ago

Comment by sjoelund.se on 4 Jun 2013 08:32 UTC You do probably need a #define for this and some configuration of the tool. But this does work in Linux:

#define _GNU_SOURCE 1
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <xlocale.h>

static locale_t
get_C_locale (void)
{
  static int initialized = 0;
  static locale_t C_locale = NULL;
  if (!initialized) {
    C_locale = newlocale (LC_ALL_MASK, "C", NULL);
    initialized = 1;
  }
  return C_locale;
}

int main() {
  setlocale(LC_ALL, "");
  double d1 = strtod_l("1.73", NULL, get_C_locale());
  double d2 = strtod("1.73", NULL);
  double d3 = strtod("1,73", NULL);
  fprintf(stderr, "%g %g %g\n", d1, d2, d3);
}
modelica-trac-importer commented 7 years ago

Comment by beutlich on 4 Jun 2013 09:21 UTC I attached a second patch attachment:T1151_2.patch that incorporates the suggested fix for linux.

Would it be sufficient to have LC_NUMERIC for LC_ALL_MASK?

Still one question open: Is there anything similar for MinGW?

modelica-trac-importer commented 7 years ago

Comment by beutlich on 4 Jun 2013 10:03 UTC Is xlocale.h really required for linux? I noticed that it still compiles if removed (on GCC 4.1.2 20080704 (Red Hat 4.1.2-52))

modelica-trac-importer commented 7 years ago

Comment by sjoelund.se on 4 Jun 2013 10:13 UTC Ok, possibly not needed. POSIX-2008 says:

The header shall define the locale_t type, representing a locale object.

modelica-trac-importer commented 7 years ago

Comment by beutlich on 4 Jun 2013 10:17 UTC And it seems LC_NUMERIC is good enough.

modelica-trac-importer commented 7 years ago

Comment by beutlich on 4 Jun 2013 10:39 UTC Sorry, one more question: Should it be __linux__ or linux for the preprocessor on Linux?

modelica-trac-importer commented 7 years ago

Comment by sjoelund.se on 4 Jun 2013 10:56 UTC Replying to [comment:23 beutlich]:

Sorry, one more question: Should it be __linux__ or linux for the preprocessor on Linux?

Does not matter as both are defined.

marsj@marsj-laptop:~/dev$ grep -R "(linux)" /usr/include/  | wc -l
19
marsj@marsj-laptop:~/dev$ grep -R "(__linux__)" /usr/include/  | wc -l
42

Probably __linux__ is better.

modelica-trac-importer commented 7 years ago

Comment by anonymous on 4 Jun 2013 19:48 UTC sscanf should better not be used as it may return unexpected results. The better way is strtod with a proper error check. See the code snippet.

#include <stdio.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

int main() {
  _locale_t loc = _create_locale(LC_NUMERIC, "C");
  char str[] = "2.*pi";

  {
    double d1;
    if (1 == _sscanf_l(str, "%lg", loc, &d1)) {
      printf("_sscanf_l: %lg\n", d1);
    }
  }

  {
    char* endptr;
    double d2;
    int ok = 1;
    d2 = _strtod_l(str, &endptr, loc);
    if (*endptr != 0) {
      /* error check */
      size_t len = strlen(endptr);
      size_t i;
      for (i = 0; i < len; i++) {
        if (!isspace(endptr[i])) {
          ok = 0;
          break;
        }
      }
    }
    if (ok) {
      printf("_strtod_l: %lg\n", d2);
    }
    else {
      printf("_strtod_l: failed\n");
    }
  }

  return 0;
}

which prints

_sscanf_l: 2
_strtod_l: failed
modelica-trac-importer commented 7 years ago

Comment by beutlich on 4 Jun 2013 20:38 UTC Third patch attachment:T1151_3.patch​ also incorporates the suggested replacement of sscanf by strtod/strtol.

modelica-trac-importer commented 7 years ago

Comment by hansolsson on 4 Jun 2013 21:57 UTC Replying to [comment:25 anonymous]:

sscanf should better not be used as it may return unexpected results. The better way is strtod with a proper error check. See the code snippet.

... Wouldn't a shorter code be something like (have not tested it):

  char dummy2;
  if (1 == _sscanf_l(str, "%lg %c", loc, &d1, &dummy2)) 

(i.e. follow the number by skipping whitespace and then reading a character - but not caring about it; the 2nd part should fail - I changed to not use %*c since it seemed to obscure).

modelica-trac-importer commented 7 years ago

Comment by beutlich on 7 Jun 2013 08:24 UTC e02349c8a3f6465302aab376af06d3aea180ef7d has attachment:T1151_3.patch​ applied. This solves the first issue on locale dependent sscanf as discussed. The second issue on strtok actually is considered as comment without need for code changes.

modelica-trac-importer commented 7 years ago

Comment by beutlich on 7 Jun 2013 09:03 UTC Additional thoughts:

modelica-trac-importer commented 7 years ago

Comment by beutlich on 7 Jun 2013 09:40 UTC The following test model

model TestScanReal
  Real number;
  algorithm
    (number,):=Modelica.Utilities.Strings.scanReal("2.*pi");
end TestScanReal;

results in 2 for number. I guess this is as expected. Please confirm.

modelica-trac-importer commented 7 years ago

Comment by otter on 7 Jun 2013 09:47 UTC Replying to [comment:30 beutlich]:

The following test model

model TestScanReal
  Real number;
  algorithm
    (number,):=Modelica.Utilities.Strings.scanReal("2.\*pi");
end TestScanReal;

results in 2 for number. I guess this is as expected. Please confirm.

Yes, this is correct (since scanReal parses the next number and then stops and returns the index where it stopped and the number found).

modelica-trac-importer commented 7 years ago

Comment by beutlich on 7 Jun 2013 11:37 UTC Fourth patch attachment:T1151_4.patch​​ fixes the locale dependency of sscanf in ModelicaStrings.c. It also added the MODELICA_EXPORT define and renamed error to Modelica_ERROR as done in ModelicaInternal.c

I did compiler tests on Windows (VC6 and VS 2005), MinGW, Borland BCC, LCC and Linux.

There is no longer a dependency on NO_FILE_SYSTEM as strtod and strtol are defined in stdlib.h (whereas sscanf was in stdio.h).

Please cross-check!

modelica-trac-importer commented 7 years ago

Modified by beutlich on 7 Jun 2013 11:44 UTC

modelica-trac-importer commented 7 years ago

Comment by beutlich on 7 Jun 2013 18:06 UTC Replying to [comment:32 beutlich]:

There is no longer a dependency on NO_FILE_SYSTEM as strtod and strtol are defined in stdlib.h (whereas sscanf was in stdio.h).

That means that on target platforms where scanReal and scanInteger previously failed due to a missing file system (dSPACE, xPC) it is now also supported (as a side-effect). Maybe this is worth an extra ticket.

modelica-trac-importer commented 7 years ago

Comment by beutlich on 7 Jun 2013 19:09 UTC This locale issue keeps bothering me. We now have a working solution for GCC on Linux and Visual C++ >= 2005 but not for any other build environments like MinGW and Cygwin.

I checked again the sources of the Netlib implementation of strtod and also of the Wine implementation of _strtod_l. All what they consider as locale dependent is the decimal-point, nothing else. So my suggestion for the general solution is to make a copy of the string to be parsed and replace the first dot-decimal-point by the locale-specific decimal-point. This of cource only needs to be done if the decimal-points differ. Then call the locale-depedendent strtod on that string. This should do it, shouldn't it?

modelica-trac-importer commented 7 years ago

Comment by sjoelund.se on 7 Jun 2013 19:23 UTC No, it won't work the way you wrote. Consider the string 1 5.3. It is supposed to parse 1 and return 5.3 as the rest of the string. If you do the replacement, it will return 5,3.

If you do want to make this simple replacement for locale-independent strtod, consider trying first strtod, and if the next character is ,, and the parsed characters did not include a . or a ,. Then copy the string and replace it with a .; then try parsing the string again.

Or as an alternative, do what you do and patch the . back in the returned string if it was not parsed by strtod.

Anyway, yes it is possible to do this as fall-back. Use localeconv to query the decimal separator so you do not do replacements that are not needed.

modelica-trac-importer commented 7 years ago

Comment by mtiller on 7 Jun 2013 19:31 UTC Forgive me, I'm probably missing something here or being too Americentric, but why do we need any of this? Couldn't we simply replace the sscanf calls with something that isn't locale dependent? It just seems like the time spent discussing this, we could have simply rewritten those bits with a regular expression parser or something. Or is there just too much to re-implement here?

modelica-trac-importer commented 7 years ago

Comment by sjoelund.se on 7 Jun 2013 19:34 UTC Replying to [comment:37 mtiller]:

Or is there just too much to re-implement here?

It is one of these things that is stupidly annoying to implement and you wish the OS libraries knew about. The OS does, but straight ISO libc does not...

modelica-trac-importer commented 7 years ago

Comment by beutlich on 7 Jun 2013 19:39 UTC Maybe I was not clear. Keeping your example string 1 5.3 a locale-independent strtod returns 1 and sets the pointer to 5.3. If 1 5.3 is replaced by 1 5,3 the locale-dependent strtod does the same (assuming , as decimal point).

modelica-trac-importer commented 7 years ago

Comment by sjoelund.se on 7 Jun 2013 19:46 UTC Yes, but the return pointer would be 5,3 and not 5.3, which might be bad if that part of the string is not intended to be parsed as a double. I think you will find out if you implement it in this way and try it on a number of tests.

modelica-trac-importer commented 7 years ago

Comment by beutlich on 7 Jun 2013 19:48 UTC Replying to [comment:37 mtiller]:

Or is there just too much to re-implement here?

Check http://www.netlib.org/fp/dtoa.c if you are curious. However the wine function strtod_helper in http://source.winehq.org/git/wine.git/blob_plain/HEAD:/dlls/msvcrt/string.c is much simpler.

modelica-trac-importer commented 7 years ago

Comment by beutlich on 7 Jun 2013 19:50 UTC Replying to [comment:40 sjoelund.se]:

Yes, but the return pointer would be 5,3 and not 5.3

Yes of course, but they have same length.

modelica-trac-importer commented 7 years ago

Comment by beutlich on 7 Jun 2013 20:06 UTC Martin, I do not see why the return string is relevant here at all. It is the return value that matters. ModelicaStrings_scanReal calls strtod exactly once and returns exactly one double. For readtxtData in ModelicaStandardTables there cannot be any , as this is also used as delimiter for strtok.

modelica-trac-importer commented 7 years ago

Comment by sjoelund.se on 7 Jun 2013 20:23 UTC I thought there was a custom delimiter for the csv format? Or was that somewhere else? If it is only used to parse full strings, I guess it is fine. But it is no strtod drop-in replacement if anyone else wants to re-use it.

modelica-trac-importer commented 7 years ago

Comment by beutlich on 7 Jun 2013 20:28 UTC You made my hairs turn grey. This is just about the two occurences of strtod in ModelicaStandardTables.c and ModelicaStrings.c. No full string parsing at all.

modelica-trac-importer commented 7 years ago

Comment by beutlich on 7 Jun 2013 21:46 UTC Fifth patch attachment:T1151_5.patch adds a general solution for locale-independent strtod in ModelicaTables.c according to comment:35. It also

modelica-trac-importer commented 7 years ago

Comment by beutlich on 9 Jun 2013 18:32 UTC Patch attachment:T1151_all.patch​ combines my proposed changes for both ModelicaStrings.c and ModelicaStandardTables.c.

modelica-trac-importer commented 7 years ago

Comment by sjoelund.se on 10 Jun 2013 13:34 UTC Seems to compile in Linux here. I don't have any OSX systems available to test on though.

modelica-trac-importer commented 7 years ago

Comment by beutlich on 12 Jun 2013 07:30 UTC It would be very appreciated to have some more feedback on my proposed pacthes from the tables list members. Thank you.

modelica-trac-importer commented 7 years ago

Comment by hansolsson on 12 Jun 2013 12:33 UTC I tried to read through the code and found a few minor issues so far:

Trailing numbers seems to be allowed for the textual-data, e.g.;

#1
double A(3,2)
0 1
1 4
2 4 5

I also noticed that it allows additional separators (apart from space) such as ",;\t". That should not be an issue - but in combination with allowing trailing text it can lead to real problems for someone using 2,3 instead of 2.3. Yes, comments in form of # should be allowed, but not any text.

_free_locale is not called for Visual Studio, see the user comment(!) on http://msdn.microsoft.com/en-us/library/4zx9aht2(v=vs.80).aspx and _free_locale description on: http://msdn.microsoft.com/en-us/library/beadd31b(v=vs.80).aspx I don't know the equivalent call under Linux, I assume (and vaguely remember) that the missing call leads to a memory leak - but have not really verified it.