skeeto / pdjson

C JSON parser library that doesn't suck
The Unlicense
281 stars 36 forks source link

Locale can break floating point parsing #27

Open PerMildner opened 3 years ago

PerMildner commented 3 years ago

https://github.com/skeeto/pdjson/blob/67108d883061043e55d0fb13961ac1b6fc8a485c/pdjson.c#L850

strtod() is affected by the current locale and is therefore unsuitable.

For instance, setting the locale to a locale that uses comma instead of period, will incorrectly treat 123.45 as 123 (stopping at the period, since the period is not part of the floating point number syntax).

On macOS LC_ALL=sv_SE.UTF-8 sets such a locale, the name is different on Linux.

There is no test-case for json_get_number().

Reproducer:

perm@NP pdjson % locale
LANG="en_SE.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
perm@NP pdjson % make test
tests/tests
PASS double locale C
PASS double locale en_US.UTF-8
PASS number
PASS true
PASS false
PASS null
PASS string
PASS string quotes
PASS object
PASS array
PASS number stream
PASS mixed stream
PASS empty stream
PASS stream separation
PASS incomplete array
PASS \uXXXX
PASS invalid surrogate pair
PASS invalid surrogate half
PASS surrogate misorder
PASS surrogate pair
20 pass, 0 fail
perm@NP pdjson % LC_ALL=sv_SE.UTF-8 make test
tests/tests
PASS double locale C
FAIL double locale sv_SE.UTF-8
PASS number
PASS true
PASS false
PASS null
PASS string
PASS string quotes
PASS object
PASS array
PASS number stream
PASS mixed stream
PASS empty stream
PASS stream separation
PASS incomplete array
PASS \uXXXX
PASS invalid surrogate pair
PASS invalid surrogate half
PASS surrogate misorder
PASS surrogate pair
19 pass, 1 fail
make: *** [check] Error 1
perm@NP pdjson % git rev-parse HEAD
67108d883061043e55d0fb13961ac1b6fc8a485c
perm@NP pdjson % git diff | cat
diff --git a/tests/tests.c b/tests/tests.c
index 03760fe..d5b289a 100644
--- a/tests/tests.c
+++ b/tests/tests.c
@@ -1,6 +1,8 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
+#include <locale.h>
+
 #include "../pdjson.h"

 #if _WIN32
@@ -113,6 +115,48 @@ main(void)
     int count_pass = 0;
     int count_fail = 0;

+    {
+      size_t i;
+      char const *locales[] =
+   {
+    "C", // Standard POSIX ("ASCII") locale
+    "" // Inherit from environment
+   };
+               
+               
+      for (i = 0; i < 2; i++) {
+   char const *locale = locales[i];
+
+   char const *name = "double";
+   int success = 0;
+   struct json_stream json[1];
+   char const *buf = "123.45";
+   double expect_double = 123.45;
+   char *locale_name = setlocale(LC_ALL, locale);
+
+   json_open_buffer(json, buf, strlen(buf));
+   json_set_streaming(json, false);
+
+   if (JSON_NUMBER == json_next(json)) {
+     double d = json_get_number(json);
+     if (d == expect_double) {
+       success = 1;
+     }
+   }
+   if (success) {
+     count_pass++;
+     printf(C_GREEN("PASS") " %s locale %s\n", name, locale_name);
+   } else {
+     count_fail++;
+     printf(C_RED("FAIL") " %s locale %s\n", name, locale_name);
+   }
+      }
+    }
+
+
+
+
+
     {
         const char str[] = "  1024\n";
         struct expect seq[] = {
perm@NP pdjson % 
boris-kolpackov commented 3 years ago

Just a note that you can get the number as a string and then convert it yourself using a more suitable mechanism (for example, C++17 std::from_chars).

skeeto commented 3 years ago

Yup, this is a fundamental flaw of the C standard library, and there's no solution except for each library to implement their own strtod(). Locale configuration is global state, so it's not safe for libraries to manipulate it, even temporarily. See here:

https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f027338b0fab0f5078971fbe

As @boris-kolpackov said, you can access the raw string and parse it yourself. That also covers the case where the number being parsed has greater precision than a double, as JSON isn't limited to IEEE floats.