nginx / unit

NGINX Unit - universal web app server - a lightweight and versatile open source server that simplifies the application stack by natively executing application code across eight different programming language runtimes.
https://unit.nginx.org
Apache License 2.0
5.26k stars 322 forks source link

Incorrect parsing of 64-bit integers #1115

Closed alejandro-colomar closed 4 months ago

alejandro-colomar commented 4 months ago

https://github.com/nginx/unit/blob/46cef09f296d9a3d54b98331d25920fc6322bea8/src/nxt_conf.c#L640

double is a 64-bit floating-point type. This means it cannot represent all 64-bit integers precisely. Or in other words, converting an int64_t to a double is a narrowing conversion which loses precision.

So, why this thing?

        case NXT_CONF_MAP_INT32:
        case NXT_CONF_MAP_INT64:
        case NXT_CONF_MAP_INT:
        case NXT_CONF_MAP_SIZE:
        case NXT_CONF_MAP_OFF:
        case NXT_CONF_MAP_MSEC:

            if (v->type != NXT_CONF_VALUE_INTEGER) {
                break;
            }

            num = nxt_strtod(v->u.number, NULL);

            switch (map[i].type) {

            case NXT_CONF_MAP_INT32:
                ptr->i32 = num;
                break;

            case NXT_CONF_MAP_INT64:
                ptr->i64 = num;
                break;

            case NXT_CONF_MAP_INT:
                ptr->i = num;
                break;

            case NXT_CONF_MAP_SIZE:
                ptr->size = num;
                break;

            case NXT_CONF_MAP_OFF:
                ptr->off = num;
                break;

            case NXT_CONF_MAP_MSEC:
                ptr->msec = (nxt_msec_t) num * 1000;
                break;

            default:
                nxt_unreachable();
            }

If you specify a very large integer in the JSON config, it seems it's going to be silently mis-parsed, and that may cause all kinds of problems, depending on how important that number is.

alejandro-colomar commented 4 months ago

Calling strtoimax(3) would be better, unless I'm missing something.

alejandro-colomar commented 4 months ago

The initial parsing seems to be done here:

https://github.com/nginx/unit/blob/46cef09f296d9a3d54b98331d25920fc6322bea8/src/nxt_conf.c#L2195

    nxt_errno = 0;
    end = NULL;

    num = nxt_strtod(value->u.number, &end);

    if (nxt_slow_path(nxt_errno == NXT_ERANGE
        || fabs(num) > (double) NXT_INT64_T_MAX))
    {
        nxt_conf_json_parse_error(error, start,
            "The number is out of representable range.  Such JSON number "
            "value is not supported."
        );

        return NULL;
    }

which at least checks errno, but has the same precision problem.

alejandro-colomar commented 4 months ago

Apart of that, is there any mechanism to avoid other narrowing conversions, as in when parsing 300 as a int8_t? @ac000 , you touched some related code recently for nxt_cred_t; do you remember if there's any checks that prevent 300 from being silently truncated when parsing an int8_t? I haven't seen anything that looks like such a check.

ac000 commented 4 months ago

Hi Alex, interesting topic!

Calling strtoimax(3) would be better, unless I'm missing something.

Hmm, would that be guaranteed to return at least 64bits on all platforms we run on?

Apart of that, is there any mechanism to avoid other narrowing conversions, as in when parsing 300 as a int8_t? @ac000 , you touched some related code recently for nxt_cred_t; do you remember if there's any checks that prevent 300 from being silently truncated when parsing an int8_t? I haven't seen anything that looks like such a check.

Me neither.

If this comment is correct then it's perhaps not that bad, or is at least good enough for our use case.

However I don't think a normal double can really hold an INT64_MAX but a long double should...

Double (no pun intended!) checked if a long double can at least hold a INT64_MAX...

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        long double d = INT64_MAX;
        int64_t i = INT64_MAX;

        printf("%.19LG / %ld\n", d, i);

        if ((long)d == i)
                printf("same\n");

        return 0;
}
$ ./d
9223372036854775807 / 9223372036854775807
same

Perhaps that is the answer, to switch it to a long double?

That would require strtold(3) which is marked as C99... (so is strtoimax(3) incidentally...)

ac000 commented 4 months ago

That would require strtold(3) which is marked as C99... (so is strtoimax(3) incidentally...)

Although GCC 4.8.5 as on CentOS 7 which defaults to -std=gnu90 does recognise both those functions...

Even though the strtold(3) man page says

strtof(), strtold(): _XOPEN_SOURCE >= 600 || _ISOC99_SOURCE || _POSIX_C_SOURCE >= 200112L; or cc -std=c99

it recognises it without seemingly explicitly specifying any of those...

ac000 commented 4 months ago

However I can foresee a lack of desire to do anything about this unless it's causing some actual verified problem...

alejandro-colomar commented 4 months ago

Hi Andrew!

Hi Alex, interesting topic!

:)

I've spent quite some time recently auditing shadow regarding string-to-numeric parsing, and found several bugs in there. I've written a library that will be used in shadow, and hopefully in other projects to simplify usage of strtol(3) et al., and so reduce bugs. Also fixed some bugs in NetBSD's strtoi(3) and strtou(3).

See also:

Calling strtoimax(3) would be better, unless I'm missing something.

Hmm, would that be guaranteed to return at least 64bits on all platforms we run on?

Yes. C11 (and probably C99 as well, but didn't check) guarantees that intmax_t is at least 64-bit wide.

http://port70.net/~nsz/c/c11/n1570.html#7.20.2p2

Its implementation-defined value shall be equal to or greater in
magnitude (absolute value) than the corresponding value given below,

And below: http://port70.net/~nsz/c/c11/n1570.html#7.20.2.5


-  minimum value of greatest-width signed integer type

       INTMAX_MIN                              -(2^63 - 1)

-  maximum value of greatest-width signed integer type

       INTMAX_MAX                              2^63 - 1

-  maximum value of greatest-width unsigned integer type

       UINTMAX_MAX                             2^64 - 1

Apart of that, is there any mechanism to avoid other narrowing conversions, as in when parsing 300 as a int8_t? @ac000 , you touched some related code recently for nxt_cred_t; do you remember if there's any checks that prevent 300 from being silently truncated when parsing an int8_t? I haven't seen anything that looks like such a check.

Me neither.

If this comment is correct then it's perhaps not that bad, or is at least good enough for our use case.

What that comment says is that the error will be relatively small, i.e., maybe you specify 2^50 and strtod(3) would understand 2^50+42 (I completely made up the magnitude of the error; I don't really know if it'll be larger or smaller). Still, if for some case precision matters, one would want the exact number that was written, or at least a warning that the number cannot be parsed precisely, but not a silent conversion.

However I don't think a normal double can really hold an INT64_MAX but a long double should...

Are long doubles portable? Especially, is their size portable? I never used that beast. At least I'm going to bet that if it's wide enough, it's also going to be slow. A double would already be slower than an int64_t, so I wouldn't see a reason at all to call strtod(3), but strtold(3) would go even beyond.

Double (no pun intended!) checked if a long double can at least hold a INT64_MAX...

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        long double d = INT64_MAX;
        int64_t i = INT64_MAX;

        printf("%.19LG / %ld\n", d, i);

        if ((long)d == i)
                printf("same\n");

        return 0;
}
$ ./d
9223372036854775807 / 9223372036854775807
same

But is that portable? Or are some systems allowed to alias long double and double?

Perhaps that is the answer, to switch it to a long double?

I don't see any appealing reason, except reducing the scope of a fix. But to do the right thing, I'd call strtoimax(3).

alejandro-colomar commented 4 months ago

That would require strtold(3) which is marked as C99... (so is strtoimax(3) incidentally...)

Although GCC 4.8.5 as on CentOS 7 which defaults to -std=gnu90 does recognise both those functions...

Even though the strtold(3) man page says

strtof(), strtold(): _XOPEN_SOURCE >= 600 || _ISOC99_SOURCE || _POSIX_C_SOURCE >= 200112L; or cc -std=c99

it recognises it without seemingly explicitly specifying any of those...

I don't remember the details, but maybe GNU C implies some of those macros? Also, what version of the Linux man-pages are you reading? It doesn't match mine.

strtod(3)              Library Functions Manual              strtod(3)

NAME
     strtod,  strtof, strtold - convert ASCII string to floating‐point
     number

LIBRARY
     Standard C library (libc, -lc)

SYNOPSIS
     #include <stdlib.h>

     double strtod(const char *restrict nptr, char **restrict endptr);
     float strtof(const char *restrict nptr, char **restrict endptr);
     long double strtold(const char *restrict nptr, char **restrict endptr);

 Feature    Test     Macro     Requirements     for     glibc     (see
 feature_test_macros(7)):

     strtof(), strtold():
         _ISOC99_SOURCE || _POSIX_C_SOURCE >= 200112L
alejandro-colomar commented 4 months ago

However I can foresee a lack of desire to do anything about this unless it's causing some actual verified problem...

Well, I was having a look at the gzip code, which reads an INT64, and was making sure that the code behaved well for any input, and then saw that code.

It's regarding the configurable "min_length" threshold size for compression. I then checked what the code was doing for very large integers, and saw this. Is it a real-world bug? Maybe not, as in, this threshold isn't something critically precise, and nobody would set it over 2G either, so probably not yet a real world problem. I was just making sure my code works well. :)

alejandro-colomar commented 4 months ago

The problem of parsing 300 as 300-256 seems more important, since it's a lowish number. I'm surprised that nobody has been bitten by it yet. Oh yeah, something like that has actually happened with nxt_cred_t. Maybe it's the time to prevent it happening again somewhere else...

ac000 commented 4 months ago

On Tue, 06 Feb 2024 13:01:05 -0800 Alejandro Colomar @.***> wrote:

The problem of parsing 300 as 300-256 seems more important, since it's a lowish

Is this a real thing you saw?

number. I'm surprised that nobody has been bitten by it yet. Oh yeah, it has actually happened with nxt_cred_t. Maybe it's the time to prevent it happening again somewhere else...

nxt_cred_t is a new thing, so whatever you think happened to it, didn't.

The problem nxt_cred_t was fixing was writing 32bits of memory into a 64bits destination and the upper 32bits were left unchanged.

ac000 commented 4 months ago

On Tue, 06 Feb 2024 12:44:23 -0800 Alejandro Colomar @.***> wrote:

That would require strtold(3) which is marked as C99... (so is strtoimax(3) incidentally...)

Although GCC 4.8.5 as on CentOS 7 which defaults to -std=gnu90 does recognise both those functions...

Even though the strtold(3) man page says

strtof(), strtold(): _XOPEN_SOURCE >= 600 || _ISOC99_SOURCE || _POSIX_C_SOURCE >= 200112L; or cc -std=c99

it recognises it without seemingly explicitly specifying any of those...

I don't remember the details, but maybe GNU C implies some of those macros? Also, what version of the Linux man-pages are you reading? It doesn't match mine.

The man pages from CentOS 7... man-pages-3.53-5.el7.noarch

ac000 commented 4 months ago

On Tue, 06 Feb 2024 12:42:05 -0800 Alejandro Colomar @.***> wrote:

Hi Andrew!

Hi Alex, interesting topic!

:)

I've spent quite some time recently auditing shadow regarding string-to-numeric parsing, and found several bugs in there. I've written a library that will be used in shadow, and hopefully in other projects to simplify usage of strtol(3) et al., and so reduce bugs.

See also:

https://github.com/shadow-maint/shadow/pull/893 https://lists.reproducible-builds.org/pipermail/rb-general/2024-January/003225.html https://www.alejandro-colomar.es/src/alx/alx/lib/liba2i.git/

Calling strtoimax(3) would be better, unless I'm missing something.

Hmm, would that be guaranteed to return at least 64bits on all platforms we run on?

Yes. C11 (and probably C99 as well, but didn't check) guarantees that intmax_t is at least 64-bit wide.

On CentOs 7, gnu90, it also seems to 64bits.

I was more concerned about 32bit archs, maybe the BSD's and exotic things like s390!

ac000 commented 4 months ago

On Tue, 06 Feb 2024 12:42:05 -0800 Alejandro Colomar @.***> wrote:

Are long doubles portable? Especially, is their size portable? I never used that beast. At least I'm going to bet that if it's wide enough, it's also going to be slow. A double would already be slower than an int64_t, so I wouldn't see a reason at all to call strtod(3), but strtold(3) would go even beyond.

From https://en.wikipedia.org/wiki/Long_double

The long double type was present in the original 1989 C standard,[1] but support was improved by the 1999 revision of the C standard, or C99, which extended the standard library to include functions operating on long double such as sinl() and strtold().

ac000 commented 4 months ago

On Tue, 06 Feb 2024 12:42:05 -0800 Alejandro Colomar @.***> wrote:

I don't see any appealing reason, except reducing the scope of a fix. But to do the right thing, I'd call strtoimax(3).

But we do (I assume) need to support decimals...

alejandro-colomar commented 4 months ago

On Tue, 06 Feb 2024 12:42:05 -0800 Alejandro Colomar @.***> wrote: I don't see any appealing reason, except reducing the scope of a fix. But to do the right thing, I'd call strtoimax(3). But we do (I assume) need to support decimals...

Remember the cases in which it's being called. They're all integers (and the validation code is rejecting any decimals somewhere else):

https://github.com/nginx/unit/blob/46cef09f296d9a3d54b98331d25920fc6322bea8/src/nxt_conf.c#L640

double is a 64-bit floating-point type. This means it cannot represent all 64-bit integers precisely. Or in other words, converting an int64_t to a double is a narrowing conversion which loses precision.

So, why this thing?

        case NXT_CONF_MAP_INT32:
        case NXT_CONF_MAP_INT64:
        case NXT_CONF_MAP_INT:
        case NXT_CONF_MAP_SIZE:
        case NXT_CONF_MAP_OFF:
        case NXT_CONF_MAP_MSEC:

            if (v->type != NXT_CONF_VALUE_INTEGER) {
                break;
            }

            num = nxt_strtod(v->u.number, NULL);

            switch (map[i].type) {

            case NXT_CONF_MAP_INT32:
                ptr->i32 = num;
                break;

            case NXT_CONF_MAP_INT64:
                ptr->i64 = num;
                break;

            case NXT_CONF_MAP_INT:
                ptr->i = num;
                break;

            case NXT_CONF_MAP_SIZE:
                ptr->size = num;
                break;

            case NXT_CONF_MAP_OFF:
                ptr->off = num;
                break;

            case NXT_CONF_MAP_MSEC:
                ptr->msec = (nxt_msec_t) num * 1000;
                break;

            default:
                nxt_unreachable();
            }

If you specify a very large integer in the JSON config, it seems it's going to be silently mis-parsed, and that may cause all kinds of problems, depending on how important that number is.

See how decimals are rejected:

alx@debian:~/etc/nginx/unit$ cat unix.json | grep -C2 return
        "a": [{
            "action": {
                "return": 302.2,
                "location": "/i/am/not/at/home/"
            }
alx@debian:~/etc/nginx/unit$ sudo unit ctl http PUT /config <unix.json 
{
    "error": "Invalid configuration.",
    "detail": "The \"return\" value must be an integer number, but not a fractional number."
}

The mechanism by which they are rejected can be found aroung src/nxt_conf.c and src/nxt_conf_validation.c (I remember having seen it last week).

alejandro-colomar commented 4 months ago

On Tue, 06 Feb 2024 12:42:05 -0800 Alejandro Colomar @.***> wrote: Are long doubles portable? Especially, is their size portable? I never used that beast. At least I'm going to bet that if it's wide enough, it's also going to be slow. A double would already be slower than an int64_t, so I wouldn't see a reason at all to call strtod(3), but strtold(3) would go even beyond. From https://en.wikipedia.org/wiki/Long_double The long double type was present in the original 1989 C standard,[1] but support was improved by the 1999 revision of the C standard, or C99, which extended the standard library to include functions operating on long double such as sinl() and strtold().

Hmm, as a curiosity, MS makes long double and double the same thing, per the wikipedia reference you linked to. The good thing is that none of us care about MS. Other systems make it at least an 80-bit type (64 bits for the significand), which makes it wide enough for all int64_t values, in all systems that we care.

callahad commented 4 months ago

If we don't need decimals, and are indeed rejecting them during validation, it sounds like strtoimax(3) is strictly better here.

...it also sounds like the current code is unlikely to cause issues in real-world scenarios, so this is an enhancement, not a defect.

All we need to move toward accepting a patch would be verifying that we do not use this code path for anything that accepts decimals as valid.

Edit: Well, it's not really an enhancement either; not user visible. It's an internal cleanup.

ac000 commented 4 months ago

EDIT: int64_t maybe wants to be intmax_t...

Completely untested, but compiles, so it must be good!

diff --git a/src/nxt_conf.c b/src/nxt_conf.c
index 008cb968..6f4eabfd 100644
--- a/src/nxt_conf.c
+++ b/src/nxt_conf.c
@@ -10,6 +10,7 @@
 #include <nxt_conf.h>

 #include <float.h>
+#include <inttypes.h>
 #include <math.h>

@@ -587,7 +588,7 @@ nxt_int_t
 nxt_conf_map_object(nxt_mp_t *mp, nxt_conf_value_t *value, nxt_conf_map_t *map,
     nxt_uint_t n, void *data)
 {
-    double            num;
+    int64_t           num;
     nxt_str_t         str, *s;
     nxt_uint_t        i;
     nxt_conf_value_t  *v;
@@ -637,7 +638,7 @@ nxt_conf_map_object(nxt_mp_t *mp, nxt_conf_value_t *value, nxt_conf_map_t *map,
                 break;
             }

-            num = nxt_strtod(v->u.number, NULL);
+            num = strtoimax((const char *)v->u.number, NULL, 10);

             switch (map[i].type) {
alejandro-colomar commented 4 months ago

If we don't need decimals, and are indeed rejecting them during validation, it sounds like strtoimax(3) is strictly better here.

...it also sounds like the current code is unlikely to cause issues in real-world scenarios, so this is ~an enhancement,~ not a defect.

All we need to move toward accepting a patch would be verifying that we do not use this code path for anything that accepts decimals as valid.

Edit: Well, it's not really an enhancement either; not user visible. It's an internal cleanup.

I've been testing unit after @ac000 's query of this being just a suspicion of the source code or a thing that can be seen.

On Tue, 06 Feb 2024 13:01:05 -0800 Alejandro Colomar @.***> wrote: The problem of parsing 300 as 300-256 seems more important, since it's a lowish Is this a real thing you saw?

No, I didn't see it. It was just an educated guess from seeing the code. I've tried to reproduce it, and guess what? It's worse than I anticipated. I was thinking of integer narrowing conversions, which are defined to wrap around. But we're narrowing from a floating point to an integer, and I didn't remember that that's a more dangerous thing to do.

When testing, I was a bit puzzled that the results didn't match my expectation. I was using the following config (I've reduced when pasting here; I hope I didn't break it):

$ cat proc.json 
{

    "listeners": {
        "*:80": {
            "pass": "applications/p"
        }
    },

    "applications": {
        "p": {
            "type": "php",
            "processes": 4294967300, /* UINT32_MAX + 5 */
            "root": "/srv/www/php/",
            "script": "index.php"
        }
    }
}

And was expecting to see a smallish number of processes, around 5 or so. But Unit was attempting to spin up INT32_MIN processes (and for some reason it was thinking that INT32_MIN is a huge number, because the log file was going crazy trying to spin up many hundreds of processes until it failed).

Debugging the reason with the code below:

diff --git a/src/nxt_conf.c b/src/nxt_conf.c
index 008cb968..485f7a4c 100644
--- a/src/nxt_conf.c
+++ b/src/nxt_conf.c
@@ -233,7 +233,10 @@ nxt_conf_set_string_dup(nxt_conf_value_t *value, nxt_mp_t *mp,
 double
 nxt_conf_get_number(nxt_conf_value_t *value)
 {
-    return nxt_strtod(value->u.number, NULL);
+    double tmp;
+    tmp = nxt_strtod(value->u.number, NULL);
+fprintf(stderr, "ALX: %s: %s(): %d: %jd\n", __FILE__, __func__, __LINE__, (intmax_t) tmp);
+    return tmp;
 }

@@ -638,11 +641,13 @@ nxt_conf_map_object(nxt_mp_t *mp, nxt_conf_value_t *value, nxt_conf_map_t *map,
             }

             num = nxt_strtod(v->u.number, NULL);
+fprintf(stderr, "ALX: %s: %s(): %d: %jd\n", __FILE__, __func__, __LINE__, (intmax_t) num);

             switch (map[i].type) {

             case NXT_CONF_MAP_INT32:
                 ptr->i32 = num;
+fprintf(stderr, "ALX: %s: %s(): %d: %jd\n", __FILE__, __func__, __LINE__, (intmax_t) ptr->i32);
                 break;

             case NXT_CONF_MAP_INT64:
diff --git a/src/nxt_router.c b/src/nxt_router.c
index 947836c8..c11f37b6 100644
--- a/src/nxt_router.c
+++ b/src/nxt_router.c
@@ -1753,8 +1753,10 @@ nxt_router_conf_create(nxt_task_t *task, nxt_router_temp_conf_t *tmcf,

             nxt_memzero(app_joint, sizeof(nxt_app_joint_t));

+fprintf(stderr, "ALX: %s: %s(): %d: %jd\n", __FILE__, __func__, __LINE__, (intmax_t) apcf.processes);
             ret = nxt_conf_map_object(mp, application, nxt_router_app_conf,
                                       nxt_nitems(nxt_router_app_conf), &apcf);
+fprintf(stderr, "ALX: %s: %s(): %d: %jd\n", __FILE__, __func__, __LINE__, (intmax_t) apcf.processes);
             if (ret != NXT_OK) {
                 nxt_alert(task, "application map error");
                 goto app_fail;
@@ -1792,8 +1794,11 @@ nxt_router_conf_create(nxt_task_t *task, nxt_router_temp_conf_t *tmcf,
             } else {
                 apcf.max_processes = apcf.processes;
                 apcf.spare_processes = apcf.processes;
+fprintf(stderr, "ALX: %s: %s(): %d: %jd\n", __FILE__, __func__, __LINE__, (intmax_t) apcf.processes);
             }

+fprintf(stderr, "ALX: %s: %s(): %d: %jd\n", __FILE__, __func__, __LINE__, (intmax_t) apcf.processes);
+
             if (apcf.targets_value != NULL) {
                 n = nxt_conf_object_members_count(apcf.targets_value);
$ sudo less /opt/local/nginx/unit/master/var/log/unit/unit.log  | grep ALX
ALX: src/nxt_router.c: nxt_router_conf_create(): 1756: 1
ALX: src/nxt_conf.c: nxt_conf_map_object(): 644: 4294967301
ALX: src/nxt_conf.c: nxt_conf_map_object(): 650: -2147483648
ALX: src/nxt_router.c: nxt_router_conf_create(): 1759: 2147483648
ALX: src/nxt_router.c: nxt_router_conf_create(): 1797: 2147483648
ALX: src/nxt_router.c: nxt_router_conf_create(): 1800: 2147483648

showed me that the conversion in

ptr->i32 = num;

is triggering undefined behavior:

http://port70.net/~nsz/c/c11/n1570.html#6.3.1.4p1

When a finite value of real floating type is converted to an integer type other than _Bool, [...]. If the value of the integral part cannot be represented by the integer type, the behavior is undefined.

And so we have some huge value getting converted to INT32_MIN due to UB. No checks. In this specific case, since it tries to spin up more processes than it can, the reconfiguration fails. In other cases, well, the behavior is undefined.

There's no int8 that doesn't represent a boolean in the configuration, so I could only test with int32.

alejandro-colomar commented 4 months ago

EDIT: int64_t maybe wants to be intmax_t...

Completely untested, but compiles, so it must be good!

diff --git a/src/nxt_conf.c b/src/nxt_conf.c
index 008cb968..6f4eabfd 100644
--- a/src/nxt_conf.c
+++ b/src/nxt_conf.c
@@ -10,6 +10,7 @@
 #include <nxt_conf.h>

 #include <float.h>
+#include <inttypes.h>
 #include <math.h>

@@ -587,7 +588,7 @@ nxt_int_t
 nxt_conf_map_object(nxt_mp_t *mp, nxt_conf_value_t *value, nxt_conf_map_t *map,
     nxt_uint_t n, void *data)
 {
-    double            num;
+    int64_t           num;
     nxt_str_t         str, *s;
     nxt_uint_t        i;
     nxt_conf_value_t  *v;
@@ -637,7 +638,7 @@ nxt_conf_map_object(nxt_mp_t *mp, nxt_conf_value_t *value, nxt_conf_map_t *map,
                 break;
             }

-            num = nxt_strtod(v->u.number, NULL);
+            num = strtoimax((const char *)v->u.number, NULL, 10);

             switch (map[i].type) {

Yep, as a starter, that's what I have in mind (but intmax_t, as you said). I'll prepare some commit message, and check some things tomorrow.

alejandro-colomar commented 4 months ago

EDIT: int64_t maybe wants to be intmax_t... Completely untested, but compiles, so it must be good!

diff --git a/src/nxt_conf.c b/src/nxt_conf.c
index 008cb968..6f4eabfd 100644
--- a/src/nxt_conf.c
+++ b/src/nxt_conf.c
@@ -10,6 +10,7 @@
 #include <nxt_conf.h>

 #include <float.h>
+#include <inttypes.h>
 #include <math.h>

@@ -587,7 +588,7 @@ nxt_int_t
 nxt_conf_map_object(nxt_mp_t *mp, nxt_conf_value_t *value, nxt_conf_map_t *map,
     nxt_uint_t n, void *data)
 {
-    double            num;
+    int64_t           num;
     nxt_str_t         str, *s;
     nxt_uint_t        i;
     nxt_conf_value_t  *v;
@@ -637,7 +638,7 @@ nxt_conf_map_object(nxt_mp_t *mp, nxt_conf_value_t *value, nxt_conf_map_t *map,
                 break;
             }

-            num = nxt_strtod(v->u.number, NULL);
+            num = strtoimax((const char *)v->u.number, NULL, 10);

             switch (map[i].type) {

Yep, as a starter, that's what I have in mind (but intmax_t, as you said). I'll prepare some commit message, and check some things tomorrow.

Probably, not enough to avoid UB. This function is used all over the place:

$ grepc nxt_conf_get_number .
./src/nxt_conf.c:double
nxt_conf_get_number(nxt_conf_value_t *value)
{
    return nxt_strtod(value->u.number, NULL);
}
./src/nxt_conf.h:NXT_EXPORT double nxt_conf_get_number(nxt_conf_value_t *value);

So we need to reject numbers out-of-range early in the configuration.

Let me prepare something.

hongzhidao commented 4 months ago

Hi,

See how decimals are rejected:

alx@debian:~/etc/nginx/unit$ cat unix.json | grep -C2 return
        "a": [{
            "action": {
                "return": 302.2,
                "location": "/i/am/not/at/home/"
            }
alx@debian:~/etc/nginx/unit$ sudo unit ctl http PUT /config <unix.json 
{
    "error": "Invalid configuration.",
    "detail": "The \"return\" value must be an integer number, but not a fractional number."
}

The return is declared NXT_CONF_VLDT_INTEGER, I feel it's correct to throw the error if its value is 302.2.

alejandro-colomar commented 4 months ago

Hi!

The return is declared NXT_CONF_VLDT_INTEGER, I feel it's correct to throw the error if its value is 302.2.

Oh, yes, that's correct. I was just mentioning that because Andrew thought we needed decimals, but in that path they are always rejected (correctly).

alejandro-colomar commented 4 months ago

On Tue, 06 Feb 2024 12:42:05 -0800 Alejandro Colomar @.***> wrote: Hi Andrew! > Hi Alex, interesting topic! :) I've spent quite some time recently auditing shadow regarding string-to-numeric parsing, and found several bugs in there. I've written a library that will be used in shadow, and hopefully in other projects to simplify usage of strtol(3) et al., and so reduce bugs. See also: <shadow-maint/shadow#893> https://lists.reproducible-builds.org/pipermail/rb-general/2024-January/003225.html https://www.alejandro-colomar.es/src/alx/alx/lib/liba2i.git/ > > > Calling strtoimax(3) would be better, unless I'm missing something. > > Hmm, would that be guaranteed to return at least 64bits on all platforms we run on? Yes. C11 (and probably C99 as well, but didn't check) guarantees that intmax_t is at least 64-bit wide. On CentOs 7, gnu90, it also seems to 64bits. I was more concerned about 32bit archs, maybe the BSD's and exotic things like s390!

Nah, C99 also guarantees 64-bit. I refuse to check C89; it's dead to me. :)

http://port70.net/~nsz/c/c99/n1256.html#7.18.2.5

ac000 commented 4 months ago

On Tue, 06 Feb 2024 19:44:24 -0800 Alejandro Colomar @.***> wrote:

On CentOs 7, gnu90, it also seems to 64bits. I was more concerned about 32bit archs, maybe the BSD's and exotic things like s390!

Nah, C99 also guarantees 64-bit. I refuse to check C89; it's dead to me. :)

We still need to support it for a little while longer... though IIRC CentOS 7 at least has < 1 year left of life.

callahad commented 4 months ago

This is verging off topic, but we don't have to care about C89 / gnu90, or s390x. Unit is an early project, and our eyes need to be on the future. If backcompat with older distros / toolchains is holding us back, we're free to decide to jettison them.

alejandro-colomar commented 4 months ago

On Tue, Feb 06, 2024 at 09:12:03PM -0800, Andrew Clayton wrote:

We still need to support it for a little while longer... though IIRC CentOS 7 at least has < 1 year left of life.

After checking, C89 has no intmax_t. It was introduced in C99, and has always been >=64 bits. gnu90 provides it as a GNU extension.

-- https://www.alejandro-colomar.es/

ac000 commented 4 months ago

On Wed, 07 Feb 2024 00:36:51 -0800 Dan Callahan @.***> wrote:

This is verging off topic, but we don't have to care about C89 / gnu90, or s390x. Unit is an early project, and our eyes need to be on the future. If backcompat with older distros / toolchains is holding us back, we're free to decide to jettison them.

Great! That's news to me!

Currently we build test (I.e buildbot) on CentOS 7, which has GCC 4.8.5 which defaults to -std=gnu90.

That's the oldest system we build test on.

However looking here https://unit.nginx.org/installation/#installation-precomp-pkgs

Ignoring the architectures we claim to support (which includes S390X), we even claim to support CentOS 6.x, however it seems we actually stopped supporting that a while ago (when it went EOL)...

So the oldest thing we still claim to support and provide official packages for is RHEL 7/CentOS 7, it is EOL'd 2024-06-30. The only potential fly in the ointment is that RHEL7 has an extended support lifecycle[0] which ends two years later on 2026-06-30... hopefully we can ignore that...

[0]: https://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux#Product_life_cycle