JS_ToBigInt64 handles overflow incorrectly

Icemic commented 5 months ago

JS_ToBigInt64 handles number larger than 2^64 incorrectly.

Divive into the function, it calls JS_ToBigInt64Free then JS_ToBigInt64Free:

static int JS_ToBigInt64Free(JSContext *ctx, int64_t *pres, JSValue val)
{
    bf_t a_s, *a;

    a = JS_ToBigIntFree(ctx, &a_s, val);
    if (!a) {
        *pres = 0;
        return -1;
    }
    bf_get_int64(pres, a, BF_GET_INT_MOD);
    JS_FreeBigInt(ctx, a, &a_s);
    return 0;
}

For numbers larger than 2^64, a has a valid value so it goes to bf_get_int64 where the problem occurs.

/* The rounding mode is always BF_RNDZ. Return BF_ST_INVALID_OP if there
   is an overflow and 0 otherwise. */
int bf_get_int64(int64_t *pres, const bf_t *a, int flags)
{
    uint64_t v;
    int ret;
    if (a->expn >= BF_EXP_INF) {
        ret = BF_ST_INVALID_OP;
        if (flags & BF_GET_INT_MOD) {
            v = 0;
        } else if (a->expn == BF_EXP_INF) {
            v = (uint64_t)INT64_MAX + a->sign;
        } else {
            v = INT64_MAX;
        }
    } else if (a->expn <= 0) {
        v = 0;
        ret = 0;
    } else if (a->expn <= 63) {
#if LIMB_BITS == 32
        if (a->expn <= 32)
            v = a->tab[a->len - 1] >> (LIMB_BITS - a->expn);
        else
            v = (((uint64_t)a->tab[a->len - 1] << 32) |
                 get_limbz(a, a->len - 2)) >> (64 - a->expn);
#else
        v = a->tab[a->len - 1] >> (LIMB_BITS - a->expn);
#endif
        if (a->sign)
            v = -v;
        ret = 0;
    } else if (!(flags & BF_GET_INT_MOD)) {
        ret = BF_ST_INVALID_OP;
        if (a->sign) {
            uint64_t v1;
            v = (uint64_t)INT64_MAX + 1;
            if (a->expn == 64) {
                v1 = a->tab[a->len - 1];
#if LIMB_BITS == 32
                v1 = (v1 << 32) | get_limbz(a, a->len - 2);
#endif
                if (v1 == v)
                    ret = 0;
            }
        } else {
            v = INT64_MAX;
        }
    } else {
        slimb_t bit_pos = a->len * LIMB_BITS - a->expn;
        v = get_bits(a->tab, a->len, bit_pos);
#if LIMB_BITS == 32
        v |= (uint64_t)get_bits(a->tab, a->len, bit_pos + 32) << 32;
#endif
        if (a->sign)
            v = -v;
        ret = 0;
    }
    *pres = v;
    return ret;
}

For example, the number 2^64 + 2, its a->expn is 64. While using BF_GET_INT_MOD flag, it goes to the final else block. As BF_GET_INT_MOD implies that modulo 2^n instead of saturation. NaN and infinity return 0, finally we got the v as 2 which is from (2^64 + 2) % 2^64, and ret=0 which means returns a right result. Then everything mess up.

That is to say that we cannot judge the return value from JS_ToBigInt64Free (as well as JS_ToBigInt64) whether it is the actual result or the actual result mods 2^64

It seems to be a problem that has been around for a long time, and a solution has already been suggested: https://github.com/theduke/quickjs-rs/blob/master/libquickjs-sys/embed/patches/js-tobigint64-overflow.patch

I've tried to apply the patch and the problem goes fixed. So I think maybe it should be adopted?

saghul commented 5 months ago

Interesting! I guess there is no test that checks for this.

A good first step would be to add one that fails.

Icemic commented 5 months ago

Here is a Rust example: https://github.com/Icemic/quickjspp-rs/blob/221fff9e177fa5b490bf349aa74f3550eb7f0397/tests/runtime.rs#L641

create a quickjs context
just evals 9223372036854775807n + 11n, and get the result ret of type JSValue
try to get value from ret via JS_ToBigInt64 (will get 10 now)

chqrlie commented 5 months ago

Hello @Icemic, nice to see QuickJS used in the rust world...

I don't understand your point: JS_ToBigInt64Free is defined as computing BigInt value modulo 64-bit modulo.

bf_get_int64 never fails with BF_GET_INT_MOD

It follows the ECMA spec: 7.1.15 ToBigInt64 ( argument ).

You seem to expect some other behavior, what is your use case ?

Icemic commented 5 months ago

Thanks for the reply! I hadn't noticed the ECMA spec aspect before. Given that this function adheres to an established specification, it should indeed be fine.

I'm maintaining a QuickJS to Rust binding, which is forked from quickjs-rs. There's a piece of code designed to extract the actual value from a JSValue tagged as bigint, and then convert it into a Rust type: https://github.com/Icemic/quickjspp-rs/blob/221fff9e177fa5b490bf349aa74f3550eb7f0397/src/value/value.rs#L276-L306

Its approach begins with an attempt to convert using JS_ToBigInt64. If that fails, it then resorts to a string conversion. The intention behind using JS_ToBigInt64 was presumably to serve as a performance optimization. However, it seems both the original author and I have misunderstood the true definition of this function.

So, my question now shifts to: In similar binding libraries or FFI scenarios, what is the recommended way to retrieve and convert values from a TAG_BIGINIT type JSValue to a native type?

chqrlie commented 5 months ago

So, my question now shifts to: In similar binding libraries or FFI scenarios, what is the recommended way to retrieve and convert values from a JS_TAG_BIGINT type JSValue to a native type?

I would recommend adding these APIs:

int JS_IsInt64(JSContext *ctx, JSValue obj): return 1 iff the value is an integer and in the range INT64_MIN to INT64_MAX.
int JS_IsUint64(JSContext *ctx, JSValue obj): return 1 iff the value fits is an integer and in the range 0 to UINT64_MAX.

With these your can determine if JS_ToBigInt64 will give you the expected result. If the number is not in the proper range, you can use the string conversion, but I recommend using a base 16 conversion that is much faster than a base 10 conversion.

For efficient conversions to and from another language native representation, I would add these:

int JS_GetValueSign(JSContext *ctx, JSValue obj): returns 0 for positive or zero, 1 for negative, -1 for NaN and values that are not numerical.
JSValue JS_NegateFree(JSContext *ctx, JSValue obj): return the opposite of a numerical value (value is freed).
int JS_GetBigIntBitLength(JSContext *ctx, JSValue obj): returns the number of significant bits, eg: 0 for 0n, 1 for 1n and -1n, 64 for UINT64_MAX, etc.
int JS_GetBigIntBits(JSContext *ctx, JSValue obj, void *dest, size_t size, size_t count, size_t offset): get the bits by groups of 1, 2, 4 or 8 bytes (a word) into an array dest of such words (uint8_t, uint16_t, uint32_t, uint64_t respectively), from a given offset in the same unit and from least significant to most significant word. Negative values would be retrieved as 2's complement.
JSValue JS_NewBigIntFromBits(JSContext *ctx, const void *src, size_t size, size_t count, int sign): create a JS_TAG_BIGINT value from the array word provided with sign sign. Negative values should be represented in 2's complement.

The current implementation uses a general floating point representation that makes these conversion APIs non trivial but still much faster than a string conversion. We are likely to change the internal representation in the near future to make this API straightforward.

quickjs-ng / quickjs

JS_ToBigInt64 handles overflow incorrectly #364