php / php-src

The PHP Interpreter
https://www.php.net
Other
37.68k stars 7.71k forks source link

xxh3 hash ignores seed if string? #10305

Open lolli42 opened 1 year ago

lolli42 commented 1 year ago

Description

The following code:

<?php
var_dump(hash('xxh3', 'foo'));
# same hash?
var_dump(hash('xxh3', 'foo', options: ['seed' => 'mySeed']));
# different hash!
var_dump(hash('xxh3', 'foo', options: ['seed' => 42]));

Resulted in this output:

string(16) "ab6e5f64077e7d8a"
string(16) "ab6e5f64077e7d8a"
string(16) "d33da54e20ebf99e"

I'd expect the second hash to be different from the first one, just like the third hash. It seems a string as 'seed' is ignored. In case this is intended, it might be good to document this, or maybe except on string non-int input?

PHP Version

Tested with PHP 8.1.13 & 8.2.1

damianwadley commented 1 year ago

There's very little documentation for many of the hash algorithms, but XXH3 accepts a numeric seed or a string secret of sufficient length (apparently >=136 bytes). https://github.com/php/php-src/blob/PHP-8.2.1/ext/hash/hash_xxhash.c#L157

It makes sense to me that PHP should complain if either is provided incorrectly, yet right now it silently ignores a non-integer seed.

cmb69 commented 1 year ago

It seems to me we should use zval_get_long() instead of actually requiring an IS_LONG value.

And while not directly related to this ticket, we also should review the convert_to_string() call for secret. It apparently is not properly prepared to handle exceptions from ::__toString().

nielsdos commented 1 year ago

If no one is working on fixing this yet, I'd like to give this a shot later today :) (both this issue and the toString exception)

nielsdos commented 1 year ago

I worked on this, and used zval_get_long and checked for exceptions (see below). Now numeric strings are handled properly. The remaining question is how to handle non-numeric strings for seed like in OP's example. Currently non-numeric strings are zero, but is this appropriate? XXH3 really wants numbers as a seed according to https://hexdocs.pm/xxh3/0.2.0/xxh3.html.

Additional test here for testing strings:

var_dump(hash('xxh3', 'foo', options: ['seed' => '1'])); // 54557a2c8b633298
var_dump(hash('xxh3', 'foo', options: ['seed' => 1])); // 54557a2c8b633298

class StringableThrowingClass {
    public function __toString(): string {
        throw new Exception('exception in __toString');
        return '';
    }
}

$testString = str_repeat('a', 136);

class StringableClass {
    public function __toString(): string {
        global $testString;
        return $testString;
    }
}

try {
    var_dump(hash('xxh3', 'foo', options: ['secret' => new StringableThrowingClass]));
} catch (Exception $e) {
    echo $e->getMessage() . "\n";
}
var_dump(hash('xxh3', 'foo', options: ['secret' => new StringableClass]));
var_dump(hash('xxh3', 'foo', options: ['secret' => $testString]));

Current changes/patch:

diff --git a/ext/hash/hash_xxhash.c b/ext/hash/hash_xxhash.c
index 7ecedd8128..36540e91b9 100644
--- a/ext/hash/hash_xxhash.c
+++ b/ext/hash/hash_xxhash.c
@@ -168,13 +168,16 @@ zend_always_inline static void _PHP_XXH3_Init(PHP_XXH3_64_CTX *ctx, HashTable *a
                        return;
                }

-               if (_seed && IS_LONG == Z_TYPE_P(_seed)) {
+               if (_seed) {
+                       zend_long seed_as_long = zval_get_long(_seed);
                        /* This might be a bit too restrictive, but thinking that a seed might be set
                                once and for all, it should be done a clean way. */
-                       func_init_seed(&ctx->s, (XXH64_hash_t)Z_LVAL_P(_seed));
+                       func_init_seed(&ctx->s, (XXH64_hash_t)seed_as_long);
                        return;
                } else if (_secret) {
-                       convert_to_string(_secret);
+                       if (!try_convert_to_string(_secret)) {
+                               return;
+                       }
                        size_t len = Z_STRLEN_P(_secret);
                        if (len < PHP_XXH3_SECRET_SIZE_MIN) {
                                zend_throw_error(NULL, "%s: Secret length must be >= %u bytes, %zu bytes passed", algo_name, XXH3_SECRET_SIZE_MIN, len);
weltling commented 1 year ago

While involving zval_get_long sounds good, it still should be a sane value. The hash is not cryptographic, but if a seed value is garbage it can possibly cause harm. Say like in this case, mySeed will evaluate to 0 which needs to be warned. Same also, say 1abc will have same effect as '1'. Same if it evaluates to negative, or is a negative - passing that is perhaps not wrong as internally it will wrap, but it's a non obvious behavior. IMO in general any non numeric string should be considered invalid. While on it, checking more on sanity would be not a waste.

Thansk

nielsdos commented 1 year ago

I'm going to PR the string exception fix later today, because that can stand on its own as a commit and shouldn't impact BC.

As for the string seed: I think it might be a bad idea after all to use zval_get_long because of the implicit behaviour with strings like "1abc" and non-numeric strings (like the concern weltling raised in the previoud comment). Perhaps it is best to be strict about this and only allow LONGs (or floats that are actually longs), and if the input is any other type we could throw. Also interesting is that I found that murmurhash also implicitly ignores its option if it is not a long, so that needs changing as well... I didn't check the other hash functions though. In short, I would like to propose to let those functions throw if a non-long arg is passed. That solution however technically breaks BC and thus should go into 8.3... what do y'all think? :)

Girgias commented 1 year ago

I'm going to PR the string exception fix later today, because that can stand on its own as a commit and shouldn't impact BC.

As for the string seed: I think it might be a bad idea after all to use zval_get_long because of the implicit behaviour with strings like "1abc" and non-numeric strings (like the concern weltling raised in the previoud comment). Perhaps it is best to be strict about this and only allow LONGs (or floats that are actually longs), and if the input is any other type we could throw. Also interesting is that I found that murmurhash also implicitly ignores its option if it is not a long, so that needs changing as well... I didn't check the other hash functions though. In short, I would like to propose to let those functions throw if a non-long arg is passed. That solution however technically breaks BC and thus should go into 8.3... what do y'all think? :)

This may or may not need a deprecation and could be added to the mass PHP 8.3 deprecation RFC, but at least having an implementation to look at the impact is a good idea.

nielsdos commented 1 year ago

This may or may not need a deprecation and could be added to the mass PHP 8.3 deprecation RFC, but at least having an implementation to look at the impact is a good idea.

I agree with this, adding the current behaviour to the deprecation RFC would be a good idea. I'm however unsure how to proceed from here. I was thinking of creating two PRs: 1) A PR targeting master that adds a deprecation warning for the current behaviour. 2) A (draft) PR targeting master (but for PHP>8.3) where the new behaviour is introduced.

In that way, we can see the impact on current and on future versions of PHP. Does that sound good to you?

lolli42 commented 5 months ago

I hope its ok to ping here. Same result with script from initial post and PHP 8.3.3.

nielsdos commented 5 months ago

I hope its ok to ping here. Same result with script from initial post and PHP 8.3.3.

Of course it's okay to ping.

The consensus was that we can't make it throw without having it deprecated first. So I added it to the (in-draft) 8.4 deprecation rfc as a todo: https://wiki.php.net/rfc/deprecations_php_8_4#non-numeric_seed_strings_in_xxh3

lolli42 commented 3 months ago

Note the same issue exists with xxh128 (the 128 bit variant of 64 bit xxh3), and maybe other hash algos of that family?

nielsdos commented 3 months ago

Note the same issue exists with xxh128 (the 128 bit variant of 64 bit xxh3), and maybe other hash algos of that family?

You're correct, I changed the stub in the RFC.