kint-php / kint

Kint - Advanced PHP dumper
https://kint-php.github.io/kint/
MIT License
2.78k stars 292 forks source link

Recursion bug - leaving marker in arrays in certain conditions #203

Closed jnvsor closed 8 years ago

jnvsor commented 8 years ago

Don't know why exactly but:

$a = ["Array 1"];
$b = ["Array 2"];
$a[] = &$b;
$b[] = &$a;

d($a);
d($b);

This shows the marker in $a in the second call.

I'm doing an experimental overhaul of the parser (It's looking good) so I'll probably come across more of these things.

raveren commented 8 years ago

I'm sorry, I don't understand the issue, if the marker is there, this might just be something PHP does weirdly, no surprise there :) Kint just represents what data PHP holds internally...

jnvsor commented 8 years ago

No, it's the marker kint put there to detect recursion - the marker at kintParser::$_marker

raveren commented 8 years ago

Oh god you found the polterbug I've been looking to reproduce for for ages! Thank you, I fixed it, but it will come with a batch of changes later this week!

The really, terribly, scary hard task I have for the past two years is to bring documentation up to date :) I really want to include a notable contributors section to give excellent developers like you due credit and immense thanks for all of your hard work!

jnvsor commented 8 years ago

A more scary note on recursion handling is that kint is doing something that's theoretically impossible to do correctly. For example:

<?php

$a = ["Array 1"];
$b = ["Array 2"];
$a[] = &$b;
$b[] = &$a;
$c = ["Array 1", &$b];

d($a);
d($b);
d($c);

Clearly $c is not referenced by $b, but since kint gets it's arguments through func_get_args (By copy not reference) there is no way to correctly determine recursion in this case.

I think kint should (Like print_r, var_dump and other debug builtins which all take parameters by reference) stop trying to be clever.

To demonstrate why the "Clever" behaviour is probably a bad thing, it didn't take long to find a version that's full on completely wrong output:

<?php
$a = ["Array 1"];
$b = ["Array 2"];
$a[] = &$b;
$b[] = &$a;
$c = [&$b];

d($a);
d($b);
d($c);
raveren commented 8 years ago

There is definitely an edge-case bug in detecting circular references, but it has nothing to do with the way Kint gets its arguments. It's a side effect of a conscious effort to short-circuit

array (2) [
    string (7) "Array 1"
    array (2) [
        string (7) "Array 2"
        array (2) [
            string (7) "Array 1"
            *RECURSION*
        ]
    ]
]

into

array (2) [
    string (7) "Array 1"
    array (2) [
        string (7) "Array 2"
        array (2) *RECURSION*
    ]
]

Former is how var_dump behaves, by the way.

This situation can only be fixed one of two ways: render output the ugly var_dump way or rewrite the parsing engine to keep a record of all objects. Kint, as it works now, detects a reference to a higher level and assumes it's a reference to the currently inspected node's parent, and in the case you illustrated, the child node is a reference to its grandparent.

I just spent four hours in recursion detective mode and had to scrap my progress because the universe no longer makes sense.


As this is an edge-case I am not giving it high priority, but am leaving it open.

jnvsor commented 8 years ago

It does have to do with the way it gets it's arguments.

Because kint gets it's arguments by copy not reference the toplevel element is not the one being referenced but a different identical one...

If kint could take it's params by reference it would be easy, but all the helper functions and Kint::dump would have to be modified to have a large but limited amount of &$arg4, &$arg5, &$arg6 etc so that's unworkable.

The var_dump way as you've pointed out is ugly - but it's correct.

Alternatively you could do a deep strict equals on the toplevel array - where if the first element passed is an array and you ever hit an array identical to it you presume it's recursion - however this too is inaccurate if there really is an identical array being passed like so:

<?php

$a = ["Array 1"];
$b = ["Array 2"];
$a[] = &$b;
$b[] = &$a;
$c = ["Array 1", &$b];

d($a);
d($b);
d($c);
raveren commented 8 years ago

It does have to do with the way it gets it's arguments.

Dude, I just spent four hours (and the past 6 years) working on this, are you seriously doubting I know my shit?

jnvsor commented 8 years ago

I just spent the last weekend plus a few hours rewriting the parser from scratch - not only do I know I'm right, I can prove it with a simple test case:

<?php

$a = ["Array 1"];
$b = ["Array 2", &$a];
$a[] = &$b;

echo dump_copy($a); // Parameter copied
echo dump($a); // Parameter referenced
print_r($a); // Parameter copied (Referenced in php7)
var_dump($a); // Parameter copied

function dump(&$val){
    static $marker = null;
    if ($marker === null)
        $marker = uniqid();

    if (empty($val))
        return '';

    if (isset($val[$marker]))
        return "RECURSION";

    $val[$marker] = true;

    $output = "(\n";

    foreach ($val as $key => &$item) {
        if ($key === $marker) {
            continue;
        }
        if (is_array($item)) {
            $output .= dump($item)."\n";
        }
        else {
            $output .= (string) $item."\n";
        }
    }

    unset($val[$marker]);

    return $output.")\n";
}

function dump_copy($val){
    return dump($val);
}
raveren commented 8 years ago

I don't really see what are you trying to prove here, that references behave differently from values?

I already explained the problem in my previous post, and I'll open a separate issue for it, because the one in OP has been fixed.

kktsvetkov commented 4 years ago

@jnvsor have you considered recursion tracking by reference instead of using markers? In this way, you keep references to all of the arrays in the "recursion register", including all of the nested elements that are also arrays. Encountering the same reference for a second time during parsing will mean that you've hit a recursion.

jnvsor commented 4 years ago

@kktsvetkov This was a v1 bug, and was fixed in the v2 full rewrite.

Encountering the same reference for a second time during parsing will mean that you've hit a recursion.

As far as I'm aware there's no way to tell if two references are the same in PHP, since === will just compare array contents. (We can't access raw pointer values in PHP either)

I doubt recursively comparing array contents (Which is itself a recursive operation) will be anywhere near as performant as it is now either.

Also, if there's a bug in the current implementation (either in core parser or in a plugin), people will notice the strange kint\0... keys in their arrays, but if we did it with references it would manifest as a nigh-untraceable memory leak.

If you know of a way to implement this that works and is performant enough to risk a memory leak, please submit a PR, we can always use more performance.

kktsvetkov commented 4 years ago

If you know of a way to implement this that works and is performant enough to risk a memory leak

@jnvsor Sure. How would you advise me to test this risk of memory leak? I've done basic stuff like the example from the first comment up top and does output the same structure as var_dump(), but that's a rather basic.

recursion_without_marker_and_hashes

kktsvetkov commented 4 years ago

I've also run ParserTest.php and the only error I got was about it missing Parser::$marker which I removed. It does pass the testParseRecursion case.

~/kt.kt/kint-php/🐱 php vendor/bin/phpunit tests/Parser/ParserTest.php 
PHPUnit 7.5.20 by Sebastian Bergmann and contributors.

.E..........................................................      60 / 60 (100%)

Time: 3.49 seconds, Memory: 12.00 MB

There was 1 error:

1) Kint\Test\Parser\ParserTest::testConstruct
ReflectionException: Property Kint\Parser\Parser::$marker does not exist

/Users/polina/kt.kt/kint-php/tests/Parser/ParserTest.php:59

ERRORS!
Tests: 60, Assertions: 181, Errors: 1.
kktsvetkov commented 4 years ago

As far as I'm aware there's no way to tell if two references are the same in PHP, since === will just compare array contents.

You are correct, === compares contents only and will produce false positives when arrays with the same contents are met again. Perhaps the arrays can be tainted with the markers only when such a match is found in order to detect if the two arrays are copies of one another, OR one is a reference of the other.

Also, I think that there can be fewer arrays tracked for recursion since if all the array elements are scalar (or resources), there is no way for recursion to occur: you actually need "nestable" data types such as arrays and objects to have a recursion.

I'll amend https://github.com/kint-php/kint/pull/343 later today to introduce these changes (and probably change the PR title since there will be markers and hashes, but just at a different place)

kktsvetkov commented 4 years ago

@jnvsor I was wondering symfony/var-dumper does this, and it seems they do not do it at all - instead, they are relying on the max depth to prevent going infinitely deeper:

if (++$cursor->hashIndex === $this->maxItemsPerDepth || $cursor->stop) { ...

https://github.com/symfony/var-dumper/blob/master/Cloner/Data.php

If I read this right, the $cursor->hashIndex track how deep is your dump, and after reaching some limit, it just stops.

This is an interesting alternative, and perhaps something that can be considered: if the max_depth is set, we can skip recursion detection altogether. The potential downside is that in some cases like dumping $GLOBALS the parser will dive into globals own reference for max_depth levels deep :)

jnvsor commented 4 years ago

instead, they are relying on the max depth to prevent going infinitely deeper

Symfony's var-dumper doesn't have a max depth. $this->maxItemsPerDepth serves a similar purpose to kint's new ArrayLimitPlugin (Which I totally ripped off of var-dumper :P)

var-dumper dumps breadth first and drops any duplicates with an anchor link to the first (ie. highest found) copy of that variable.

I find this somewhat confusing: Duplicates in different trees will be dumped twice but duplicates that are children of any ancestor will be anchored. Also if the dump is at the end of the page and the anchor doesn't scroll properly and you are left guessing what this is a reference to.

(This is an educated guess mind you, var-dumper's source is even more impenetrable than kint and I'm no expert on it)

We dump depth first and don't try to elide duplicate values unless it's recursion.

The potential downside is that in some cases like dumping $GLOBALS the parser will dive into globals own reference for max_depth levels deep

It's worse than that for kint's depth-first approach: It will dive into all the global variables on each depth along with all their children. You're looking at a lot of extra work there.