php-ds / ext-ds

An extension providing efficient data structures for PHP 7
https://medium.com/p/9dda7af674cd
MIT License
2.11k stars 95 forks source link

PHP 7 - Wrong Vector/Map/Set equality #67

Open tyx opened 7 years ago

tyx commented 7 years ago

I think creating a different issue from #65 is a better idea.

Current problem

var_dump(new \Ds\Set([1, 3]) == new \Ds\Set([1, 2, 3])); // true

it makes me very sad as I definitively can't rely on Ds in my tests. And I use Ds everywhere as it does a really nice job (thanks for that anyway)

Solution

Is there any reason why this behavior ? And is it a bug or a known behavior ? Looks too big that anyone complain about it ^^

Whatever I'm definitively open to contribute event if my C skill are very old because it cannot stay like that.

I already had a look on the (clean) code. I guess we need to implements the compare_objects handlers on the internal htable ? Looks pretty obvious, I know. May be we can even use the toArray and rely on array::compare_objects ? Is there any stuff that prevent us to do that ?

Anyway, let me know what I can do to help on this very annoying issue.

Thanks

rtheunissen commented 7 years ago

@tyx we would need to implement compare_objects on the zend objects, ie. php_ds_set etc. The internal htable would do the actual work though, eg.:

I'm guessing we'd iterate through both the table's keys and values and check if they're equal? (Or in the case of a Set, only the keys as all the values would be undefined).

toArray would be a performance hit, might also run into non-scalar keys.

Sorry about this annoying issue - keen to get it implemented and tested though.

There's also the obvious win if the object being compared to is the same instance. ⚡️

tyx commented 7 years ago

Looks nice ! Thanks you for your fast answer.

toArray would be a performance hit, might also run into non-scalar keys.

Indeed !

Sorry about this annoying issue - keen to get it implemented and tested though.

Compared to the benefits we get to use your extension, it's not a big deal in fact ;)

There's also the obvious win if the object being compared to is the same instance.

Yes ! This one works actually

rtheunissen commented 7 years ago

@nikic would this actually be implicit if we implement the get_properties handler?

nikic commented 7 years ago

@rtheunissen Yes-ish. I'd assume you wouldn't want to have the same comparison semantics as a property comparison would give you. In particular, those would be weak and I doubt you'd want to have Set(["foo"]) == Set([true]), right?

rtheunissen commented 7 years ago

The only issue here is that we can't implement this in the polyfill. :no_good_man:

rtheunissen commented 7 years ago

Only option would be to implement Hashable and use equals for equality checks. Would never need > and < anyway.

shouze commented 7 years ago

@rtheunissen ok so the main issue is equality so it's not a big issue about < and >. And yes why not with Hashable, we're ok that Set, Vector, ... don't implement it ATM? This is what you've planned to solve the issue.

rtheunissen commented 7 years ago

@shouze happy to do it with equals. Added to 2.0, see #65

rtheunissen commented 7 years ago

This is actually more difficult than I expected. equals as part of the Hashable interface sort of scopes that equality check to a context in which the object is used as a key in a hash table. I'm not sure if that also implies general equality, ie. a case where you'd check $a === $b.

For example, if we implement equals on Collection, and we're checking each item of the collection for strict equality, do we call equals on values that implement Hashable?

Or should we create an Equality interface and have Hashable just extend that?

shouze commented 7 years ago

First of all, to me identity comparison (===) is something else and not part of the scope of this issue.

For example, if we implement equals on Collection, and we're checking each item of the collection for strict equality, do we call equals on values that implement Hashable?

Yes why not.

Or should we create an Equality interface and have Hashable just extend that?

And yes why not, I'm not sure that it's needed but could be a nice to have.

In fact I wonder about the way you want to proceed:

  1. do you want to lookup each entries of eg a Collection when we want to check that a collection is equal to another?
  2. Or do you plan to maintain an always up2date internal hash, updated each time we mutate the collection?
rtheunissen commented 7 years ago

Do you want to lookup each entries of eg a Collection when we want to check that a collection is equal to another?

Yes I think so. Something similar to what lodash does with _.equals. We can fail fast with things like type and size comparisons. An interesting case would be a Map, which may not have the same ordering, but could be considered "equal". {a: 1, b: 2} == {b: 2, a: 1} for example. We would have to define this for each collection.

Or do you plan to maintain an always up2date internal hash, updated each time we mutate the collection?

No I don't think this is practical, better to just iterate through to compare.

rtheunissen commented 7 years ago

The issue remains that we can't implement a == check in the polyfill. Whenever you would use $set == $another, you would have to use $set->equals($another).

shouze commented 7 years ago

@rtheunissen yes ok for $set->equals($another) if it can help to maintain the polyfill at the same level of features alongside.

rtheunissen commented 7 years ago

I'd like to keep them consistent, otherwise it invalidates the polyfill to some extent. Maybe if and when the extension becomes a default, we can drop the polyfill.

shouze commented 7 years ago

Yes, can be a scheduled deprecation but you're right it's probably not the time to do that as it would break adoption.

rtheunissen commented 7 years ago

This change is actually quite complex. Whenever values are compared (filter, sort, find, etc), we now have to check if the value implements Hashable, and use its Hashable::equals if it does. Of course it's easy to write a helper like valuesAreEqual that checks this for us, but the effects are clearly outside of the Hashable domain. I'm therefore convinced that we should move the equals method of Hashable to its own interface, maybe Equality, with only a single method equals.

It's a shame that PHP still doesn't have a Comparable interface or operator overloading. I'm tempted to wait for the Comparable RFC to pass and be released, which would make the Equality interface redundant.

You can use something like $a->diff($b)->isEmpty() but that wouldn't currently call equals on any objects in the set that implement Hashable. If you know you only have scalar values in your set, you can wrap it and implement your own equals for now.

marcospassos commented 6 years ago

We're are considering to replace our custom collection structures by DS, but we use collection comparisons heavily.

Java defines specific strategies for computing hash codes for each type of collection. For maps, for instance, the hash code is order insensitive (hash += hash(key) ^ hash(value)), so that the equality relation is valid across all map implementations.

Our collection library has an Equality interface, from which Hashable extends from. It makes sense for us since the concept of hash only makes sense tied to an equality relation. Defining a concept of equality relation between collections also requires defining how the hash codes should be computed consistently as per collection type (it should be part of the interface contract). In Java, for instance, two maps are equal if both have the same EntrySet, while two sets are equal if both have the same elements. The rule for determining if two objects are equals is as follows:

All methods should follow these rules, including diff, intersect, remove, etc.

Despite the issues related to the implementation of operation comparison overloading, it would be really helpful to have an equals method that allows us testing collections for equality. It is not a BC break and does not require complex changes as implementing operation overloading does.

About the Comparable RFC, it's an old discussion and I don't believe it will be added to the core anytime soon.

rtheunissen commented 6 years ago

It's is not a BC break and does not require complex changes as implementing operation overloading does.

That's true, but changing the behaviour of diff, intersect, remove, etc could be considered significant enough to add as part of 2.0 (end 2017).

I agree that the collections should implement Hashable, and I think following the direction you've laid out makes sense. No need for the overloaded op just yet, that can come later.

marcospassos commented 6 years ago

@rtheunissen thanks for considering it! Indeed, changing the behavior of these methods does represent a BC break, but introducing the equals method does not. Any chance of having it before the 2.0 release? Looking the issues, it is probably the most requested feature.

rtheunissen commented 6 years ago

So we're implementing an equals method, but not Hashable yet?

marcospassos commented 6 years ago

Why not Hashable? We can just leave diff, intersect, remove untouched until 2.0. For now, it should be possible to implement equals and hash without introducing a BC break, right?

rtheunissen commented 6 years ago

But then Map and Set will try to use them as Hashable's, and not spl_object_hash, which could break things. We can add an equals method that defines equality, but implementing Hashable is a 2.0 addition.

marcospassos commented 6 years ago

Right, I missed it. Yes, equals will help a lot for now. Just to be sure, how do you plan to implement this algorithm? Performance is always a problem in collection comparisons. What I've in mind is something like the following algorithm, suitable for the polyfill, but obviously written in C:

public function equals($other): bool
{
    if (!($other instanceof Map)){
        return false;
    }

    if ($this->count() !== $other->count()) {
        return false;
    }

    foreach ($this->map as $key => $value) {
        if (!$other->hasKey($key)) {
            return false;
        }

        /**
         * isEqual()
         * - Two identical objects are always equal
         * - Two Hashable objects $a and $b, are equal if $a->equals($b) and $b->equals($a)
         * - Everything else is unequal
         */

        if (!$this->isEqual($value, $other->get($key))) {
            return false;
        }
    }

    return true;
}
rtheunissen commented 6 years ago

So {a: 1, b:2} === {b:2, a: 1} ? I'm considering making the comparison order-sensitive, because it's an ordered collection, unlike a Java Map (excl ordered map).

If ordering is considered, we can do a simple O(n) comparison from 0 to n - 1. If ordering is not considered, then your example is spot on, O(n log n).

marcospassos commented 6 years ago

From Java docs:

Compares the specified object with this map for equality. Returns true if the given object is also a map and the two maps represent the same mappings. More formally, two maps m1 and m2 represent the same mappings if m1.entrySet().equals(m2.entrySet()). This ensures that the equals method works properly across different implementations of the Map interface.

So, in Java ordered and unordered collections and considered equal.

However, I like your idea. We could further improve the collection by introducing a new methodcontainsAll(Map $map), so that $left->count() === $right->count() && $left->containsAll($right) is the same of checking for order insensitive equality. In that way we provide an alternative for performing both checks.

rtheunissen commented 6 years ago

What if we added a bool arg to xor, diff and intersect to also match on the value. That way, to determine if a Map is equal you can check if count($map1->xor($map2, true)) == 0 ? I think that works?

rtheunissen commented 6 years ago

Can always have a bool arg in equals, which won't break the Hashable contract. bool $ordered = false? I really don't like boolean arguments though because their function is not always clear from the caller's end.

rtheunissen commented 6 years ago

I think we should keep it simple here. Check all pairs and make the same ordering a requirement for equality. If you have a potentially different order, sort first then check check equality.

marcospassos commented 6 years ago

I think it can quickly become unmaintainable. So, the behavior of these flags set to false is the same as not implementing the Hashable interface. In other words, if an object implements the Hashable interface it is explicitly defining the standard way for comparing two instances of that type. I think that we should enforce adherence to the protocol. Furthermore, if one wants this behavior and there is no big perfomance gain between the native array and the map, there is no reason for using the map at all.

rtheunissen commented 6 years ago

Java's argument for not considering ordering a part of the criteria for equality is based on the fact that their collections aren't ordered in the first place (bar a few). What's interesting to note is that Java doesn't consider ordering even when using collections like LinkedMap. See http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/util/LinkedHashMap.java

marcospassos commented 6 years ago

Yes, as I previously noticed:

So, in Java ordered and unordered collections and considered equal.

Another point to consider is how to compare arrays. This is my suggestion:

class Foo implements Hashable {
    private $name;
    public function equals($b) {
         return $b instanceof Hashable && $this->name === $b->name;
    }
}

$left = new Map();
$left->set(0, [new Foo('a')]);

$right = new Map();
$right->set(0, [new Foo('a')]);

assertTrue($left->equals($right));
rtheunissen commented 6 years ago

What does the PHP array do? https://3v4l.org/bbTb0

Order is important when considering equality of an ordered collection. Java is wrong here IMO.

rtheunissen commented 6 years ago

Another point to consider is how to compare arrays

Oh no.. we may have a problem here. If it's an array we'll need to traverse it? Recursively? See that's where this gets very messy where we don't have a native compareTo.

We can use the internal comparison handlers but we can't replicate that in the polyfill, which takes us back to square one in some respects. The array comparison isn't aware of equals.

marcospassos commented 6 years ago

But it brings inconsistency. Is there any way of including the Hashable::equals as part of the === comparison operator? If so, it could be achieved transparently.

rtheunissen commented 6 years ago

I know that you can at least make it work with ==, but I'm not sure about ===. We'd need to diverge from the polyfill to support that though. === should be reserved for "same instance / reference" checks, so really all we need is the ==.

marcospassos commented 6 years ago

== makes much more sense, indeed. The good part about that is that we don't need to care about comparing array, it would be done indirectly.

rtheunissen commented 6 years ago

Exactly, no need to implement equals at all. The issue lies in the polyfill, which we're considering to abandon for 2.0

rtheunissen commented 6 years ago

If we implement == in 1.x, we have an inconsistency with the polyfill.

marcospassos commented 6 years ago

I don't know how hard is it, but if it was possible to implement for DateTime it should not be hard to implement against an interface (or at least I hope so :)

marcospassos commented 6 years ago

Constraining the roadmap according to a polyfill is a big mistake IMHO. The polyfill makes no sense for production use. For any other purpose, I believe that an IDE plugin or just stubs are the way to go.

marcospassos commented 6 years ago

We have another problem. If we use the equality operator, then type juggling will break our plans. Furthermore, the operator == is order insensitive in regards to arrays. So, is there any way to reuse the strict comparison mechanism to compare arrays considering Hashable? It would make the operator itself untouched, but it could be used by the equalsmethod.

rtheunissen commented 6 years ago

Furthermore, the operator == is order insensitive in regards to arrays

~Not true! https://3v4l.org/bbTb0~ I was wrong here, that test was misleading because [0, 1] and [1, 0] might be the same values but their keys are 0 and 1, and therefore not equal pairs. If you were to try [0 => 0, 1 => 1] and [1 => 1, 0 => 0] you'll see that order doesn't matter.

rtheunissen commented 6 years ago

type juggling will break our plans

How so?

marcospassos commented 6 years ago

From the docs:

Example Name Result
$a + $b Union Union of $a and $b.
$a == $b Equality TRUE if $a and $b have the same key/value pairs.
$a === $b Identity TRUE if $a and $b have the same key/value pairs in the same order and of the same types.
$a != $b Inequality TRUE if $a is not equal to $b.
$a <> $b Inequality TRUE if $a is not equal to $b.
$a !== $b Non-identity TRUE if $a is not identical to $b.

http://php.net/manual/en/language.operators.array.php

type juggling will break our plans How so?

[1, true, false] == ["1", 1, 0]

rtheunissen commented 6 years ago

Ahh... well don't use arrays then. ;)

marcospassos commented 6 years ago

It's not a choice. There is no way to constrain it in an application.

rtheunissen commented 6 years ago

I stand corrected, "the operator == is order insensitive in regards to arrays" https://3v4l.org/jdIBv

marcospassos commented 6 years ago

Yeah, maybe the behavior changed in PHP 7 but the doc is outdated. Anyway, the major issue here is the type juggling.

So, is there any way to reuse the strict comparison mechanism to compare arrays considering Hashable? It would make the operator itself untouched, but it could be used by the equalsmethod.

?

rtheunissen commented 6 years ago

I think we should try to stay as close to arrays as possible. 😞

marcospassos commented 6 years ago

I personally don't use the operator ==. I try to avoid it as much as possible.

I think we should try to stay as close to arrays as possible. 😞

Stay close to array implies that order matters. About the equality relation, just close the eyes for arrays is a big inconsistency IMHO. Resolving this issue pushes this library to the next level 🚀