Open tyx opened 7 years ago
@tyx we would need to implement compare_objects
on the zend objects, ie. php_ds_set
etc. The internal htable would do the actual work though, eg.:
compare_objects
handlerds_set_equals(ds_set_t *a, ds_set_t* b)
ds_htable_equals(ds_htable_t *a, ds_htable_t *b)
I'm guessing we'd iterate through both the table's keys and values and check if they're equal? (Or in the case of a Set, only the keys as all the values would be undefined).
toArray
would be a performance hit, might also run into non-scalar keys.
Sorry about this annoying issue - keen to get it implemented and tested though.
There's also the obvious win if the object being compared to is the same instance. ⚡️
Looks nice ! Thanks you for your fast answer.
toArray would be a performance hit, might also run into non-scalar keys.
Indeed !
Sorry about this annoying issue - keen to get it implemented and tested though.
Compared to the benefits we get to use your extension, it's not a big deal in fact ;)
There's also the obvious win if the object being compared to is the same instance.
Yes ! This one works actually
@nikic would this actually be implicit if we implement the get_properties
handler?
@rtheunissen Yes-ish. I'd assume you wouldn't want to have the same comparison semantics as a property comparison would give you. In particular, those would be weak and I doubt you'd want to have Set(["foo"]) == Set([true])
, right?
The only issue here is that we can't implement this in the polyfill. :no_good_man:
Only option would be to implement Hashable
and use equals
for equality checks. Would never need >
and <
anyway.
@rtheunissen ok so the main issue is equality so it's not a big issue about <
and >
. And yes why not with Hashable
, we're ok that Set, Vector, ... don't implement it ATM? This is what you've planned to solve the issue.
@shouze happy to do it with equals. Added to 2.0, see #65
This is actually more difficult than I expected. equals
as part of the Hashable
interface sort of scopes that equality check to a context in which the object is used as a key in a hash table. I'm not sure if that also implies general equality, ie. a case where you'd check $a === $b
.
For example, if we implement equals
on Collection
, and we're checking each item of the collection for strict equality, do we call equals
on values that implement Hashable
?
Or should we create an Equality
interface and have Hashable
just extend that?
First of all, to me identity comparison (===
) is something else and not part of the scope of this issue.
For example, if we implement equals on Collection, and we're checking each item of the collection for strict equality, do we call equals on values that implement Hashable?
Yes why not.
Or should we create an Equality interface and have
Hashable
just extend that?
And yes why not, I'm not sure that it's needed but could be a nice to have.
In fact I wonder about the way you want to proceed:
Collection
when we want to check that a collection is equal to another?mutate
the collection?Do you want to lookup each entries of eg a Collection when we want to check that a collection is equal to another?
Yes I think so. Something similar to what lodash does with _.equals. We can fail fast with things like type and size comparisons. An interesting case would be a Map, which may not have the same ordering, but could be considered "equal". {a: 1, b: 2} == {b: 2, a: 1}
for example. We would have to define this for each collection.
Or do you plan to maintain an always up2date internal hash, updated each time we mutate the collection?
No I don't think this is practical, better to just iterate through to compare.
The issue remains that we can't implement a ==
check in the polyfill.
Whenever you would use $set == $another
, you would have to use $set->equals($another)
.
@rtheunissen yes ok for $set->equals($another)
if it can help to maintain the polyfill at the same level of features alongside.
I'd like to keep them consistent, otherwise it invalidates the polyfill to some extent. Maybe if and when the extension becomes a default, we can drop the polyfill.
Yes, can be a scheduled deprecation but you're right it's probably not the time to do that as it would break adoption.
This change is actually quite complex. Whenever values are compared (filter
, sort
, find
, etc), we now have to check if the value implements Hashable
, and use its Hashable::equals
if it does. Of course it's easy to write a helper like valuesAreEqual
that checks this for us, but the effects are clearly outside of the Hashable
domain. I'm therefore convinced that we should move the equals
method of Hashable
to its own interface, maybe Equality
, with only a single method equals
.
It's a shame that PHP still doesn't have a Comparable
interface or operator overloading. I'm tempted to wait for the Comparable RFC to pass and be released, which would make the Equality
interface redundant.
You can use something like $a->diff($b)->isEmpty()
but that wouldn't currently call equals
on any objects in the set that implement Hashable
. If you know you only have scalar values in your set, you can wrap it and implement your own equals
for now.
We're are considering to replace our custom collection structures by DS, but we use collection comparisons heavily.
Java defines specific strategies for computing hash codes for each type of collection. For maps, for instance, the hash code is order insensitive (hash += hash(key) ^ hash(value))
, so that the equality relation is valid across all map implementations.
Our collection library has an Equality
interface, from which Hashable
extends from. It makes sense for us since the concept of hash only makes sense tied to an equality relation. Defining a concept of equality relation between collections also requires defining how the hash codes should be computed consistently as per collection type (it should be part of the interface contract). In Java, for instance, two maps are equal if both have the same EntrySet, while two sets are equal if both have the same elements. The rule for determining if two objects are equals is as follows:
$a
and $b
, are equal if $a->equals($b)
and $b->equals($a)
All methods should follow these rules, including diff
, intersect
, remove
, etc.
Despite the issues related to the implementation of operation comparison overloading, it would be really helpful to have an equals
method that allows us testing collections for equality. It is not a BC break and does not require complex changes as implementing operation overloading does.
About the Comparable RFC, it's an old discussion and I don't believe it will be added to the core anytime soon.
It's is not a BC break and does not require complex changes as implementing operation overloading does.
That's true, but changing the behaviour of diff
, intersect
, remove
, etc could be considered significant enough to add as part of 2.0 (end 2017).
I agree that the collections should implement Hashable, and I think following the direction you've laid out makes sense. No need for the overloaded op just yet, that can come later.
@rtheunissen thanks for considering it! Indeed, changing the behavior of these methods does represent a BC break, but introducing the equals method does not. Any chance of having it before the 2.0 release? Looking the issues, it is probably the most requested feature.
So we're implementing an equals method, but not Hashable yet?
Why not Hashable? We can just leave diff
, intersect
, remove
untouched until 2.0. For now, it should be possible to implement equals
and hash
without introducing a BC break, right?
But then Map and Set will try to use them as Hashable's, and not spl_object_hash
, which could break things. We can add an equals
method that defines equality, but implementing Hashable is a 2.0 addition.
Right, I missed it. Yes, equals
will help a lot for now. Just to be sure, how do you plan to implement this algorithm? Performance is always a problem in collection comparisons. What I've in mind is something like the following algorithm, suitable for the polyfill, but obviously written in C:
public function equals($other): bool
{
if (!($other instanceof Map)){
return false;
}
if ($this->count() !== $other->count()) {
return false;
}
foreach ($this->map as $key => $value) {
if (!$other->hasKey($key)) {
return false;
}
/**
* isEqual()
* - Two identical objects are always equal
* - Two Hashable objects $a and $b, are equal if $a->equals($b) and $b->equals($a)
* - Everything else is unequal
*/
if (!$this->isEqual($value, $other->get($key))) {
return false;
}
}
return true;
}
So {a: 1, b:2} === {b:2, a: 1}
? I'm considering making the comparison order-sensitive, because it's an ordered collection, unlike a Java Map (excl ordered map).
If ordering is considered, we can do a simple O(n) comparison from 0 to n - 1. If ordering is not considered, then your example is spot on, O(n log n).
From Java docs:
Compares the specified object with this map for equality. Returns true if the given object is also a map and the two maps represent the same mappings. More formally, two maps m1 and m2 represent the same mappings if m1.entrySet().equals(m2.entrySet()). This ensures that the equals method works properly across different implementations of the Map interface.
So, in Java ordered and unordered collections and considered equal.
However, I like your idea. We could further improve the collection by introducing a new methodcontainsAll(Map $map)
, so that $left->count() === $right->count() && $left->containsAll($right)
is the same of checking for order insensitive equality. In that way we provide an alternative for performing both checks.
What if we added a bool arg to xor
, diff
and intersect
to also match on the value. That way, to determine if a Map is equal you can check if count($map1->xor($map2, true)) == 0
? I think that works?
Can always have a bool arg in equals
, which won't break the Hashable contract. bool $ordered = false
? I really don't like boolean arguments though because their function is not always clear from the caller's end.
I think we should keep it simple here. Check all pairs and make the same ordering a requirement for equality. If you have a potentially different order, sort first then check check equality.
I think it can quickly become unmaintainable. So, the behavior of these flags set to false is the same as not implementing the Hashable
interface. In other words, if an object implements the Hashable
interface it is explicitly defining the standard way for comparing two instances of that type. I think that we should enforce adherence to the protocol. Furthermore, if one wants this behavior and there is no big perfomance gain between the native array and the map, there is no reason for using the map at all.
Java's argument for not considering ordering a part of the criteria for equality is based on the fact that their collections aren't ordered in the first place (bar a few). What's interesting to note is that Java doesn't consider ordering even when using collections like LinkedMap
. See http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/util/LinkedHashMap.java
Yes, as I previously noticed:
So, in Java ordered and unordered collections and considered equal.
Another point to consider is how to compare arrays. This is my suggestion:
class Foo implements Hashable {
private $name;
public function equals($b) {
return $b instanceof Hashable && $this->name === $b->name;
}
}
$left = new Map();
$left->set(0, [new Foo('a')]);
$right = new Map();
$right->set(0, [new Foo('a')]);
assertTrue($left->equals($right));
What does the PHP array do? https://3v4l.org/bbTb0
Order is important when considering equality of an ordered collection. Java is wrong here IMO.
Another point to consider is how to compare arrays
Oh no.. we may have a problem here. If it's an array we'll need to traverse it? Recursively? See that's where this gets very messy where we don't have a native compareTo.
We can use the internal comparison handlers but we can't replicate that in the polyfill, which takes us back to square one in some respects. The array comparison isn't aware of equals
.
But it brings inconsistency. Is there any way of including the Hashable::equals
as part of the ===
comparison operator? If so, it could be achieved transparently.
I know that you can at least make it work with ==
, but I'm not sure about ===
. We'd need to diverge from the polyfill to support that though. ===
should be reserved for "same instance / reference" checks, so really all we need is the ==
.
==
makes much more sense, indeed. The good part about that is that we don't need to care about comparing array, it would be done indirectly.
Exactly, no need to implement equals
at all. The issue lies in the polyfill, which we're considering to abandon for 2.0
If we implement == in 1.x, we have an inconsistency with the polyfill.
I don't know how hard is it, but if it was possible to implement for DateTime
it should not be hard to implement against an interface (or at least I hope so :)
Constraining the roadmap according to a polyfill is a big mistake IMHO. The polyfill makes no sense for production use. For any other purpose, I believe that an IDE plugin or just stubs are the way to go.
We have another problem. If we use the equality operator, then type juggling will break our plans. Furthermore, the operator ==
is order insensitive in regards to arrays. So, is there any way to reuse the strict comparison mechanism to compare arrays considering Hashable
? It would make the operator itself untouched, but it could be used by the equals
method.
Furthermore, the operator == is order insensitive in regards to arrays
~Not true! https://3v4l.org/bbTb0~ I was wrong here, that test was misleading because [0, 1]
and [1, 0]
might be the same values but their keys are 0
and 1
, and therefore not equal pairs. If you were to try [0 => 0, 1 => 1]
and [1 => 1, 0 => 0]
you'll see that order doesn't matter.
type juggling will break our plans
How so?
From the docs:
Example | Name | Result |
---|---|---|
$a + $b | Union | Union of $a and $b. |
$a == $b | Equality | TRUE if $a and $b have the same key/value pairs. |
$a === $b | Identity | TRUE if $a and $b have the same key/value pairs in the same order and of the same types. |
$a != $b | Inequality | TRUE if $a is not equal to $b. |
$a <> $b | Inequality | TRUE if $a is not equal to $b. |
$a !== $b | Non-identity | TRUE if $a is not identical to $b. |
http://php.net/manual/en/language.operators.array.php
type juggling will break our plans How so?
[1, true, false] == ["1", 1, 0]
Ahh... well don't use arrays then. ;)
It's not a choice. There is no way to constrain it in an application.
I stand corrected, "the operator == is order insensitive in regards to arrays" https://3v4l.org/jdIBv
Yeah, maybe the behavior changed in PHP 7 but the doc is outdated. Anyway, the major issue here is the type juggling.
So, is there any way to reuse the strict comparison mechanism to compare arrays considering Hashable? It would make the operator itself untouched, but it could be used by the
equals
method.
?
I think we should try to stay as close to arrays as possible. 😞
==
comparison on everything?I personally don't use the operator ==
. I try to avoid it as much as possible.
I think we should try to stay as close to arrays as possible. 😞
Stay close to array implies that order matters. About the equality relation, just close the eyes for arrays is a big inconsistency IMHO. Resolving this issue pushes this library to the next level 🚀
I think creating a different issue from #65 is a better idea.
Current problem
it makes me very sad as I definitively can't rely on Ds in my tests. And I use
Ds
everywhere as it does a really nice job (thanks for that anyway)Solution
Is there any reason why this behavior ? And is it a bug or a known behavior ? Looks too big that anyone complain about it ^^
Whatever I'm definitively open to contribute event if my C skill are very old because it cannot stay like that.
I already had a look on the (clean) code. I guess we need to implements the compare_objects handlers on the internal htable ? Looks pretty obvious, I know. May be we can even use the
toArray
and rely onarray::compare_objects
? Is there any stuff that prevent us to do that ?Anyway, let me know what I can do to help on this very annoying issue.
Thanks