Closed Smoren closed 1 year ago
This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Totals | |
---|---|
Change from base Build 3965219794: | 0.04% |
Covered Lines: | 430 |
Relevant Lines: | 431 |
UPD: this branch is rebased from develop
.
@markrogoyski Hi Mark!
A question about simmetric difference: how do you think, do we need full symmetricDifference()
method only or both of symmetricDifference()
and partialSymmetricDifference()
?
Hi @Smoren,
Can you briefly explain what the difference is between the two, symmetricDifference
and partialSymmetricDifference
?
Thanks.
Hi @markrogoyski,
symmetricDifference(...$collections)
returns elements from each collection which are not found in all the other collections.partialSymmetricDifference($n, ...$collections)
returns elements from each collection which are not found at least in $n
other collections ($n
< count($n)
).UPD: branch is rebased from develop
.
@markrogoyski,
Stream
namespace.Set
and Stream
namespaces.The only thing missing at the moment is the documentation in the README file.
Hi @markrogoyski,
I've added the descriptions of intersection methods to README file.
Now I think this branch is ready to merge.
Hi @Smoren,
Thanks for working on the set operations. I have two questions.
1) Can you send me a link to a description and some examples of partial intersection? Briefly searching Google I didn't seem to find any authoritative articles on the topic. I'm familiar with the common set operations like union, intersection, difference and symmetric difference, but had not encountered partial intersection before, so I want to read up on it a bit first.
2) Regarding strict vs non-strict, I think by default "strict" operations should be the default. Modern PHP is all about increasing type safety, and this is the norm for other strongly statically typed languages. So, I have a proposals: make strict the default (e.g. intersection
). If you want to support non-strict, make that require changing an optional parameter or using an alternate method (e.g. intersectionNonStrict
).
Another option would be to simply not support non-strict operations, but I'm not sure if that would limit usage of the library or not. It would simplify the code and interfaces though.
Thoughts? Mark
Hi @markrogoyski,
Thank you for the feedback!
Set::intersection()
=> Set::intersectionNonStrict()
;Set::intersectionStrict()
=> Set::intersection()
;Set::partialIntersection()
=> Set::partialIntersectionNonStrict()
;Set::partialIntersectionStrict()
=> Set::partialIntersection()
;Stream::intersectionWith()
=> Stream::intersectionNonStrictWith()
;Stream::intersectionStrictWith()
=> Stream::intersectionWith()
;Stream::partialIntersectionWith()
=> Stream::partialIntersectionNonStrictWith()
;Stream::partialIntersectionStrictWith()
=> Stream::partialIntersectionWith()
.About "partial intersection":
I also searched on Google and did not find articles about this. It is strange. Perhaps this operation has a specific name, which I do not know. However, even if this operation is not classical, it is very useful.
I'll give my own definition:
We have N sets. An M-partial intersection of these sets is a set whose elements are contained in at least M initial sets.
$a = [1, 2, 3, 4, 5, 6, 7];
$b = [-1, 0, 1, 3, 5];
$c = [0, 2, 4, 10];
// N = 3, M = 2
$r = partialIntersection(2, $a, $b, $c); // 2-partial intersection of 3 sets
// [0, 1, 2, 3, 4, 5]
In addition, the classical (complete) intersection of sets can be considered a special case of partial intersection when M = N.
This operation was required to me many times when solving different problems.
For example, to obtain the target audience for advertising, when it is enough for me that a representative of the target audience occurs in at least a few of the samples we have, and not necessarily in all.
So I hope that you will agree to include this operation in the package.
More examples of partial intersection.
For ease of perception, both the initial data and the results are presented in sorted form.
Also I've added test to demonstrate this behavior.
Given: sets A, B, C, D (N = 4).
For example:
$a = [1, 2, 3, 4, 5];
$b = [1, 2, 10, 11];
$c = [1, 2, 3, 12];
$d = [1, 4, 13, 14];
It is equivalent to classical union operation (for any N).
$r = partialIntersection(1, $a, $b, $c, $d);
// [1, 2, 3, 4, 5, 10, 11, 12, 13, 14]
It is equivalent to union($a, $b, $c, $d) - symmetricDifference($a, $b, $c, $d)
(for any N).
$r = partialIntersection(2, $a, $b, $c, $d);
// [1, 2, 3, 4]
$r = partialIntersection(3, $a, $b, $c, $d);
// [1, 2]
It is equivalent to classical (complete) intersection operation (for any N).
$r = partialIntersection(4, $a, $b, $c, $d);
// [1]
It is equivalent to empty set (for any M, N: M > N)
$r = partialIntersection(5, $a, $b, $c, $d);
// []
Thanks for the definition and concrete examples. It makes sense to me. Let's go with your suggestion of partialIntersection
.
@markrogoyski, thank you Mark!
So I think this PR is ready to merge if you don't have another questions.
Suggestion: For the Stream intersection tests, could you add a unit test case or two where the intersection comes after a zipWith
call? I think this would result in arrays of arrays and it would be helpful to have tests that document the behavior and show that it works.
@markrogoyski,
All your suggestions are applied.
Tests of Stream
namespace refactored:
array_pop()
usage removed;->zipWith()->intersectionWith()
test cases added.@markrogoyski, BTW, what do you think about including the partial intersection functionality to your MathPHP library too?
Hi @Smoren,
I was reading the PHP documentation to see what the opposite of "strict typing" is, and it seems PHP uses the term "coerce." For example, to coerce a value, coercive typing, type coercion, etc.
See:
Seeing that PHP seems to use "coerce," and I did not find any usage of the phrase "non-strict typing," what do you think about renaming the methods to something like intersectionCoercive
?
Sorry for yet another request. I think names are important in communicating the intent of a function so the user can understand what it does without needing to rely on documentation.
Let me know what you think. Thanks.
And regarding MathPHP, sure, I can add it to my to-do list and add it the SetTheory section. Or you can submit a PR if you want.
Mark
Hi @markrogoyski,
Totally agree with you. It looks much more elegant.
All the non-strict intersection methods are renamed using Coercive
word instead of NonStrict
.
@markrogoyski,
About MathPHP: I would like to participate in the development of this package too. So I want to make a PR.
Do you have any suggestions on how best to frame it there?
@markrogoyski
BTW, now I think we have to to write about multisets logic in the documentation.
Is it a good Idea?
It definitely should be mentioned. Otherwise, it could be kind of unexpected. It makes sense once you know about it, because the input is not limited to pure sets, so it has a behavior that works on those inputs. I would put something simple like:
Note: If input iterables produce duplicate items, then multiset intersection rules apply.
Thank you for note text!
I've added it to the Symmetric difference methods in set_difference
branch.
But i did not add notes to the documentation of intersection methods to avoid possible conflicts.
Hi @Smoren,
Can you explain what the expected behavior of partialIntersection
is when $minIntersectionCount
is 1 and the iterables are multisets? Also, can you please provide some unit test examples for those to test and document the behavior.
Thanks, Mark
Hi @markrogoyski,
I've made another pull request with explanation of this behavior and unit test cases.
Hi @markrogoyski,
This is a draft of PR.
Summary:
Set::intersection()
,Set::partialIntersection()
added and covered by tests.Set::intersectionStrict()
,Set::partialIntersectionStrict()
added but have not covered by tests yet.JustifyMultipleIterator
,NoValueMonad
,UsageMap
.If everithing is OK I will implement another methods of intersection functionality and cover them by tests.
Draw your attention: this PR is depends of previous.