Namespace Set: intersection methods added

Smoren commented 1 year ago

Hi @markrogoyski,

This is a draft of PR.

Summary:

Methods Set::intersection(), Set::partialIntersection() added and covered by tests.
Methods Set::intersectionStrict(), Set::partialIntersectionStrict() added but have not covered by tests yet.
New util classes added: JustifyMultipleIterator, NoValueMonad, UsageMap.

If everithing is OK I will implement another methods of intersection functionality and cover them by tests.

Draw your attention: this PR is depends of previous.

coveralls commented 1 year ago

Pull Request Test Coverage Report for Build 3983302703

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

61 of 61 (100.0%) changed or added relevant lines in 5 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.04%) to 99.768%

Totals
Change from base Build 3965219794:	0.04%
Covered Lines:	430
Relevant Lines:	431

💛 - Coveralls

Smoren commented 1 year ago

UPD: this branch is rebased from develop.

Smoren commented 1 year ago

@markrogoyski Hi Mark!

A question about simmetric difference: how do you think, do we need full symmetricDifference() method only or both of symmetricDifference() and partialSymmetricDifference()?

markrogoyski commented 1 year ago

Hi @Smoren,

Can you briefly explain what the difference is between the two, symmetricDifference and partialSymmetricDifference?

Thanks.

Smoren commented 1 year ago

Hi @markrogoyski,

symmetricDifference(...$collections) returns elements from each collection which are not found in all the other collections.
partialSymmetricDifference($n, ...$collections) returns elements from each collection which are not found at least in $n other collections ($n < count($n)).

UPD: branch is rebased from develop.

Smoren commented 1 year ago

@markrogoyski,

Intersection methods added to Stream namespace.
Tests added for all the intersection methods in Set and Stream namespaces.

The only thing missing at the moment is the documentation in the README file.

Smoren commented 1 year ago

Hi @markrogoyski,

I've added the descriptions of intersection methods to README file.

Now I think this branch is ready to merge.

markrogoyski commented 1 year ago

Hi @Smoren,

Thanks for working on the set operations. I have two questions.

1) Can you send me a link to a description and some examples of partial intersection? Briefly searching Google I didn't seem to find any authoritative articles on the topic. I'm familiar with the common set operations like union, intersection, difference and symmetric difference, but had not encountered partial intersection before, so I want to read up on it a bit first.

2) Regarding strict vs non-strict, I think by default "strict" operations should be the default. Modern PHP is all about increasing type safety, and this is the norm for other strongly statically typed languages. So, I have a proposals: make strict the default (e.g. intersection). If you want to support non-strict, make that require changing an optional parameter or using an alternate method (e.g. intersectionNonStrict).

Another option would be to simply not support non-strict operations, but I'm not sure if that would limit usage of the library or not. It would simplify the code and interfaces though.

Thoughts? Mark

Smoren commented 1 year ago

Hi @markrogoyski,

Thank you for the feedback!

Intersection methods renamed:
- Set::intersection() => Set::intersectionNonStrict();
- Set::intersectionStrict() => Set::intersection();
- Set::partialIntersection() => Set::partialIntersectionNonStrict();
- Set::partialIntersectionStrict() => Set::partialIntersection();
- Stream::intersectionWith() => Stream::intersectionNonStrictWith();
- Stream::intersectionStrictWith() => Stream::intersectionWith();
- Stream::partialIntersectionWith() => Stream::partialIntersectionNonStrictWith();
- Stream::partialIntersectionStrictWith() => Stream::partialIntersectionWith().
Tests of these methods renamed.
README updated.

About "partial intersection":

I also searched on Google and did not find articles about this. It is strange. Perhaps this operation has a specific name, which I do not know. However, even if this operation is not classical, it is very useful.

I'll give my own definition:

We have N sets. An M-partial intersection of these sets is a set whose elements are contained in at least M initial sets.

$a = [1, 2, 3, 4, 5, 6, 7];
$b = [-1, 0, 1, 3, 5];
$c = [0, 2, 4, 10];

// N = 3, M = 2
$r = partialIntersection(2, $a, $b, $c); // 2-partial intersection of 3 sets
// [0, 1, 2, 3, 4, 5]

In addition, the classical (complete) intersection of sets can be considered a special case of partial intersection when M = N.

This operation was required to me many times when solving different problems.

For example, to obtain the target audience for advertising, when it is enough for me that a representative of the target audience occurs in at least a few of the samples we have, and not necessarily in all.

So I hope that you will agree to include this operation in the package.

Smoren commented 1 year ago

More examples of partial intersection.

For ease of perception, both the initial data and the results are presented in sorted form.

Also I've added test to demonstrate this behavior.

Given: sets A, B, C, D (N = 4).

For example:

$a = [1, 2, 3, 4, 5];
$b = [1, 2, 10, 11];
$c = [1, 2, 3, 12];
$d = [1, 4, 13, 14];

M = 1

It is equivalent to classical union operation (for any N).

$r = partialIntersection(1, $a, $b, $c, $d);
// [1, 2, 3, 4, 5, 10, 11, 12, 13, 14]

M = 2

It is equivalent to union($a, $b, $c, $d) - symmetricDifference($a, $b, $c, $d) (for any N).

$r = partialIntersection(2, $a, $b, $c, $d);
// [1, 2, 3, 4]

M = 3

$r = partialIntersection(3, $a, $b, $c, $d);
// [1, 2]

M = 4 (M = N)

It is equivalent to classical (complete) intersection operation (for any N).

$r = partialIntersection(4, $a, $b, $c, $d);
// [1]

M = 5 (M > N)

It is equivalent to empty set (for any M, N: M > N)

$r = partialIntersection(5, $a, $b, $c, $d);
// []

markrogoyski commented 1 year ago

Thanks for the definition and concrete examples. It makes sense to me. Let's go with your suggestion of partialIntersection.

Smoren commented 1 year ago

@markrogoyski, thank you Mark!

So I think this PR is ready to merge if you don't have another questions.

markrogoyski commented 1 year ago

Suggestion: For the Stream intersection tests, could you add a unit test case or two where the intersection comes after a zipWith call? I think this would result in arrays of arrays and it would be helpful to have tests that document the behavior and show that it works.

Smoren commented 1 year ago

@markrogoyski,

All your suggestions are applied.

Tests of Stream namespace refactored:

array_pop() usage removed;
->zipWith()->intersectionWith() test cases added.

Smoren commented 1 year ago

@markrogoyski, BTW, what do you think about including the partial intersection functionality to your MathPHP library too?

markrogoyski commented 1 year ago

Hi @Smoren,

I was reading the PHP documentation to see what the opposite of "strict typing" is, and it seems PHP uses the term "coerce." For example, to coerce a value, coercive typing, type coercion, etc.

See:

Seeing that PHP seems to use "coerce," and I did not find any usage of the phrase "non-strict typing," what do you think about renaming the methods to something like intersectionCoercive?

Sorry for yet another request. I think names are important in communicating the intent of a function so the user can understand what it does without needing to rely on documentation.

Let me know what you think. Thanks.

And regarding MathPHP, sure, I can add it to my to-do list and add it the SetTheory section. Or you can submit a PR if you want.

Mark

Smoren commented 1 year ago

Hi @markrogoyski,

Totally agree with you. It looks much more elegant.

All the non-strict intersection methods are renamed using Coercive word instead of NonStrict.

Smoren commented 1 year ago

@markrogoyski,

About MathPHP: I would like to participate in the development of this package too. So I want to make a PR.

Do you have any suggestions on how best to frame it there?

Smoren commented 1 year ago

@markrogoyski

BTW, now I think we have to to write about multisets logic in the documentation.

Is it a good Idea?

markrogoyski commented 1 year ago

It definitely should be mentioned. Otherwise, it could be kind of unexpected. It makes sense once you know about it, because the input is not limited to pure sets, so it has a behavior that works on those inputs. I would put something simple like:

Note: If input iterables produce duplicate items, then multiset intersection rules apply.

Smoren commented 1 year ago

Thank you for note text!

I've added it to the Symmetric difference methods in set_difference branch.

But i did not add notes to the documentation of intersection methods to avoid possible conflicts.

markrogoyski commented 1 year ago

Hi @Smoren,

Can you explain what the expected behavior of partialIntersection is when $minIntersectionCount is 1 and the iterables are multisets? Also, can you please provide some unit test examples for those to test and document the behavior.

Thanks, Mark

Smoren commented 1 year ago

Hi @markrogoyski,

I've made another pull request with explanation of this behavior and unit test cases.

markrogoyski / itertools-php