loophp / collection

A (memory) friendly, easy, lazy and modular collection class.
https://loophp-collection.rtfd.io/
MIT License
722 stars 35 forks source link

[Feature request] Implement stable sorting #331

Closed jazithedev closed 10 months ago

jazithedev commented 10 months ago

Hello 👋. Let's assume to have this simple value object in the project:

final readonly class MyValueObject
{
    public function __construct(
        public int $id,
        public int $weight,
    ) {
    }
}

I need a feature of sorting such value objects by weight. The major thing about it is that the sorting itself must be stable. Meaning, two objects of the same weight should not exchange their position with each other within a single list. This can be tested with such a test:

public function testStableSorting(): void
{
    $input = Collection::fromIterable([
        new MyValueObject(id: 1, weight: 1),
        new MyValueObject(id: 2, weight: 1),
        new MyValueObject(id: 3, weight: 1),
    ])
        ->sort(callback: static fn (MyValueObject $a, MyValueObject $b): int => $a->weight <=> $b->weight)
        ->map(static fn (MyValueObject $item): int => $item->id)
        ->all();

    self::assertEquals([1, 2, 3], $input);
}

Unfortunately, the test fails as the result stored in the $input variable is [1, 3, 2]. Meaning: sorting is not stable.

Such thing was already addressed in the past for PHP. On 2022 Nikita Popov made PHP sorting stable. Hence, using usort(...) function in the above example will make the test to pass.

Would it be possible to implement stable sorting in loophp/collection?

drupol commented 10 months ago

Hi!

Thanks for bringing this here, I wasn't aware of that behavior. I will evaluate it carefully and see if it can be implemented.

drupol commented 10 months ago

@jazithedev Let me know if the proposed PR fixes the issue ?

jazithedev commented 10 months ago

@drupol Indeed it works as intended with that fix 😊.

In your test from PR you can use abstract classes to get rid of such classes like MyValueObject which for sure will be used only in that one, single test case:

public function testIssue331(): void
{
    $valueObjectFactory = static fn ($id, $weight) => new class($id, $weight) {
        public function __construct(
            public int $id,
            public int $weight,
        ) {
        }
    };

    $input = Collection::fromIterable([
        $valueObjectFactory(id: 1, weight: 1),
        $valueObjectFactory(id: 2, weight: 1),
        $valueObjectFactory(id: 3, weight: 1),
    ])
        ->sort(callback: static fn ($a, $b): int => $a->weight <=> $b->weight)
        ->map(static fn ($item): int => $item->id);

    self::assertEquals([1, 2, 3], $input->all());
}

It is just a bit less readable probably.

drupol commented 10 months ago

Thanks for this nice bug report! The fix will be available in 7.4.

drupol commented 10 months ago

@jazithedev I improved further the sort operation, do you mind testing the development version?

Related PR: https://github.com/loophp/collection/pull/334

jazithedev commented 10 months ago

@drupol

Sure! I'll check it tomorrow 👍️.

jazithedev commented 10 months ago

@drupol

Unfortunately, something is not right at "loophp/collection": "dev-master" version. The elements are sorted in a strange way. I didn't analyze it deeper what could be the cause. Here's the code with tests which checks two sorting algorithms:

The PHP version passes, where the other unfortunately doesn't.

private static function createValueObject(int $id, int $weight): object
{
    return new class($id, $weight) {
        public function __construct(
            public int $id,
            public int $weight,
        ) {
        }
    };
}

public function testPhpSorting(): void
{
    $input = [
        self::createValueObject(id: 1, weight: 2),
        self::createValueObject(id: 160, weight: 1),
        self::createValueObject(id: 1600, weight: 3),
        self::createValueObject(id: 2, weight: 2),
        self::createValueObject(id: 150, weight: 1),
        self::createValueObject(id: 1500, weight: 3),
        self::createValueObject(id: 3, weight: 2),
    ];

    usort($input, static fn ($a, $b): int => $a->weight <=> $b->weight);

    $collection = Collection::fromIterable($input)->map(static fn ($item): int => $item->id);

    self::assertEquals([160, 150, 1, 2, 3, 1600, 1500], $collection->all());
}

public function testCollectionSorting(): void
{
    $input = Collection::fromIterable([
        self::createValueObject(id: 1, weight: 2),
        self::createValueObject(id: 160, weight: 1),
        self::createValueObject(id: 1600, weight: 3),
        self::createValueObject(id: 2, weight: 2),
        self::createValueObject(id: 150, weight: 1),
        self::createValueObject(id: 1500, weight: 3),
        self::createValueObject(id: 3, weight: 2),
    ])
        ->sort(callback: static fn ($a, $b): int => $a->weight <=> $b->weight)
        ->map(static fn ($item): int => $item->id);

    self::assertEquals([160, 150, 1, 2, 3, 1600, 1500], $input->all());
}

Both tests are fine with "loophp/collection": "^7.4".

drupol commented 10 months ago

Thanks, looking at it immediately !

drupol commented 10 months ago

I see the issue. To preserve the old behaviour, I had to keep sorting in the same direction (ASC). It was done in https://github.com/loophp/iterators/pull/52

To fix the issue you just posted, you just need to update your callback as such:

    ->sort(callback: static fn ($a, $b): int => $b->weight <=> $a->weight)

And the test are passing.

drupol commented 10 months ago

After giving a second thought, I'm going to update the behaviour in loophp/iterators so it matches usort in loophp/collection.

jazithedev commented 10 months ago

Indeed the change you suggested works. Tho, it's a bit strange 😅. I mean, I think in most cases of "spaceship operator" it is $a <=> $b and not $b <=> $a. However, I could be wrong.

One another thing I noticed is keys preservation. The usort(...) function resets the keys, where collection leaves them as they were. It is a feature (imo), so I used normalize() method. However, this normalization process (of keys resetting) was taking more time on a collection of, for example, 2000 elements.

drupol commented 10 months ago

Everything has been fixed in 7.5.0 which is now available :)

Thanks for this very interesting issue and the tests !