immerjs / immer

Create the next immutable state by mutating the current one
https://immerjs.github.io/immer/
MIT License
27.34k stars 844 forks source link

Immer patches for Array are correct, yet inefficient. Suggesting a new algorithm for optimized patches #1052

Open yoavaa opened 1 year ago

yoavaa commented 1 year ago

I am thinking as offering it as a pull request - let me know what you think

🚀 Feature Proposal

Today when computing a patch on arrays, immer compares objects by index, which results in all array change in case of unshift or splice usage. An example is the following test taken from immer testing -

https://github.com/immerjs/immer/blob/main/__tests__/patch.js#L543-L559

describe("arrays - multiple prepend", () => {
    runPatchTest(
        {x: [1, 2, 3]},
        d => {
            d.x.unshift(4)
            d.x.unshift(5)
            // 4,5,1,2,3
        },
        [
            {op: "replace", path: ["x", 0], value: 5},
            {op: "replace", path: ["x", 1], value: 4},
            {op: "replace", path: ["x", 2], value: 1},
            {op: "add", path: ["x", 3], value: 2},
            {op: "add", path: ["x", 4], value: 3}
        ]
    )
})

the expected result should be instead

        [
            {op: "add", path: ["x", 0], value: 5},
            {op: "add", path: ["x", 1], value: 4}
        ]

Motivation

I am working on a new framework that also serializes patches, and the above is a performance issue.

Consider the above case - for an array of object, lets say an array of 100 objects with 10 attributes each, the current algorithm will generate 1000 patches, while my suggestion is to produce only 1 per added / removed object

Can this be solved in user-land code?

no

Example

See the example above.

Usage: immer -> serialize -> deserialize -> patched object

suggested algorithm

(pseudo code)

the problem with json diff is that given two arrays, without any additional knowledge, to figure out if an item was pushed to the front, we have to do a === comparison between the array items, which is O(n^2) complexity.

for (let a = 0; a < A.length; a++) {
    for (let b = 0; b < A.length; b++) {
        if (A[a] === B[b])
            // compute the diff    
    }    
}

However, we can create an algorithm focused on small changes that has complexity O(n) with a cutoff. The algorithm makes the assumption that small changes can be found fast and require small number of mutations to describe. The cutoff is set at let limit = log(min(A.length, B.length)) and the algorithm has complexity O(limit^2) ~ O(n)

The algorithm -

let limit = log(min(A.length, B.length));
let a=0, b=0;
mainLoop: while (a < A.length && b < B.length) {
    if (A(a) === B(b)) {
        a += 1;
        b += 1
    }
    continue;
    for (let seekSize = 1; seekSize < limit; seekSize++) {
        for (let index = 1; index <= seekSize; index++) {
            if (A[a+index] === B[b+seekSize-index])
                // we have a match, the elements from A between (a..a+index) and replace them with the items from B between (b..b+seekSize-index)
            continue mainLoop;    
        }
    }
    // revert to direct comparison of attributes json diff model (the standard model used by all other algorithms)
}
yoavaa commented 1 year ago

Update:

Reading the Immer source again, I can see that for objects there is the state.assigned_ member which marks which object properties have been updated or deleted.

I suggest to extend the idea for arrays to also include added and removed array items, those making the above algorithm redundant and making for a simpler solution.

What do you think?