tonyarnold / Differ

Swift library to generate differences and patches between collections.
https://tonyarnold.github.io/Differ/
MIT License
662 stars 74 forks source link

Crash error: can't allocate region #12

Open onmyway133 opened 6 years ago

onmyway133 commented 6 years ago

Hi @tonyarnold , I'm playing with different diffing algorithms, and want to give them some benchmarks https://github.com/onmyway133/DeepDiff#among-different-frameworks. When comparing Differ on some data sets, I get a crash. I run on device iPhone 6, iOS 11, Xcode 9.2. Maybe the datset is wrong, but it runs fine with many other diffing framworks

Benchmark(1924,0x1b545bb80) malloc: *** mach_vm_map(size=1073741824) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug

Here is how such a data set gets generated

private func generate() -> (old: Array<String>, new: Array<String>) {
    let old = Array(repeating: UUID().uuidString, count: 10000)
    var new = old

    new.removeSubrange(100..<200)
    new.insert(
      contentsOf: Array(repeating: UUID().uuidString, count: 500),
      at: new.endIndex.advanced(by: -20)
    )

    return (old: old, new: new)
  }

differ

wokalski commented 6 years ago

Re benchmark: it looks flawed to me because you generate 1 array with 10000 identical elements and another one with 2 kinds of elements with 9900 and 500 copies respectively. It should probably be something more distributed with more kinds. You can use random alphabet letters for instance and run it a few times so that they are distributed.

There are over 13,000,000 elements in the traces array. Removing "bad" traces early is a badly needed optimization here.

Btw it's interesting that the simple algorithm (that you wrote) performs the best. I don't have time to profile it but some profiling would be interesting. I guess you use far fewer allocations and maybe avoid copies? Asymptotically looking this library should be faster but the devil lies in details.

ListDiff doesn't generate a minimal diff so it's kind of a bonus if we look at it strictly (not comparable in the benchmark). It should be the fastest (linear IIRC).

That said I hope that people don't use hogs but I'm disappointed that this library couldn't handle mere 20000 elements.


EDIT: I misread. You're using Heckel, great!

onmyway133 commented 6 years ago

@wokalski Thanks for the reply. About DeepDiff, I just followed this note https://gist.github.com/ndarville/3166060. It is based on 2 assumptions about line occurrences, so I have a table of entries to keep track of if a line appears once in each collection.

rayfix commented 6 years ago

@onmyway133 I know this report is a little long in the tooth, but I tried to repo it and could not with the following code:

    func testBadPatch3() {
        let old = Array(repeating: UUID().uuidString, count: 10000)
        var new = old
        new.removeSubrange(100..<200)

        new.insert(
            contentsOf: Array(repeating: UUID().uuidString, count: 500),
            at: new.endIndex.advanced(by: -20)
        )
        let p = patch(from: old, to: new)

        let result = old.apply(p)
        XCTAssertEqual(result, new)
    }

Any ideas? Do you think if I run it enough times I will hit it?

onmyway133 commented 6 years ago

@rayfix Hi, I'm have no ideas since it's long time ago 😅 , but those are the code I run when I did the benchmarking. And did you run tests on a device?