google / go-cmp

Package for comparing Go values in tests
BSD 3-Clause "New" or "Revised" License
4.17k stars 209 forks source link

Diff is extremely slow #335

Open seehuhn opened 1 year ago

seehuhn commented 1 year ago

I tried to use cmp.Diff in a unit test, but for my data, the function is so slow that the unit tests time out. The data in question are two decoded versions of the "Go Regular" font (as described in the Go blog ). The comparison takes more than 16 seconds on a fast laptop.

Here is code to reproduce the problem:

package main

import (
    "fmt"
    "log"

    "github.com/google/go-cmp/cmp"

    "seehuhn.de/go/sfnt/glyph"

    "seehuhn.de/go/pdf"
    "seehuhn.de/go/pdf/font"
    "seehuhn.de/go/pdf/font/charcode"
    "seehuhn.de/go/pdf/font/gofont"
    "seehuhn.de/go/pdf/font/opentype"
)

func main() {
    otf, err := gofont.OpenType(gofont.GoRegular)
    if err != nil {
        log.Fatal(err)
    }

    encoding := make([]glyph.ID, 256)
    encoding[65] = otf.CMap.Lookup('A')
    encoding[66] = otf.CMap.Lookup('C')

    toUnicode := map[charcode.CharCode][]rune{
        65: {'A'},
        66: {'C'},
    }

    info1 := &opentype.EmbedInfoSimpleCFF{
        Font:      otf,
        SubsetTag: "UVWXYZ",
        Encoding:  encoding,
        ToUnicode: toUnicode,
    }

    rw := pdf.NewData(pdf.V1_7)
    ref := rw.Alloc()
    err = info1.Embed(rw, ref)
    if err != nil {
        log.Fatal(err)
    }

    dicts, err := font.ExtractDicts(rw, ref)
    if err != nil {
        log.Fatal(err)
    }
    info2, err := opentype.Extract(rw, dicts)
    if err != nil {
        log.Fatal(err)
    }

    d := cmp.Diff(info1, info2)
    fmt.Println(d)
}

The following go.mod file shows the exact versions I used:

module seehuhn.de/go/issues/001

go 1.21.0

require (
    github.com/google/go-cmp v0.5.9
    seehuhn.de/go/pdf v0.3.5-0.20230815144317-1dc53998ea9b
    seehuhn.de/go/sfnt v0.3.5-0.20230814123027-5af1767be82b
)

require (
    github.com/xdg-go/stringprep v1.0.4 // indirect
    golang.org/x/exp v0.0.0-20230713183714-613f0c0eb8a1 // indirect
    golang.org/x/image v0.7.0 // indirect
    golang.org/x/text v0.9.0 // indirect
    seehuhn.de/go/dag v0.0.0-20230612165854-b02059e84ec5 // indirect
    seehuhn.de/go/dijkstra v0.9.3 // indirect
    seehuhn.de/go/postscript v0.3.5-0.20230811095027-86e1857cf919 // indirect
)

Since the font data is of moderate size (a few hundred glyphs, each containing a few tens of control points), I would expect the comparison to be much faster than 16 seconds.

dsnet commented 1 year ago

Thank you for the bug report and reproduction. This is a good bug report.

It's known that there are a few edge-cases that cause cmp to have pathological behavior (e.g., excessive branching and re-checking of results in certain nested structures). I was able to reproduce a execution time of 25s on my Ryzen 5900x. I won't have time to dig into what's going on in the near future, but can hopefully can fix this.

Carrotman42 commented 8 months ago

I've worked around slowness in cmp.Diff (with no cmpopts applied) by checking reflect.DeepEqual first; in the 5-ish times I've done this I've found it to always be correct and performant. Would it be reasonable to include that in the implementation of cmp.Diff? i.e. if reflect.DeepEqual returns true, have a fast-path return "", else do the full diffing that callers expect.

dsnet commented 8 months ago

The semantics of reflect.DeepEqual is not identical to cmp.Equal even without any options specified, so that approach won't work in the general case (it might work for your use case). What we can do though is to analyze the set of options passed in and figure out exactly what information needs to be tracked and avoid doing that if it's unnecessary. For example, tracking the entire cmp.Path is unnecessary if cmp.FilterPath is not used.