madig / readwrite-ufo-glif

A reader and writer for Unified Font Object glif files for Python, written in Rust for speed
Apache License 2.0

Loading performance pt. 2 #2

Open madig opened 3 years ago

madig commented 3 years ago

Using a glyphs2ufo'd Noto Sans from https://github.com/googlefonts/noto-source/tree/d19e3db5ab7f87bfab30b8ecf68601fd81521539.

Lazy loading loads single glyphs on demand; eager loading loads whole layers up front (using parallel glif file loading in norad via rayon).
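As a rough Python analogue of the two strategies (the names and the `parse_glif` loader below are illustrative, not the actual norad/ufoLib2 API):

```python
from concurrent.futures import ThreadPoolExecutor

def parse_glif(raw: str) -> dict:
    # Toy stand-in for parsing one .glif file; the real cost is XML parsing.
    return {"name": raw.strip()}

def load_eager(sources: dict) -> dict:
    # Eager: parse every glyph up front, in parallel (norad does this with rayon).
    with ThreadPoolExecutor() as pool:
        names = list(sources)
        glyphs = pool.map(parse_glif, (sources[n] for n in names))
        return dict(zip(names, glyphs))

class LazyLayer:
    # Lazy: parse a glyph only on first access, then cache it.
    def __init__(self, sources: dict):
        self._sources = sources
        self._cache = {}

    def __getitem__(self, name: str) -> dict:
        if name not in self._cache:
            self._cache[name] = parse_glif(self._sources[name])
        return self._cache[name]
```

Since the benchmark scripts touch every glyph, the lazy path ends up parsing everything anyway, one glyph at a time and without the parallelism.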

Rusty:

$ hyperfine --warmup 3 "python benches/bench_eager.py" "python benches/bench_lazy.py" 
Benchmark #1: python benches/bench_eager.py
  Time (mean ± σ):      4.457 s ±  0.022 s    [User: 7.619 s, System: 0.565 s]
  Range (min … max):    4.418 s …  4.485 s    10 runs

Benchmark #2: python benches/bench_lazy.py
  Time (mean ± σ):      5.666 s ±  0.027 s    [User: 5.294 s, System: 0.339 s]
  Range (min … max):    5.631 s …  5.704 s    10 runs

Summary
  'python benches/bench_eager.py' ran
    1.27 ± 0.01 times faster than 'python benches/bench_lazy.py'

Vanilla ufoLib2, using fontTools.ufoLib:

$ hyperfine --warmup 3 "python benches/bench_eager.py" "python benches/bench_lazy.py"
Benchmark #1: python benches/bench_eager.py
  Time (mean ± σ):     10.803 s ±  0.057 s    [User: 10.300 s, System: 0.438 s]
  Range (min … max):   10.689 s … 10.899 s    10 runs

Benchmark #2: python benches/bench_lazy.py
  Time (mean ± σ):     10.637 s ±  0.049 s    [User: 10.169 s, System: 0.408 s]
  Range (min … max):   10.548 s … 10.715 s    10 runs

Summary
  'python benches/bench_lazy.py' ran
    1.02 ± 0.01 times faster than 'python benches/bench_eager.py'
madig commented 3 years ago

Speedscope profile for bench_eager.py: out.txt

Put differently, of the 4.45s total runtime for loading all layers eagerly in Rust, 1.8s are spent converting norad glyphs to Python dicts and then re-instantiating them on the ufoLib2 side (1.16s on points alone).
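The overhead pattern is roughly the following (a schematic sketch; the real code crosses the Rust/Python boundary, and the `Point` class here is hypothetical, not ufoLib2's actual point type):

```python
class Point:
    __slots__ = ("x", "y", "type")

    def __init__(self, x, y, type=None):
        self.x, self.y, self.type = x, y, type

def roundtrip(points):
    # The Rust side serializes each point into a plain dict ...
    dicts = [{"x": p.x, "y": p.y, "type": p.type} for p in points]
    # ... and the Python side then re-instantiates an object from each dict,
    # paying one allocation plus attribute setup per point.
    return [Point(**d) for d in dicts]
```

With hundreds of thousands of points in a font like Noto Sans, that per-point allocation is where the 1.16 s goes.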

madig commented 3 years ago

Commenting out the file existence check in rebuildContents improves the Rust measurements by about a second:

Rusty:

$ hyperfine --warmup 3 "python benches/bench_eager.py" "python benches/bench_lazy.py"
Benchmark #1: python benches/bench_eager.py
  Time (mean ± σ):      3.582 s ±  0.041 s    [User: 6.911 s, System: 0.516 s]
  Range (min … max):    3.524 s …  3.655 s    10 runs

Benchmark #2: python benches/bench_lazy.py
  Time (mean ± σ):      4.747 s ±  0.071 s    [User: 4.438 s, System: 0.281 s]
  Range (min … max):    4.680 s …  4.884 s    10 runs

Summary
  'python benches/bench_eager.py' ran
    1.32 ± 0.03 times faster than 'python benches/bench_lazy.py'

Vanilla ufoLib2:

$ hyperfine --warmup 3 "python benches/bench_eager.py" "python benches/bench_lazy.py"
Benchmark #1: python benches/bench_eager.py
  Time (mean ± σ):      9.831 s ±  0.112 s    [User: 9.418 s, System: 0.354 s]
  Range (min … max):    9.682 s … 10.006 s    10 runs

Benchmark #2: python benches/bench_lazy.py
  Time (mean ± σ):     10.065 s ±  0.230 s    [User: 9.635 s, System: 0.367 s]
  Range (min … max):    9.814 s … 10.485 s    10 runs

Summary
  'python benches/bench_eager.py' ran
    1.02 ± 0.03 times faster than 'python benches/bench_lazy.py'
madig commented 3 years ago

Cutting out ufoLib's GlyphSet shaves off a second:

$ hyperfine --warmup 3 "python benches/bench_eager.py" "python benches/bench_lazy.py"
Benchmark #1: python benches/bench_eager.py
  Time (mean ± σ):      3.348 s ±  0.060 s    [User: 6.364 s, System: 0.528 s]
  Range (min … max):    3.275 s …  3.458 s    10 runs

Benchmark #2: python benches/bench_lazy.py
  Time (mean ± σ):      4.776 s ±  0.091 s    [User: 4.449 s, System: 0.292 s]
  Range (min … max):    4.661 s …  4.981 s    10 runs

Summary
  'python benches/bench_eager.py' ran
    1.43 ± 0.04 times faster than 'python benches/bench_lazy.py'
madig commented 3 years ago

Comparing against my WIP iondrive branch:

$ hyperfine --warmup 3 "python benches/bench_eager.py" "python benches/bench_lazy.py"
Benchmark #1: python benches/bench_eager.py
  Time (mean ± σ):      3.175 s ±  0.016 s    [User: 5.889 s, System: 0.601 s]
  Range (min … max):    3.151 s …  3.197 s    10 runs

Benchmark #2: python benches/bench_lazy.py
  Time (mean ± σ):      3.774 s ±  0.043 s    [User: 6.564 s, System: 0.487 s]
  Range (min … max):    3.739 s …  3.880 s    10 runs

Summary
  'python benches/bench_eager.py' ran
    1.19 ± 0.01 times faster than 'python benches/bench_lazy.py'

I assume the lazy approach takes a bit more time because it still iterates through all glyphs. norad has no lazy loading. iondrive isn't complete yet, though; e.g., transferring a layer lib is still missing.
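One way to read this: since norad only exposes whole-layer loading, a "lazy" wrapper over it can at best defer that one bulk load until the first glyph access, and the extra indirection makes iterating every glyph strictly slower than loading eagerly. A sketch of that pattern (the wrapper class is illustrative, not iondrive's code):

```python
class LazyOverEagerLayer:
    # If the backend can only load a whole layer at once, "lazy" just
    # means postponing the single bulk load until something is accessed.
    def __init__(self, load_layer):
        self._load_layer = load_layer  # callable returning {name: glyph}
        self._glyphs = None

    def __getitem__(self, name):
        if self._glyphs is None:
            self._glyphs = self._load_layer()
        return self._glyphs[name]
```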