madig / readwrite-ufo-glif

A reader and writer for Unified Font Object glif files for Python, written in Rust for speed
Apache License 2.0

Loading performance pt. 2 #2

Open madig opened 3 years ago

madig commented 3 years ago

Using a glyphs2ufo'd Noto Sans from https://github.com/googlefonts/noto-source/tree/d19e3db5ab7f87bfab30b8ecf68601fd81521539.

Lazy loading loads single glyphs on demand; eager loading loads whole layers up front (using parallel glif file loading in norad via rayon).
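As a rough Python analogue of the two strategies (the names and the `parse_glif` loader below are illustrative, not the actual norad/ufoLib2 API):

```python
from concurrent.futures import ThreadPoolExecutor

def parse_glif(raw: str) -> dict:
    # Toy stand-in for parsing one .glif file; the real cost is XML parsing.
    return {"name": raw.strip()}

def load_eager(sources: dict) -> dict:
    # Eager: parse every glyph up front, in parallel (norad does this with rayon).
    with ThreadPoolExecutor() as pool:
        names = list(sources)
        glyphs = pool.map(parse_glif, (sources[n] for n in names))
        return dict(zip(names, glyphs))

class LazyLayer:
    # Lazy: parse a glyph only on first access, then cache it.
    def __init__(self, sources: dict):
        self._sources = sources
        self._cache = {}

    def __getitem__(self, name: str) -> dict:
        if name not in self._cache:
            self._cache[name] = parse_glif(self._sources[name])
        return self._cache[name]
```

Since the benchmark scripts touch every glyph, the lazy path ends up parsing everything anyway, one glyph at a time and without the parallelism.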

Rusty:

$ hyperfine --warmup 3 "python benches/bench_eager.py" "python benches/bench_lazy.py" 
Benchmark #1: python benches/bench_eager.py
  Time (mean ± σ):      4.457 s ±  0.022 s    [User: 7.619 s, System: 0.565 s]
  Range (min … max):    4.418 s …  4.485 s    10 runs

Benchmark #2: python benches/bench_lazy.py
  Time (mean ± σ):      5.666 s ±  0.027 s    [User: 5.294 s, System: 0.339 s]
  Range (min … max):    5.631 s …  5.704 s    10 runs

Summary
  'python benches/bench_eager.py' ran
    1.27 ± 0.01 times faster than 'python benches/bench_lazy.py'

Vanilla ufoLib2, using fontTools.ufoLib:

$ hyperfine --warmup 3 "python benches/bench_eager.py" "python benches/bench_lazy.py"
Benchmark #1: python benches/bench_eager.py
  Time (mean ± σ):     10.803 s ±  0.057 s    [User: 10.300 s, System: 0.438 s]
  Range (min … max):   10.689 s … 10.899 s    10 runs

Benchmark #2: python benches/bench_lazy.py
  Time (mean ± σ):     10.637 s ±  0.049 s    [User: 10.169 s, System: 0.408 s]
  Range (min … max):   10.548 s … 10.715 s    10 runs

Summary
  'python benches/bench_lazy.py' ran
    1.02 ± 0.01 times faster than 'python benches/bench_eager.py'
madig commented 3 years ago

Speedscope profile for bench_eager.py: out.txt

Put differently, of the 4.45s total runtime for loading all layers eagerly in Rust, 1.8s are spent converting norad glyphs to Python dicts and then re-instantiating them on the ufoLib2 side (1.16s on points alone).
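The overhead pattern is roughly the following (a schematic sketch; the real code crosses the Rust/Python boundary, and the `Point` class here is hypothetical, not ufoLib2's actual point type):

```python
class Point:
    __slots__ = ("x", "y", "type")

    def __init__(self, x, y, type=None):
        self.x, self.y, self.type = x, y, type

def roundtrip(points):
    # The Rust side serializes each point into a plain dict ...
    dicts = [{"x": p.x, "y": p.y, "type": p.type} for p in points]
    # ... and the Python side then re-instantiates an object from each dict,
    # paying one allocation plus attribute setup per point.
    return [Point(**d) for d in dicts]
```

With hundreds of thousands of points in a font like Noto Sans, that per-point allocation is where the 1.16 s goes.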

madig commented 3 years ago

Commenting out the file existence check in rebuildContents improves the Rust measurements by about a second:

Rusty:

$ hyperfine --warmup 3 "python benches/bench_eager.py" "python benches/bench_lazy.py"
Benchmark #1: python benches/bench_eager.py
  Time (mean ± σ):      3.582 s ±  0.041 s    [User: 6.911 s, System: 0.516 s]
  Range (min … max):    3.524 s …  3.655 s    10 runs

Benchmark #2: python benches/bench_lazy.py
  Time (mean ± σ):      4.747 s ±  0.071 s    [User: 4.438 s, System: 0.281 s]
  Range (min … max):    4.680 s …  4.884 s    10 runs

Summary
  'python benches/bench_eager.py' ran
    1.32 ± 0.03 times faster than 'python benches/bench_lazy.py'

Vanilla ufoLib2:

$ hyperfine --warmup 3 "python benches/bench_eager.py" "python benches/bench_lazy.py"
Benchmark #1: python benches/bench_eager.py
  Time (mean ± σ):      9.831 s ±  0.112 s    [User: 9.418 s, System: 0.354 s]
  Range (min … max):    9.682 s … 10.006 s    10 runs

Benchmark #2: python benches/bench_lazy.py
  Time (mean ± σ):     10.065 s ±  0.230 s    [User: 9.635 s, System: 0.367 s]
  Range (min … max):    9.814 s … 10.485 s    10 runs

Summary
  'python benches/bench_eager.py' ran
    1.02 ± 0.03 times faster than 'python benches/bench_lazy.py'
madig commented 3 years ago

Cutting out ufoLib's GlyphSet shaves off a second:

$ hyperfine --warmup 3 "python benches/bench_eager.py" "python benches/bench_lazy.py"
Benchmark #1: python benches/bench_eager.py
  Time (mean ± σ):      3.348 s ±  0.060 s    [User: 6.364 s, System: 0.528 s]
  Range (min … max):    3.275 s …  3.458 s    10 runs

Benchmark #2: python benches/bench_lazy.py
  Time (mean ± σ):      4.776 s ±  0.091 s    [User: 4.449 s, System: 0.292 s]
  Range (min … max):    4.661 s …  4.981 s    10 runs

Summary
  'python benches/bench_eager.py' ran
    1.43 ± 0.04 times faster than 'python benches/bench_lazy.py'
madig commented 3 years ago

Comparing against my WIP iondrive branch:

$ hyperfine --warmup 3 "python benches/bench_eager.py" "python benches/bench_lazy.py"
Benchmark #1: python benches/bench_eager.py
  Time (mean ± σ):      3.175 s ±  0.016 s    [User: 5.889 s, System: 0.601 s]
  Range (min … max):    3.151 s …  3.197 s    10 runs

Benchmark #2: python benches/bench_lazy.py
  Time (mean ± σ):      3.774 s ±  0.043 s    [User: 6.564 s, System: 0.487 s]
  Range (min … max):    3.739 s …  3.880 s    10 runs

Summary
  'python benches/bench_eager.py' ran
    1.19 ± 0.01 times faster than 'python benches/bench_lazy.py'

I assume the lazy approach takes a bit more time because it still iterates through all glyphs. norad has no lazy loading. iondrive isn't complete yet, though; e.g., transferring a layer lib is still missing.
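One way to read this: since norad only exposes whole-layer loading, a "lazy" wrapper over it can at best defer that one bulk load until the first glyph access, and the extra indirection makes iterating every glyph strictly slower than loading eagerly. A sketch of that pattern (the wrapper class is illustrative, not iondrive's code):

```python
class LazyOverEagerLayer:
    # If the backend can only load a whole layer at once, "lazy" just
    # means postponing the single bulk load until something is accessed.
    def __init__(self, load_layer):
        self._load_layer = load_layer  # callable returning {name: glyph}
        self._glyphs = None

    def __getitem__(self, name):
        if self._glyphs is None:
            self._glyphs = self._load_layer()
        return self._glyphs[name]
```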