samtools / htslib

C library for high-throughput sequencing data formats

Slightly speed up various cram decoding functions #1580

Closed jkbonfield closed 1 year ago

jkbonfield commented 1 year ago

None of this is huge, but it all adds up.

    // Old          time   inst        cyc
    // gcc -O2      12.36  78936832183 36853852204
    // gcc -O3      12.37  78713347525 36867027825
    // clang13 -O2  12.43  77451926728 37012866717
    // clang13 -O3  12.32  77627221907 36691623424
    // gcc12 -O2    12.43  78895089091 37081260172
    // gcc12 -O3    12.36  78505904437 36829216967

    // New
    // gcc -O2      12.47  78832021505 37200597109 +
    // gcc -O3      12.14  76499369401 36390334338 --
    // clang13 -O2  12.38  76678460761 36920111561 ~
    // clang13 -O3  12.26  76678023071 36548488492 ~
    // gcc12 -O2    12.38  78581694397 36880034181 -
    // gcc12 -O3    12.15  76356625541 36293921439 --

Combined before and after on 10 million NovaSeq CRAM (v3.1)

epyc 7543

               before   after
gcc(7)  -O2    7.67     7.63   -0.5%
gcc12   -O2    7.59     7.60   +0.1%
clang7  -O2    8.12     7.57   -6.8%
clang13 -O2    8.06     7.54   -6.5%

gcc(7)  -O3    7.73     7.46   -3.5%
gcc12   -O3    7.46     7.35   -1.5%
clang7  -O3    8.08     7.57   -6.3%
clang13 -O3    7.95     7.66   -3.6%

Xeon Gold 6142

               before   after
gcc(7)  -O2    9.74     9.14   -6.2%
gcc12   -O2    9.43     8.45  -10.4%
clang7  -O2    9.61     8.64  -10.0%
clang13 -O2    9.95     8.85  -11.1%

gcc(7)  -O3    9.51     8.81   -7.4%
gcc12   -O3    9.15     8.42   -8.0%
clang7  -O3    9.92     8.72  -12.1%
clang13 -O3    9.68     8.91   -8.0%

The biggest change is with clang, and we also see bigger changes on Intel than on AMD.
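The percentage column in the tables above is consistent with the relative change in elapsed time, (after - before) / before. A minimal sketch of that arithmetic in C, using a hypothetical helper named `pct_change` that is not part of htslib:

```c
#include <assert.h>

/* Hypothetical helper, not part of htslib: relative change in elapsed
 * time, expressed in percent. A negative result means the new code is
 * faster than the old code. */
static double pct_change(double before, double after) {
    assert(before > 0.0);   /* a zero baseline has no relative change */
    return 100.0 * (after - before) / before;
}
```

For the clang13 -O2 row on the Xeon Gold 6142, `pct_change(9.95, 8.85)` comes to roughly -11.06, which rounds to the -11.1% shown.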

jkbonfield commented 1 year ago

Extra data for other data sets (including a repeat of the NovaSeq data from above). I stuck with clang13 -O2 and one CPU rather than testing everything, as that combination seemed both realistic and showed a considerable benefit. Pleasing to see it applies well to other data too.

Xeon Gold 6142, clang13 -O2, different data sets

               before   after
novaseq          9.95     8.85  -11.1%
revio           23.33    19.46  -16.6%
ultima         195.20   177.71   -9.0%
ONT             68.27    60.67  -11.1%

jkbonfield commented 1 year ago

Working on fixing it! Turns out my trivial 2-line SAM file for testing wasn't exactly enough. :/