vsbuffalo / granges

A Rust library and command line tool for working with genomic ranges and their data.

Slow parsers #5

Open vsbuffalo opened 4 months ago

vsbuffalo commented 4 months ago

flamegraph

sudo cargo flamegraph --bin granges -- map  --genome tests_data/hg38_seqlens.tsv \
  --left windows_1Mb.bed  --right test_bed5.bed.gz --func mean > /dev/null

GRanges's parsers are slow-ish. I got a 20-25% gain in speed from using serde + csv, but I think the String types are killing us compared to raw bytes. To realize those benefits, though, a new ASCII or raw-byte vector type is needed throughout (I think noodles uses a similar approach?).
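A minimal sketch of the raw-bytes idea, assuming nothing about the granges internals: fields are kept as borrowed `&[u8]` slices and integers are parsed directly from ASCII, so no per-field `String` allocation or UTF-8 validation happens. The function and field names here are illustrative, not the actual granges API.

```rust
/// Parse an unsigned integer directly from ASCII digit bytes,
/// returning None on empty input, non-digits, or overflow.
fn parse_u64(bytes: &[u8]) -> Option<u64> {
    if bytes.is_empty() {
        return None;
    }
    let mut n: u64 = 0;
    for &b in bytes {
        if !b.is_ascii_digit() {
            return None;
        }
        n = n.checked_mul(10)?.checked_add(u64::from(b - b'0'))?;
    }
    Some(n)
}

fn main() {
    let line: &[u8] = b"chr1\t100\t200";
    // Split on tabs without allocating; each field borrows from `line`.
    let mut fields = line.split(|&b| b == b'\t');
    let seqname = fields.next().unwrap(); // stays as borrowed bytes
    let start = parse_u64(fields.next().unwrap()).unwrap();
    let end = parse_u64(fields.next().unwrap()).unwrap();
    assert_eq!(seqname, b"chr1");
    assert_eq!((start, end), (100, 200));
}
```

The design point is that `seqname` is only converted to a `String` (or interned) if and when it is actually needed, which is where byte-oriented parsers tend to win over `String`-based ones.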

vsbuffalo commented 4 months ago

cd05f1f brought in serde+csv.

This is much cleaner from an API perspective: users can just define structs and use serde's Deserialize derive attribute to handle parsing.

However, compared against the old parsers, there appears to be a performance hit. Here is f144ab41a976d693131a68709fa561385cc7a6a8, but with the benches/bedtools_comparison.rs from HEAD:

command       bedtools time    granges time      granges speedup (%)
------------  ---------------  --------------  ---------------------
map_multiple  139.37 s         67.22 s                       51.7679
adjust        60.24 s          29.68 s                       50.7238
map_min       66.54 s          45.44 s                       31.7153
map_mean      65.98 s          45.53 s                       30.9976
map_max       72.54 s          45.03 s                       37.9216
map_sum       64.87 s          45.21 s                       30.3143
map_median    65.95 s          46.16 s                       30.012
flank         83.87 s          47.29 s                       43.6118
filter        78.89 s          39.74 s                       49.6282
windows       280.89 s         47.56 s                       83.0676

Here are two runs on HEAD:

# run 1
command       bedtools time    granges time      granges speedup (%)
------------  ---------------  --------------  ---------------------
map_multiple  134.51 s         73.06 s                      45.6827
adjust        59.98 s          65.52 s                      -9.23242
map_min       66.54 s          54.71 s                      17.7839
map_mean      65.75 s          55.76 s                      15.2025
map_max       67.37 s          55.96 s                      16.942
map_sum       64.78 s          54.68 s                      15.5909
map_median    66.60 s          54.59 s                      18.0371
flank         84.39 s          31.96 s                      62.1299
filter        78.31 s          41.01 s                      47.6287
windows       281.53 s         149.98 s                     46.7274

# run 2 
command       bedtools time    granges time      granges speedup (%)
------------  ---------------  --------------  ---------------------
map_multiple  137.36 s         73.51 s                      46.4823
adjust        59.91 s          65.41 s                      -9.19381
map_min       66.01 s          54.51 s                      17.4298
map_mean      66.05 s          57.25 s                      13.3131
map_max       69.62 s          54.67 s                      21.4671
map_sum       64.61 s          54.60 s                      15.4947
map_median    66.99 s          55.53 s                      17.1005
flank         84.60 s          32.04 s                      62.1338
filter        78.90 s          41.21 s                      47.7687
windows       283.60 s         150.69 s                     46.8675

So far, it looks like serde's deserialization led to speed-ups, but the serialization (or maybe csv) is slower.

Making windows is a fast operation in absolute terms, so this matters little, but one can see the cost of serde's deserialize versus the old TsvSerialize approach here:

[screenshot: flamegraph, 2024-02-28]
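As a point of comparison, a TsvSerialize-style output path can be sketched as below: each record is formatted directly with `writeln!` into a buffered writer, bypassing serde's serialization machinery and any intermediate `String`s. The `Range` struct and `write_tsv` are illustrative names, not the granges implementation.

```rust
use std::io::{self, Write};

// A hypothetical genomic range; fields mirror a BED3 record.
struct Range<'a> {
    seqname: &'a str,
    start: u64,
    end: u64,
}

/// Write one record as a tab-separated line, directly into the writer.
fn write_tsv<W: Write>(out: &mut W, r: &Range) -> io::Result<()> {
    writeln!(out, "{}\t{}\t{}", r.seqname, r.start, r.end)
}

fn main() -> io::Result<()> {
    let ranges = [
        Range { seqname: "chr1", start: 0, end: 1_000_000 },
        Range { seqname: "chr1", start: 1_000_000, end: 2_000_000 },
    ];
    // Buffer stdout so each record is one formatted write, not a syscall.
    let stdout = io::stdout();
    let mut out = io::BufWriter::new(stdout.lock());
    for r in &ranges {
        write_tsv(&mut out, r)?;
    }
    out.flush()
}
```

Whether this beats csv's Serializer in practice would need profiling, but it removes one layer of indirection from the hot output loop.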
vsbuffalo commented 4 months ago

Updates: this is on f991e23277a8557a10a447c0d7af118c4847cb83, which may disappear once it is squashed.

python scripts/benchmark_summary.py
command       bedtools time    granges time      granges speedup (%)
------------  ---------------  --------------  ---------------------
map_median    110.73 s         95.96 s                       13.3354
map_sum       108.87 s         95.72 s                       12.0726
map_max       114.47 s         84.39 s                       26.2822
adjust        109.28 s         80.88 s                       25.9942
flank         145.60 s         50.54 s                       65.2901
map_multiple  296.48 s         119.34 s                      59.7466
map_mean      109.97 s         94.83 s                       13.7727
filter        118.36 s         58.79 s                       50.3318
merge_empty   63.51 s          31.01 s                       51.181
windows       515.97 s         173.53 s                      66.3682
map_min       114.82 s         94.74 s                       17.4876

with --features=bench-big

python scripts/benchmark_summary.py
command       bedtools time    granges time      granges speedup (%)
------------  ---------------  --------------  ---------------------
map_median    524.46 s         547.00 s                     -4.29831
map_sum       510.89 s         539.91 s                     -5.67975
map_max       505.62 s         535.22 s                     -5.85555
adjust        968.14 s         696.75 s                     28.0319
flank         22.22 min        397.84 s                     70.1543
map_multiple  21.01 min        577.71 s                     54.1614
map_mean      502.51 s         540.29 s                     -7.51975
filter        20.25 min        641.71 s                     47.1769
merge_empty   447.83 s         210.99 s                     52.8856
windows       519.41 s         172.34 s                     66.8201
map_min       503.93 s         538.19 s                     -6.79718

So parsing is slow, but something in particular isn't scaling well: note that several map_* commands flip to negative speedups on the larger --features=bench-big inputs.