rnabioco / raer

Characterize A-to-I RNA editing in bulk and single-cell RNA sequencing experiments
https://rnabioco.github.io/raer/

speed up `remove_multiallelic()`? #60

Closed by jayhesselberth 1 year ago

jayhesselberth commented 1 year ago

I wonder if this:

https://github.com/rnabioco/raer/blob/c385b24b3fa2f8079c1fac3a5625c7a9c2144e39/R/filter_se.R#L24

would be faster as:

stringr::str_detect(x, ",", negate = TRUE)

jayhesselberth commented 1 year ago

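`str_detect()` is slower (4.37s vs. 3.42s), but `grepl()` is about 1.7-fold faster (2.02s vs. 3.42s) and allocates less memory (132MB vs. 187MB). Note that each expression ran for a single iteration: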

# A tibble: 3 × 13
  expression                                  min median itr/s…¹ mem_a…² gc/se…³ n_itr  n_gc total…⁴ result         memory     time      
  <bch:expr>                              <bch:t> <bch:>   <dbl> <bch:b>   <dbl> <int> <dbl> <bch:t> <list>         <list>     <list>    
1 remove_multiallelic(mm_rse_filt)          3.42s  3.42s   0.293   187MB    1.17     1     4   3.42s <RngdSmmE[,6]> <Rprofmem> <bench_tm>
2 remove_multiallelic_detect(mm_rse_filt)   4.37s  4.37s   0.229   187MB    1.15     1     5   4.37s <RngdSmmE[,6]> <Rprofmem> <bench_tm>
3 remove_multiallelic_grepl(mm_rse_filt)    2.02s  2.02s   0.494   132MB    1.48     1     3   2.02s <RngdSmmE[,6]> <Rprofmem> <bench_tm>
# … with 1 more variable: gc <list>, and abbreviated variable names ¹​`itr/sec`, ²​mem_alloc, ³​`gc/sec`, ⁴​total_time
# ℹ Use `colnames()` to see all variable names
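
For reference, here is a minimal sketch of the kind of comma check being compared. It is not the code at the linked line in `filter_se.R`. The `alleles` vector and the `strsplit()` baseline are hypothetical, made up for illustration. The `stringr` comparison only runs if that package is installed.

```r
# Hypothetical allele strings: a comma marks more than one alternate allele.
alleles <- c("G", "G,C", "-", "A,G", "T")

# Fixed-string match skips the regex engine; TRUE means "keep this site".
keep_grepl <- !grepl(",", alleles, fixed = TRUE)

# Hypothetical baseline for comparison: split on commas, then count pieces.
keep_split <- lengths(strsplit(alleles, ",", fixed = TRUE)) == 1L

# Both approaches should flag the same entries.
stopifnot(identical(keep_grepl, keep_split))

# Optional check against the stringr variant suggested in the first comment.
if (requireNamespace("stringr", quietly = TRUE)) {
  keep_detect <- stringr::str_detect(alleles, ",", negate = TRUE)
  stopifnot(identical(keep_grepl, keep_detect))
}

# Entries without a comma are retained.
alleles[keep_grepl]
```

Passing `fixed = TRUE` treats `","` as a literal string rather than a regular expression, which is usually the cheaper option when no pattern matching is needed. Whether it helps on a real `RangedSummarizedExperiment` would need to be benchmarked, as above.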