rhpvorderman / sequali

Fast sequencing data quality metrics
GNU Affero General Public License v3.0

Use nucleotide to index to simplify code and use smaller bitmatrices #6

Closed rhpvorderman closed 1 year ago

rhpvorderman commented 1 year ago


Incidentally, this also improves performance by 7%. We need one extra table lookup, but the bitmatrix lookup becomes very cheap because the whole matrix fits on a single cache line. The NUCLEOTIDE_TO_INDEX lookup is also cheap: the table is only 128 bytes wide and thus spans two cache lines. In ASCII all alphabetic characters (codes 65 to 122) fall in the second half of that table, so they should all sit on the same cache line and the lookup is again cheap. Two lookups into arrays that are each served from one cache line are apparently cheaper than one lookup into a 1 KB array.
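As a rough illustration of the two-step lookup described above, here is a minimal Python sketch. It is not the code from this PR. It assumes the bitmatrix holds one bitmask per nucleotide for a shift-and style pattern search, which the description does not spell out. The index values and the helpers `build_bitmatrix` and `find` are hypothetical; only the name `NUCLEOTIDE_TO_INDEX` and its 128-byte size come from the text.

```python
# Hypothetical index assignment; the real values in sequali may differ.
N, A, C, G, T = 0, 1, 2, 3, 4

# One byte per 7-bit ASCII code: 128 bytes in total.
# Every entry defaults to N (0), so unknown characters map to index 0.
NUCLEOTIDE_TO_INDEX = bytearray(128)
for chars, index in (("Aa", A), ("Cc", C), ("Gg", G), ("Tt", T)):
    for ch in chars:
        NUCLEOTIDE_TO_INDEX[ord(ch)] = index


def build_bitmatrix(pattern):
    """Return five bitmasks, one per nucleotide index.

    Bit i of matrix[index] is set when pattern[i] maps to that index.
    Row 0 (N) stays empty in this sketch, so unknown characters never match.
    """
    matrix = [0] * 5
    for i, ch in enumerate(pattern):
        index = NUCLEOTIDE_TO_INDEX[ord(ch)]
        if index != N:
            matrix[index] |= 1 << i
    return matrix


def find(pattern, sequence):
    """Shift-and search: return the start of the first match, or -1."""
    matrix = build_bitmatrix(pattern)
    accept = 1 << (len(pattern) - 1)
    state = 0
    for pos, byte in enumerate(sequence.encode("ascii")):
        index = NUCLEOTIDE_TO_INDEX[byte]            # lookup 1: 128-byte table
        state = ((state << 1) | 1) & matrix[index]   # lookup 2: 5-entry matrix
        if state & accept:
            return pos - len(pattern) + 1
    return -1
```

For example, `find("ACGT", "TTACGTAA")` returns 2. The point of the sketch is the table sizes: instead of one bitmask row per possible byte value, the matrix needs only five rows, at the cost of one extra lookup into the small index table.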

This might be different on other machines, but the change also has the benefit of using less dynamically allocated memory. And since there is a slight performance gain as well, I think it is good to merge this.