noporpoise / seq-align

Fast, portable C implementations of Needleman-Wunsch and Smith-Waterman sequence alignment

MAX4 macro #5

Closed arosenfeld closed 8 years ago

arosenfeld commented 8 years ago

The max4 function appears to take a large amount of time for a simple operation. With an input file containing 6,596 alignments, running:

time ./bin/needleman_wunsch --file test.fasta --wildcard N 1 --printfasta

Gives:

43.66s user 0.01s system 99% cpu 43.668 total

Gprof shows:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 45.31     12.70    12.70     6596     0.00     0.00  alignment_fill_matrices
 22.41     18.98     6.28 2617031170     0.00     0.00  max4
 19.45     24.43     5.45 876113280     0.00     0.00  scoring_lookup
 10.06     27.25     2.82 876113280     0.00     0.00  _scoring_check_wildcards
  1.96     27.80     0.55        1     0.55    27.88  align_from_file
  0.50     27.94     0.14        1     0.14     0.14  scoring_add_wildcard
  0.25     28.01     0.07  2134272     0.00     0.00  alignment_reverse_move
...

So max4 alone accounts for roughly 22% of the profiled time (self time), which is a lot for such a simple operation. After replacing it with a MAX4 macro, the same command gives:

26.39s user 0.01s system 99% cpu 26.408 total

noporpoise commented 8 years ago

Wow, that's surprising. Thanks v much Aaron.

At one point I used a MAX4 macro but removed it because, in theory, it evaluates MAX3(x,y,z) twice inside MAX2. I guess I didn't trust the compiler to spot the double evaluation. Now I feel foolish. Making the max4() function static inline may also have given the same speed-up. There's a lesson here about profiling before optimising.
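For anyone reading later, here is a rough sketch of the two options discussed above. It is not the project's actual code: the macro bodies are just one plausible way to write MAX2/MAX3/MAX4, and `max4_inline`, `counted` and `eval_count` are hypothetical names made up for illustration. It shows why the nested macro textually repeats its arguments, and how a `static inline` function avoids that while still letting the compiler inline the call.

```c
/* Macro version (assumed definitions, not copied from seq-align).
 * MAX2 mentions each argument twice, so MAX4 expands MAX3(a, b, c)
 * twice in the source text. Arguments must have no side effects. */
#define MAX2(a, b)       ((a) >= (b) ? (a) : (b))
#define MAX3(a, b, c)    MAX2(MAX2(a, b), c)
#define MAX4(a, b, c, d) MAX2(MAX3(a, b, c), d)

/* Function version (hypothetical name). Each argument is evaluated
 * exactly once, and `static inline` allows the compiler to inline
 * the body at the call site instead of emitting a real call. */
static inline int max4_inline(int a, int b, int c, int d)
{
    int ab = a >= b ? a : b;
    int cd = c >= d ? c : d;
    return ab >= cd ? ab : cd;
}

/* Hypothetical helper that counts how often an argument expression
 * is evaluated, to make the macro's repeated evaluation visible. */
static int eval_count = 0;

static int counted(int x)
{
    eval_count++;
    return x;
}
```

Calling `MAX4(counted(4), counted(2), counted(9), counted(1))` bumps `eval_count` more than four times, whereas `max4_inline` with the same arguments bumps it exactly four times. With plain variables as arguments, as in an inner alignment loop, an optimising compiler can normally fold the repeated sub-expressions, so the macro's textual duplication need not cost anything at run time.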

Merged.