primer3-org / primer3

Primer3 is a command line tool to select primers for polymerase chain reaction (PCR).
GNU General Public License v2.0
222 stars 64 forks source link

Major Prformance Improvements for thal #78

Closed DougTownsend closed 2 months ago

DougTownsend commented 6 months ago

Note

This PR includes the changes from #76. If those changes are not accepted I can modify this PR to be based on the current release of primer3. The changes specific to this PR can be seen here.

What These Changes Do

The main performance gains come from removing a lot of redundant work within thal. These changes provide a roughly 2x speedup when calling thal. The output of thal is not changed, so it calculates the exact same result in half the time.

For dimer alignments, thal spent about half of its runtime in RSH called by calc_bulge_internal, but the entropy and enthalpy that RSH was calculating was already calculated in fill_matrix before calling calc_bulge_internal. Now the precomputed entropy and enthalpy is passed to calc_bulge_internal removing the need to call RSH.

For monomer alignments, thal spent most of its runtime in calc_terminal_bp which calculated entropy and enthalpy using the END5_1-4 functions. These functions compute both the entropy and enthalpy of a stack, but then only returns one of those values. So in order to get both the entropy and the enthalpy, END5_X would need to be called twice, and both values would be computed twice. I removed the END5_X functions entirely and inlined their functionality into calc_terminal_bp.

Additionally, RSH and LSH took an argument double *EntropyEnthalpy that was used to pass the computed entropy and enthalpy values back to the caller. For some reason the functions that called RSH and LSH would declare a double *SH and then use malloc to allocate space for two double. Since SH was being used as a local variable in these functions, I changed it to just use a local array double SH[2]. This removes a lot of unnecessary calls to malloc in deeply nested loops.

In order to find these problems I needed to make a lot of structural changes to the code. The original code used a lot of macros and global variables which made it hard to tell where state was being modified. I removed almost all of the macros and reduced the scope of all non const global variables except for the thermodynamic parameter tables. With these changes, any state variables are passed as arguments to the functions that need them. Any state variables that are passed by reference but are not modified by the callee are now passed as const. This makes it more clear where the state is being modified, and also makes it so that, as long as all threads are using the same thermodynamic parameters, thal is now thread safe. This allows users to call thal from within their own code by just pasting thal.c, thal.h, and thal_default_params.h into their project.

Additional small changes:

Tests

All make test and valgrind tests pass. On my computer, the valgrind tests went from taking 8660 seconds to 4524 seconds with these changes. So these changes will make testing future PRs much faster.

Performance gains were measured by running thal on 1 million randomly generated sequences or pairs of sequences for the following conditions:

All three of these inputs had a ~2x speedup. These performance gains do not include any of the gains from #76 since I initialized the thermodynamic parameters before starting timing, and timed calls to thal with the generated sequences only.

Closing

Please let me know if you need me to make any modifications in order to accept these changes. I can provide further explanation of the changes I made if necessary. I understand that this PR makes a lot of changes, but I think that it will make future improvements much easier.

untergasser commented 2 months ago

Thank you for your excellent contribution. It took me 8 hours to go trough you commits - but it was great as each step was perfectly described and the reasoning was clear. Runtime of the ./paranoid_tests.sh went down from 4:50 to 2:55 on my machine. This is impressive.

Best, Andreas