primer3-org / primer3

Primer3 is a command line tool to select primers for polymerase chain reaction (PCR).
GNU General Public License v2.0
218 stars 62 forks source link

Initialize thermodynamic parameter tables to default values #76

Open DougTownsend opened 4 months ago

DougTownsend commented 4 months ago

What this change does:

Initialize thermodynamic parameters to the default values, avoiding the need to do any string parsing when defaults are used. I replaced thal_parameters.c/h and thal_parameters_c_create.pl with thal_default_params.h and thal_default_params_create.py. The python script creates the array initializers by constructing the lookup tables from the config files in the same way that get_thermodynamic_values() does. Now get_thermodynamic_values() is only called if the user provides their own config file path.

Additionally, I removed two out of bounds array accesses in readTLoop() in thal.c. The loop fields of the struct triloop and struct tetraloop are lengths 5 and 6 respectively, but in readTLoop(), loop[5] for triloop and loop[6] for tetraloop were being set to null. This did not cause any problems because, on my computer at least, the struct fields were aligned to an 8 byte boundary, so there were two or three bytes of padding between the loop and value fields. The null characters are not needed anyway because the loop field is never used by any functions that expect a null terminated string, and it is converted into integer values immediately after having the string representation copied into it. Even though the out of bounds accesses were not causing any problems I think it is best to remove them.

Why this change should be accepted:

This mainly would be a performance improvement for anyone repeatedly calling ntthalon pairs of short sequences from a script, since it only removes a fixed amount of work done at startup. This could also allow for better compiler optimization in the future with additional changes to thal().

Tests:

All tests pass when make test is run.

I compared the initialized tables to the tables loaded by get_thermodynamic_values() and there are no differences.

I tested the performance improvement by passing random sequences into ntthal. Each test used 10000 pairs of sequences, and called ntthal separately for each pair. Seq1 length = 20, seq2 length = 20: Reduced execution time by ~20% Seq1 length = 20, seq2 length = 100: Reduced execution time by ~10% Seq1 length = 20, seq2 length = 1000: Reduced execution time by ~1%

Closing:

Please let me know if there is anything you would like for me to change, or if you would like any further explanation in order to accept this PR.