Closed peterjc closed 7 months ago
I may be doing something (else) wrong, my modified code seems to run under v1.0.0 or v2.0.2, but is horribly slow. Hopefully you can confirm if the above change looks OK in principle, or not?
Good catch, definitely a miss from a documentation perspective. Super odd that the code runs slowly on v1.0.0
and v2.0.0
, I don't think we made any changes that would impact the inbuilt primer design engine performance. We made some of the lower level thermodynamic calls threadsafe but I don't think that should have had any side effects.
Can you share any profiling data between v0.6.x
and one of the newer versions?
cc @grinner
It is weird based on my development testing that performance would decrease from 1.x to version 2.x
Any code and profiling you can provide @peterjc is welcome to help us understand your issue better.
The performance drop off I am seeing is from v0.6.1 and seems to be associated with the setGlobals
change.
Test case adapted from something I wrote using primer3-py 0.6.1 in 2022.
The target sequence is a subset of the reference FASTA file, and I didn't want the primers to bind elsewhere in the genome (but this may not have actually worked - there was a lot more filtering downstream).
Example bacterial genome from https://www.ncbi.nlm.nih.gov/nuccore/NC_004547.2 is 5 million bp, with the target sequence a region within that genome. The intent here was to design primers targeting that region, but not amplifying anywhere else in that genome, and thus using PRIMER_MISPRIMING_LIBRARY
to point at the genome.
import time
from primer3 import __version__ # pip install primer3-py
from primer3.bindings import designPrimers
try:
from primer3.primerdesign import setGlobals
except ImportError:
setGlobals = None
ref_fasta = "NC_004547.2.fa"
seq = """\
GATTGATTGAAGTCCAGGCACCGATTCTCAGCCGTATCGGCGATGGCACACAGGATAACTTGTCTGGTAC
AGAAAAAGCGGTGCAGGTTAAGGTAAAAGCCTTACCGGATGCCACATTTGAAGTGGTGCACTCACTGGCA
AAATGGAAACGCAAAACGCTGGGCGCGTATGATTTTAGCTTCGGTGAAGGCATTTATACTCACATGAAGG
CGCTGCGTCCGGACGAAGATCGCCTGAGCCCGATCCACTCGGTTTATGTCGATCAGTGGGATTGGGAGCG
CGTGATGGAGGATGGCGAGCGCAATGCTGAATATCTGAAATCGACGGTCACGCGTATTTATCAAGGCATT
AAAGCGACTGAGGCTGCGGTACATCAGGCGTTTGGCATTCAGCCTTTCCTGCCAGAGCAGATTCATTTTG
TGCATACCGAAACCTTGCTGAAGCGTTATCCCGATCTGGACGCCAAAGGGCGTGAGCGAGCTATAGCTAA
AGAGCTGGGTGCGGTCTTCCTGATTGGGATTGGCGGTAAGCTGTCCAGCGGACACTCTCACGATGTGCGT
GCACCGGATTATGATGACTGGACGACACCAGGCGAGCAGGAATTGGCGGGTTTGAACGGCGATATCGTCG
TCTGGAACCCAGTCCTGAACGATGCGTTTGAGATTTCATCCATGGGTATCCGCGTGGACGCAGAGGCGCT
AACACGCCAACTGGCACTGACGCAGGATGAGGAACGTCTGAAGCTTGAATGGCATCAGGCGCTGCTGCGC
""".replace("\n", "")
GLOBALS = {
"PRIMER_OPT_SIZE": 20,
"PRIMER_MIN_SIZE": 18,
"PRIMER_MAX_SIZE": 25,
"PRIMER_PRODUCT_SIZE_RANGE": [[200, 250], [250, 300], [150, 200], [100, 150], [70, 100]],
"PRIMER_NUM_RETURN": 50,
"PRIMER_MISPRIMING_LIBRARY": ref_fasta,
"PRIMER_LIB_AMBIGUITY_CODES_CONSENSUS": 0, # we do have NNNNN in the scaffolds
}
seq_args = {"SEQUENCE_ID": "XXX", "SEQUENCE_TEMPLATE": seq}
print("Using empty dicts for misprime_lib and mishyb_lib")
start = time.time()
if setGlobals is None:
print(f"primer3 {__version__} - can't use setGlobals")
try:
candidates = designPrimers(seq_args, GLOBALS, {}, {})
except OSError:
print("ERROR")
candidates = []
else:
print(f"primer3 {__version__} - using setGlobals")
setGlobals(GLOBALS, {}, {})
candidates = designPrimers(seq_args)
taken = time.time() - start
print(f"primer3 {__version__} with empty dicts gave {len(candidates)} in {round(taken,2)} seconds")
print("Using None for misprime_lib and mishyb_lib")
start = time.time()
if setGlobals is None:
print(f"primer3 {__version__} - can't use setGlobals")
# Could leave out misprime_lib and mishyb_lib as default is None
candidates = designPrimers(seq_args, GLOBALS, None, None)
else:
print(f"primer3 {__version__} - using setGlobals")
setGlobals(GLOBALS, None, None)
candidates = designPrimers(seq_args)
taken = time.time() - start
print(f"primer3 {__version__} with None gave {len(candidates)} in {round(taken,2)} seconds")
Usage:
$ python --version
Python 3.10.12
$ for V in 0.6.1 1.0.0 1.1.0 1.2.2 2.0.2; do pip install --quiet primer3-py==$V; python example.py ; done
...
Pertinent highlights on macOS using empty dicts for the misprime_lib and mishyb_lib settings, a memory error which has been fixed, similar results in terms of counts, but still a big slow down:
OSError: Could not allocate memory for seq_lib
)Pertinent highlights on macOS using None for the misprime_lib and mishyb_lib settings, far fewer primers and an enormous slowdown:
Same example on a Linux machine with Python 3.11 which was installed via conda:
And:
i.e. Much the same as the macOS Python 3.10 tests:
The empty dicts approach when it doesn't out-of-memory error seems to give broadly comparable numbers of results, but takes over x50 longer.
Using None for the misprime_lib and mishyb_lib settings gives radically different results, and takes far far longer still.
Thanks for digging into this in such detail @peterjc. It looks like there are two issues at play:
v1.0.0
primer3-py
did not support the PRIMER_MISPRIMING_LIBRARY
argument (same with PRIMER_INTERNAL_MISHYB_LIBRARY
). I don't think this was well documented, but you had to provide the mispriming library as a dictionary argument to setGlobals
. So in your v0.6.1
profiling, primer3
isn't loading the FASTA file or performing mispriming checks against it. You can see that the run time of v1.0.0
is the same as v0.6.1
if you comment out the "PRIMER_MISPRIMING_LIBRARY": ref_fasta,
line in your test script.None
as the misprime_lib
and/or mishyb_lib
. This bug only had consequences if either PRIMER_INTERNAL_MISHYB_LIBRARY
or PRIMER_MISPRIMING_LIBRARY
was set as well (so this particular circumstance). @grinner fixed this bug in #135 (which should be merged and released soon, after some test coverage is added)@peterjc looks like this was autoclosed when I merged #133. I'm reopening it so you can (hopefully) confirm that the fix rolled into v2.0.3
addresses the above performance issue.
Excellent. I suspected the PRIMER_MISPRIMING_LIBRARY
value might not be working, but as I was doing a bunch of filtering downstream so that didn't actually matter.
I'll retest with v2.0.3 shortly and confirm my full script is back to v0.6.1 levels of performance. Thank you!
Just to clarify one of my previous comments -- I would still expect your test script to take on the order of seconds w/ v2.0.3
. The reason it takes < 1s
with v0.6.1
is because pre-v1.0.0
, primer3-py
didn't support the PRIMER_MISPRIMING_LIBRARY
parameter at all (so you would have to parse the FASTA file yourself and then construct a dictionary like {'NC_004547.2': '<sequence>'}
and pass it into setGlobals
as the misprime_lib
kwarg. So on v0.6.1
there is no FASTA loading and no mispriming checks, whereas on v2.0.3
there is now both FASTA loading/mispriming checks AND no issues with supplying None
as the value for misprime_lib
and/or mishyb_lib
.
My initial plan is not to set PRIMER_MISPRIMING_LIBRARY
(since I understand this did nothing on v0.6.1 which is my baseline), confirm no major difference with v0.6.1, and then see if v2.0.3 will give broadly the same results in broadly the same time.
Confirming expected behaviour in a modified version of this test case not setting PRIMER_MISPRIMING_LIBRARY
, here on macOS:
$ for V in 0.6.1 1.0.0 1.1.0 1.2.2 2.0.2 2.0.3 ; do pip install --quiet primer3-py==$V; python example2.py ; done
primer3 0.6.1 with setGlobals(GLOBALS, None, None), designPrimers(seq_args) gave 1107 in 0.1 seconds
primer3 1.0.0 with design_primers(seq_args, GLOBALS) gave 1607 in 0.1 seconds
primer3 1.1.0 with design_primers(seq_args, GLOBALS) gave 1607 in 0.1 seconds
primer3 1.2.2 with design_primers(seq_args, GLOBALS) gave 1607 in 0.06 seconds
primer3 2.0.2 with design_primers(seq_args, GLOBALS) gave 1161 in 0.06 seconds
primer3 2.0.3 with design_primers(seq_args, GLOBALS) gave 1161 in 0.05 seconds
It looks like this actually got slightly faster at some point too 👍
And example2.py is:
Making the equivalent change in my original primer design script should be a formality now - closing issue:
setGlobals
None
with design_primers
Thank you both!
I have some code last used in June 2022 which used primer3-py v0.6.1, specifically the following imports:
The first line is easy enough, v1.0.0 deprecated the camelCase names and replaced them, and then the camelCase names were dropped in v2.0.0 (see #132 about making this more explicit):
The second import is more tricky. Moving from v0.6.1 to v1.0.0, even
import primer3.primerdesign
no longer works. I can seesetGlobals
was removed as part of #69, the only other clue is the CHANGE entry:I believe the expectation is to use something like this:
And similarly if using
misprime_lib
and/ormishyb_lib
.