libnano / primer3-py

Simple oligo analysis and primer design
https://libnano.github.io/primer3-py
GNU General Public License v2.0
157 stars 43 forks source link

Replacement for primer3.primerdesign.setGlobals unclear #133

Closed peterjc closed 7 months ago

peterjc commented 7 months ago

I have some code last used in June 2022 which used primer3-py v0.6.1, specifically the following imports:

from primer3.bindings import designPrimers
from primer3.primerdesign import setGlobals

# I only used the first positional argument, but all 3 are required:
#  -  global_args: dictionary of Primer3 args
#  -  misprime_lib: mispriming library dictionary
#  -  mishyb_lib: mishybridization library dictionary
setGlobals({...}, {}, {})

sequence = "..."
seq_args = {"SEQUENCE_ID": "XXX", "SEQUENCE_TEMPLATE": sequence}
results = designPrimers(seq_args)

The first line is easy enough, v1.0.0 deprecated the camelCase names and replaced them, and then the camelCase names were dropped in v2.0.0 (see #132 about making this more explicit):

from primer3.bindings import design_primers as designPrimers  # minimal change

The second import is more tricky. Moving from v0.6.1 to v1.0.0, even import primer3.primerdesign no longer works. I can see setGlobals was removed as part of #69, the only other clue is the CHANGE entry:

  • Call signature update and pattern update: designPrimers now design_primers integrates setting of sequence arguments and global arguments as they are coupled in primer3_boulder_main.c

I believe the expectation is to use something like this:

from primer3.bindings import design_primers

GLOBALS = {...}

sequence = "..."
seq_args = {"SEQUENCE_ID": "XXX", "SEQUENCE_TEMPLATE": sequence}
results = design_primers(seq_args, global_args=GLOBALS)

And similarly if using misprime_lib and/or mishyb_lib.

peterjc commented 7 months ago

I may be doing something (else) wrong, my modified code seems to run under v1.0.0 or v2.0.2, but is horribly slow. Hopefully you can confirm if the above change looks OK in principle, or not?

benpruitt commented 7 months ago

Good catch, definitely a miss from a documentation perspective. Super odd that the code runs slowly on v1.0.0 and v2.0.0, I don't think we made any changes that would impact the inbuilt primer design engine performance. We made some of the lower level thermodynamic calls threadsafe but I don't think that should have had any side effects.

Can you share any profiling data between v0.6.x and one of the newer versions?

cc @grinner

grinner commented 7 months ago

It is weird based on my development testing that performance would decrease from 1.x to version 2.x

Any code and profiling you can provide @peterjc is welcome to help us understand your issue better.

peterjc commented 7 months ago

The performance drop off I am seeing is from v0.6.1 and seems to be associated with the setGlobals change.

Test case adapted from something I wrote using primer3-py 0.6.1 in 2022.

The target sequence is a subset of the reference FASTA file, and I didn't want the primers to bind elsewhere in the genome (but this may not have actually worked - there was a lot more filtering downstream).

Example bacterial genome from https://www.ncbi.nlm.nih.gov/nuccore/NC_004547.2 is 5 million bp, with the target sequence a region within that genome. The intent here was to design primers targeting that region, but not amplifying anywhere else in that genome, and thus using PRIMER_MISPRIMING_LIBRARY to point at the genome.

import time

from primer3 import __version__  # pip install primer3-py
from primer3.bindings import designPrimers
try:
    from primer3.primerdesign import setGlobals
except ImportError:
    setGlobals = None

ref_fasta = "NC_004547.2.fa"

seq = """\
GATTGATTGAAGTCCAGGCACCGATTCTCAGCCGTATCGGCGATGGCACACAGGATAACTTGTCTGGTAC
AGAAAAAGCGGTGCAGGTTAAGGTAAAAGCCTTACCGGATGCCACATTTGAAGTGGTGCACTCACTGGCA
AAATGGAAACGCAAAACGCTGGGCGCGTATGATTTTAGCTTCGGTGAAGGCATTTATACTCACATGAAGG
CGCTGCGTCCGGACGAAGATCGCCTGAGCCCGATCCACTCGGTTTATGTCGATCAGTGGGATTGGGAGCG
CGTGATGGAGGATGGCGAGCGCAATGCTGAATATCTGAAATCGACGGTCACGCGTATTTATCAAGGCATT
AAAGCGACTGAGGCTGCGGTACATCAGGCGTTTGGCATTCAGCCTTTCCTGCCAGAGCAGATTCATTTTG
TGCATACCGAAACCTTGCTGAAGCGTTATCCCGATCTGGACGCCAAAGGGCGTGAGCGAGCTATAGCTAA
AGAGCTGGGTGCGGTCTTCCTGATTGGGATTGGCGGTAAGCTGTCCAGCGGACACTCTCACGATGTGCGT
GCACCGGATTATGATGACTGGACGACACCAGGCGAGCAGGAATTGGCGGGTTTGAACGGCGATATCGTCG
TCTGGAACCCAGTCCTGAACGATGCGTTTGAGATTTCATCCATGGGTATCCGCGTGGACGCAGAGGCGCT
AACACGCCAACTGGCACTGACGCAGGATGAGGAACGTCTGAAGCTTGAATGGCATCAGGCGCTGCTGCGC
""".replace("\n", "")

GLOBALS = {
        "PRIMER_OPT_SIZE": 20,
        "PRIMER_MIN_SIZE": 18,
        "PRIMER_MAX_SIZE": 25,
        "PRIMER_PRODUCT_SIZE_RANGE": [[200, 250], [250, 300], [150, 200], [100, 150], [70, 100]],
        "PRIMER_NUM_RETURN": 50,
        "PRIMER_MISPRIMING_LIBRARY": ref_fasta,
        "PRIMER_LIB_AMBIGUITY_CODES_CONSENSUS": 0,  # we do have NNNNN in the scaffolds
}

seq_args = {"SEQUENCE_ID": "XXX", "SEQUENCE_TEMPLATE": seq}

print("Using empty dicts for misprime_lib and mishyb_lib")
start = time.time()
if setGlobals is None:
    print(f"primer3 {__version__} - can't use setGlobals")
    try:
        candidates = designPrimers(seq_args, GLOBALS, {}, {})
    except OSError:
        print("ERROR")
        candidates = []
else:
    print(f"primer3 {__version__} - using setGlobals")
    setGlobals(GLOBALS, {}, {})
    candidates = designPrimers(seq_args)
taken = time.time() - start
print(f"primer3 {__version__} with empty dicts gave {len(candidates)} in {round(taken,2)} seconds")

print("Using None for misprime_lib and mishyb_lib")
start = time.time()
if setGlobals is None:
    print(f"primer3 {__version__} - can't use setGlobals")
    # Could leave out misprime_lib and mishyb_lib as default is None
    candidates = designPrimers(seq_args, GLOBALS, None, None)
else:
    print(f"primer3 {__version__} - using setGlobals")
    setGlobals(GLOBALS, None, None)
    candidates = designPrimers(seq_args)
taken = time.time() - start
print(f"primer3 {__version__} with None gave {len(candidates)} in {round(taken,2)} seconds")

Usage:

$ python --version
Python 3.10.12
$ for V in 0.6.1 1.0.0 1.1.0 1.2.2 2.0.2; do pip install --quiet primer3-py==$V; python example.py ; done
...

Pertinent highlights on macOS using empty dicts for the misprime_lib and mishyb_lib settings, a memory error which has been fixed, similar results in terms of counts, but still a big slow down:

Pertinent highlights on macOS using None for the misprime_lib and mishyb_lib settings, far fewer primers and an enormous slowdown:

peterjc commented 7 months ago

Same example on a Linux machine with Python 3.11 which was installed via conda:

And:

i.e. Much the same as the macOS Python 3.10 tests:

The empty dicts approach when it doesn't out-of-memory error seems to give broadly comparable numbers of results, but takes over x50 longer.

Using None for the misprime_lib and mishyb_lib settings gives radically different results, and takes far far longer still.

benpruitt commented 7 months ago

Thanks for digging into this in such detail @peterjc. It looks like there are two issues at play:

  1. Pre-v1.0.0 primer3-py did not support the PRIMER_MISPRIMING_LIBRARY argument (same with PRIMER_INTERNAL_MISHYB_LIBRARY). I don't think this was well documented, but you had to provide the mispriming library as a dictionary argument to setGlobals. So in your v0.6.1 profiling, primer3 isn't loading the FASTA file or performing mispriming checks against it. You can see that the run time of v1.0.0 is the same as v0.6.1 if you comment out the "PRIMER_MISPRIMING_LIBRARY": ref_fasta, line in your test script.
  2. There was a memory initialization bug related to passing None as the misprime_lib and/or mishyb_lib. This bug only had consequences if either PRIMER_INTERNAL_MISHYB_LIBRARY or PRIMER_MISPRIMING_LIBRARY was set as well (so this particular circumstance). @grinner fixed this bug in #135 (which should be merged and released soon, after some test coverage is added)
benpruitt commented 7 months ago

@peterjc looks like this was autoclosed when I merged #133. I'm reopening it so you can (hopefully) confirm that the fix rolled into v2.0.3 addresses the above performance issue.

peterjc commented 7 months ago

Excellent. I suspected the PRIMER_MISPRIMING_LIBRARY value might not be working, but as I was doing a bunch of filtering downstream so that didn't actually matter.

I'll retest with v2.0.3 shortly and confirm my full script is back to v0.6.1 levels of performance. Thank you!

benpruitt commented 7 months ago

Just to clarify one of my previous comments -- I would still expect your test script to take on the order of seconds w/ v2.0.3. The reason it takes < 1s with v0.6.1 is because pre-v1.0.0, primer3-py didn't support the PRIMER_MISPRIMING_LIBRARY parameter at all (so you would have to parse the FASTA file yourself and then construct a dictionary like {'NC_004547.2': '<sequence>'} and pass it into setGlobals as the misprime_lib kwarg. So on v0.6.1 there is no FASTA loading and no mispriming checks, whereas on v2.0.3 there is now both FASTA loading/mispriming checks AND no issues with supplying None as the value for misprime_lib and/or mishyb_lib.

peterjc commented 7 months ago

My initial plan is not to set PRIMER_MISPRIMING_LIBRARY (since I understand this did nothing on v0.6.1 which is my baseline), confirm no major difference with v0.6.1, and then see if v2.0.3 will give broadly the same results in broadly the same time.

peterjc commented 7 months ago

Confirming expected behaviour in a modified version of this test case not setting PRIMER_MISPRIMING_LIBRARY, here on macOS:

$ for V in 0.6.1 1.0.0 1.1.0 1.2.2 2.0.2 2.0.3 ; do pip install --quiet primer3-py==$V; python example2.py ; done
primer3 0.6.1 with setGlobals(GLOBALS, None, None), designPrimers(seq_args) gave 1107 in 0.1 seconds
primer3 1.0.0 with design_primers(seq_args, GLOBALS) gave 1607 in 0.1 seconds
primer3 1.1.0 with design_primers(seq_args, GLOBALS) gave 1607 in 0.1 seconds
primer3 1.2.2 with design_primers(seq_args, GLOBALS) gave 1607 in 0.06 seconds
primer3 2.0.2 with design_primers(seq_args, GLOBALS) gave 1161 in 0.06 seconds
primer3 2.0.3 with design_primers(seq_args, GLOBALS) gave 1161 in 0.05 seconds

It looks like this actually got slightly faster at some point too 👍

And example2.py is:

```python import time from primer3 import __version__ from primer3.bindings import designPrimers # pip install primer3-py==0.6.1 try: from primer3.primerdesign import setGlobals except ImportError: setGlobals = None try: from primer3.bindings import design_primers except ImportError: design_primers = None # ref_fasta = "NC_004547.2.fa" seq = """\ GATTGATTGAAGTCCAGGCACCGATTCTCAGCCGTATCGGCGATGGCACACAGGATAACTTGTCTGGTAC AGAAAAAGCGGTGCAGGTTAAGGTAAAAGCCTTACCGGATGCCACATTTGAAGTGGTGCACTCACTGGCA AAATGGAAACGCAAAACGCTGGGCGCGTATGATTTTAGCTTCGGTGAAGGCATTTATACTCACATGAAGG CGCTGCGTCCGGACGAAGATCGCCTGAGCCCGATCCACTCGGTTTATGTCGATCAGTGGGATTGGGAGCG CGTGATGGAGGATGGCGAGCGCAATGCTGAATATCTGAAATCGACGGTCACGCGTATTTATCAAGGCATT AAAGCGACTGAGGCTGCGGTACATCAGGCGTTTGGCATTCAGCCTTTCCTGCCAGAGCAGATTCATTTTG TGCATACCGAAACCTTGCTGAAGCGTTATCCCGATCTGGACGCCAAAGGGCGTGAGCGAGCTATAGCTAA AGAGCTGGGTGCGGTCTTCCTGATTGGGATTGGCGGTAAGCTGTCCAGCGGACACTCTCACGATGTGCGT GCACCGGATTATGATGACTGGACGACACCAGGCGAGCAGGAATTGGCGGGTTTGAACGGCGATATCGTCG TCTGGAACCCAGTCCTGAACGATGCGTTTGAGATTTCATCCATGGGTATCCGCGTGGACGCAGAGGCGCT AACACGCCAACTGGCACTGACGCAGGATGAGGAACGTCTGAAGCTTGAATGGCATCAGGCGCTGCTGCGC """.replace( "\n", "" ) GLOBALS = { "PRIMER_OPT_SIZE": 20, "PRIMER_MIN_SIZE": 18, "PRIMER_MAX_SIZE": 25, "PRIMER_PRODUCT_SIZE_RANGE": [ [200, 250], [250, 300], [150, 200], [100, 150], [70, 100], ], "PRIMER_NUM_RETURN": 50, # "PRIMER_MISPRIMING_LIBRARY": ref_fasta, "PRIMER_LIB_AMBIGUITY_CODES_CONSENSUS": 0, # we do have NNNNN in the scaffolds } seq_args = {"SEQUENCE_ID": "XXX", "SEQUENCE_TEMPLATE": seq} start = time.time() if setGlobals is None: # Could leave out misprime_lib and mishyb_lib as default is None candidates = design_primers(seq_args, GLOBALS) taken = time.time() - start print( f"primer3 {__version__} with " "design_primers(seq_args, GLOBALS) gave " f"{len(candidates)} in {round(taken,2)} seconds" ) else: setGlobals(GLOBALS, None, None) candidates = designPrimers(seq_args) taken = time.time() - start print( f"primer3 {__version__} with " "setGlobals(GLOBALS, None, None), " "designPrimers(seq_args) gave " f"{len(candidates)} in {round(taken,2)} seconds" ) ```

Making the equivalent change in my original primer design script should be a formality now - closing issue:

Thank you both!