libnano / primer3-py

Simple oligo analysis and primer design
https://libnano.github.io/primer3-py
GNU General Public License v2.0

Nested structure for primer results #50

Closed peterjc closed 1 year ago

peterjc commented 3 years ago

Consider this example, adapted from one of your test cases:

>>> from primer3 import bindings
>>> sequence_template = 'GCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCCTACATTTTAGCATCAGTGAGTACAGCATGCTTACTGGAAGAGAGGGTCATGCAACAGATTAGGAGGTAAGTTTGCAAAGGCAGGCTAAGGAGGAGACGCACTGAATGCCATGGTAAGAACTCTGGACATAAAAATATTGGAAGTTGTTGAGCAAGTNAAAAAAATGTTTGGAAGTGTTACTTTAGCAATGGCAAGAATGATAGTATGGAATAGATTGGCAGAATGAAGGCAAAATGATTAGACATATTGCATTAAGGTAAAAAATGATAACTGAAGAATTATGTGCCACACTTATTAATAAGAAAGAATATGTGAACCTTGCAGATGTTTCCCTCTAGTAG'
>>> seq_args = { 'SEQUENCE_ID': 'MH1000','SEQUENCE_TEMPLATE': sequence_template,}
>>> global_args = {
            'PRIMER_OPT_SIZE': 20,
            'PRIMER_PICK_INTERNAL_OLIGO': 1,
            'PRIMER_INTERNAL_MAX_SELF_END': 8,
            'PRIMER_MIN_SIZE': 18,
            'PRIMER_MAX_SIZE': 25,
            'PRIMER_OPT_TM': 60.0,
            'PRIMER_MIN_TM': 57.0,
            'PRIMER_MAX_TM': 63.0,
            'PRIMER_MIN_GC': 20.0,
            'PRIMER_MAX_GC': 80.0,
            'PRIMER_MAX_POLY_X': 100,
            'PRIMER_INTERNAL_MAX_POLY_X': 100,
            'PRIMER_SALT_MONOVALENT': 50.0,
            'PRIMER_DNA_CONC': 50.0,
            'PRIMER_MAX_NS_ACCEPTED': 0,
            'PRIMER_MAX_SELF_ANY': 12,
            'PRIMER_MAX_SELF_END': 8,
            'PRIMER_PAIR_MAX_COMPL_ANY': 12,
            'PRIMER_PAIR_MAX_COMPL_END': 8,
            'PRIMER_PRODUCT_SIZE_RANGE': [[75,100],[100,125],[125,150],[150,175],[175,200],[200,225]],
        }
>>> binding_res = bindings.designPrimers(seq_args, global_args)
>>> type(binding_res)
<class 'dict'>
>>> print(repr(binding_res).replace(", '", ",\n'"))
{'PRIMER_LEFT_EXPLAIN': 'considered 2285, too many Ns 25, GC content failed 32, low tm 1366, high tm 189, ok 673',
'PRIMER_RIGHT_EXPLAIN': 'considered 2285, too many Ns 25, GC content failed 80, low tm 1484, high tm 126, high hairpin stability 5, ok 565',
'PRIMER_INTERNAL_EXPLAIN': 'considered 3367, too many Ns 27, GC content failed 92, low tm 2862, high tm 17, high hairpin stability 15, ok 354',
'PRIMER_PAIR_EXPLAIN': 'considered 671, unacceptable product size 659, no internal oligo 5, ok 7',
'PRIMER_LEFT_NUM_RETURNED': 5,
'PRIMER_RIGHT_NUM_RETURNED': 5,
'PRIMER_INTERNAL_NUM_RETURNED': 5,
'PRIMER_PAIR_NUM_RETURNED': 5,
'PRIMER_PAIR_0_PENALTY': 1.373239688566116,
'PRIMER_LEFT_0_PENALTY': 1.3299057711502655,
'PRIMER_RIGHT_0_PENALTY': 0.043333917415850465,
'PRIMER_INTERNAL_0_PENALTY': 6.224608874676505,
'PRIMER_LEFT_0_SEQUENCE': 'GCATCAGTGAGTACAGCATGC',
'PRIMER_RIGHT_0_SEQUENCE': 'TCTCCTCCTTAGCCTGCCTT',
'PRIMER_INTERNAL_0_SEQUENCE': 'ACTGGAAGAGAGGGTCATGCAACA',
'PRIMER_LEFT_0': (46, 21),
'PRIMER_RIGHT_0': (132, 20),
'PRIMER_INTERNAL_0': (69, 24),
'PRIMER_LEFT_0_TM': 59.670094228849734,
'PRIMER_RIGHT_0_TM': 59.95666608258415,
'PRIMER_INTERNAL_0_TM': 57.775391125323495,
'PRIMER_LEFT_0_GC_PERCENT': 52.38095238095238,
'PRIMER_RIGHT_0_GC_PERCENT': 55.0,
'PRIMER_INTERNAL_0_GC_PERCENT': 50.0,
'PRIMER_LEFT_0_SELF_ANY_TH': 10.513588697583486,
'PRIMER_RIGHT_0_SELF_ANY_TH': 0.0,
'PRIMER_INTERNAL_0_SELF_ANY_TH': 0.0,
'PRIMER_LEFT_0_SELF_END_TH': 10.513588697583486,
'PRIMER_RIGHT_0_SELF_END_TH': 0.0,
'PRIMER_INTERNAL_0_SELF_END_TH': 0.0,
'PRIMER_LEFT_0_HAIRPIN_TH': 42.52778282883122,
'PRIMER_RIGHT_0_HAIRPIN_TH': 0.0,
'PRIMER_INTERNAL_0_HAIRPIN_TH': 34.31335532251251,
'PRIMER_LEFT_0_END_STABILITY': 4.06,
'PRIMER_RIGHT_0_END_STABILITY': 4.35,
'PRIMER_PAIR_0_COMPL_ANY_TH': 0.0,
'PRIMER_PAIR_0_COMPL_END_TH': 0.0,
'PRIMER_PAIR_0_PRODUCT_SIZE': 87,
'PRIMER_PAIR_1_PENALTY': 1.5090296435631672,
'PRIMER_LEFT_1_PENALTY': 1.3299057711502655,
'PRIMER_RIGHT_1_PENALTY': 0.17912387241290162,
'PRIMER_INTERNAL_1_PENALTY': 6.224608874676505,
'PRIMER_LEFT_1_SEQUENCE': 'GCATCAGTGAGTACAGCATGC',
'PRIMER_RIGHT_1_SEQUENCE': 'CAGTGCGTCTCCTCCTTAGC',
'PRIMER_INTERNAL_1_SEQUENCE': 'ACTGGAAGAGAGGGTCATGCAACA',
'PRIMER_LEFT_1': (46, 21),
'PRIMER_RIGHT_1': (139, 20),
...
'PRIMER_PAIR_4_COMPL_ANY_TH': 0.0,
'PRIMER_PAIR_4_COMPL_END_TH': 0.0,
'PRIMER_PAIR_4_PRODUCT_SIZE': 84}

This is a single flat dictionary, but there is an obvious nested structure here, with the five primer sets 0 to 4. Could we have an (optional) nested dict?

These make sense as top level entries:

'PRIMER_LEFT_EXPLAIN': 'considered 2285, too many Ns 25, GC content failed 32, low tm 1366, high tm 189, ok 673',
'PRIMER_RIGHT_EXPLAIN': 'considered 2285, too many Ns 25, GC content failed 80, low tm 1484, high tm 126, high hairpin stability 5, ok 565',
'PRIMER_INTERNAL_EXPLAIN': 'considered 3367, too many Ns 27, GC content failed 92, low tm 2862, high tm 17, high hairpin stability 15, ok 354',
'PRIMER_PAIR_EXPLAIN': 'considered 671, unacceptable product size 659, no internal oligo 5, ok 7',

These would be redundant under my idea:

'PRIMER_LEFT_NUM_RETURNED': 5,
'PRIMER_RIGHT_NUM_RETURNED': 5,
'PRIMER_INTERNAL_NUM_RETURNED': 5,
'PRIMER_PAIR_NUM_RETURNED': 5,

All the rest have an index and would be better as lists of dicts or named tuples:

'PRIMER_PAIR': [5 entry list],
'PRIMER_LEFT': [5 entry list],
'PRIMER_RIGHT': [5 entry list],
'PRIMER_INTERNAL': [5 entry list],

Here the PRIMER_PAIR entry could be:

[{'PENALTY': 1.373239688566116, 'COMPL_ANY_TH': 0.0, 'COMPL_END_TH': 0.0, 'PRODUCT_SIZE': 87}, ...]

And the PRIMER_LEFT entry could be:

[{'PENALTY': 1.3299057711502655, 'SEQUENCE': 'GCATCAGTGAGTACAGCATGC', 'COORDS': (46, 21), 'TM': 59.670094228849734, 'GC_PERCENT': 52.38095238095238, 'SELF_ANY_TH': 10.513588697583486, 'SELF_END_TH': 10.513588697583486, 'HAIRPIN_TH': 42.52778282883122, 'END_STABILITY': 4.06}, ...]

(You'd need a key for 'PRIMER_LEFT_0': (46, 21), though - maybe COORDS?)

etc.

This could be requested by a keyword argument to preserve backward compatibility?
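
A minimal sketch of the regrouping described above, working only on the flat dict that is already returned. The names `to_nested` and `coords_key` are hypothetical and not part of primer3-py; the sketch assumes indexed keys always look like `PRIMER_<KIND>_<i>` or `PRIMER_<KIND>_<i>_<FIELD>`, as in the output shown earlier.

```python
import re

# Matches indexed keys such as PRIMER_LEFT_0_TM or the bare PRIMER_LEFT_0.
_INDEXED = re.compile(r"^PRIMER_(LEFT|RIGHT|INTERNAL|PAIR)_(\d+)(?:_(.+))?$")
_COUNT_SUFFIX = "_NUM_RETURNED"


def to_nested(flat, coords_key="COORDS"):
    """Regroup a flat result dict into lists of per-primer dicts.

    Hypothetical helper for illustration, not primer3-py API.
    """
    nested = {}
    for key, value in flat.items():
        match = _INDEXED.match(key)
        if match is None:
            if key.endswith(_COUNT_SUFFIX):
                # The count is implied by the list length, so only make
                # sure the (possibly empty) list exists.
                nested.setdefault(key[: -len(_COUNT_SUFFIX)], [])
            else:
                # Un-indexed entries such as *_EXPLAIN stay at top level.
                nested[key] = value
            continue
        kind, index, field = match.group(1), int(match.group(2)), match.group(3)
        entries = nested.setdefault(f"PRIMER_{kind}", [])
        while len(entries) <= index:
            entries.append({})
        # A bare PRIMER_LEFT_0 has no field name, so store it under coords_key.
        entries[index][field or coords_key] = value
    return nested
```

Because it only post-processes the existing output, a wrapper like this could sit behind an opt-in keyword argument and leave the default flat dict unchanged.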

peterjc commented 3 years ago

Something like this is what I have in mind. It records 'PRIMER_LEFT_0': (46, 21) etc. as 'START': 46, since the length seemed redundant.

def nest_primers(results):
    """Returns lists of dicts for LEFT, RIGHT, INTERNAL and PAIR entries."""
    left = []
    right = []
    internal = []
    pair = []
    for target, name in ((left, "LEFT"), (right, "RIGHT"), (internal, "INTERNAL"), (pair, "PAIR")):
        for i in range(results[f"PRIMER_{name}_NUM_RETURNED"]):
            prefix = f"PRIMER_{name}_{i}_"
            primer = {k[len(prefix):]:v for k, v in results.items() if k.startswith(prefix)}
            try:
                start, length = results[f"PRIMER_{name}_{i}"]
                primer["START"] = start
                assert length == len(primer["SEQUENCE"])
            except KeyError:
                # Does not apply to PAIR
                pass
            target.append(primer)
    return left, right, internal, pair

peterjc commented 3 years ago

If we can assume the left, right, pair (and if requested internal oligo) results are always the same length, then there are other options which might be even easier to use?
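
One such option, sketched under exactly that assumption, is a single list with one dict per primer set, so that index i of LEFT, RIGHT, INTERNAL and PAIR travel together. `primer_sets` and the `"COORDS"` key are hypothetical names, not primer3-py API. The sketch checks the assumption by comparing the non-zero `*_NUM_RETURNED` values and raises if they differ.

```python
import re

# Matches indexed keys such as PRIMER_RIGHT_3_TM or the bare PRIMER_RIGHT_3.
_INDEXED = re.compile(r"^PRIMER_(LEFT|RIGHT|INTERNAL|PAIR)_(\d+)(?:_(.+))?$")


def primer_sets(flat):
    """Return one dict per primer set, keyed by LEFT/RIGHT/INTERNAL/PAIR.

    Hypothetical helper; assumes index i means the same set in every group.
    """
    # Ignore zero counts (e.g. no internal oligo requested) when checking
    # that all groups returned the same number of entries.
    counts = {
        value
        for key, value in flat.items()
        if key.endswith("_NUM_RETURNED") and value
    }
    if len(counts) > 1:
        raise ValueError(f"groups differ in length: {sorted(counts)}")
    count = counts.pop() if counts else 0

    sets = [{} for _ in range(count)]
    for key, value in flat.items():
        match = _INDEXED.match(key)
        if match is None:
            continue  # top-level entries such as *_EXPLAIN are not per-set
        kind, index, field = match.group(1), int(match.group(2)), match.group(3)
        # A bare PRIMER_LEFT_0 has no field name, so store it as COORDS.
        sets[index].setdefault(kind, {})[field or "COORDS"] = value
    return sets
```

With this shape, `primer_sets(binding_res)[0]["LEFT"]["SEQUENCE"]` and `primer_sets(binding_res)[0]["PAIR"]["PRODUCT_SIZE"]` would read the left primer and product size of the first set.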

kevin2022lee commented 1 year ago

Makes sense.

grinner commented 1 year ago

fixed in PR #110

peterjc commented 1 year ago

That sounds good, I'll try to remember to try this next time I'm designing primers. Thank you.

kevin2022lee commented 1 year ago

Sounds great!