primer3-org / primer3

Primer3 is a command line tool to select primers for polymerase chain reaction (PCR).
GNU General Public License v2.0

Nested structure for primer results #44

Closed: peterjc closed this issue 3 years ago

peterjc commented 3 years ago

Consider this example, adapted from one of your test cases:

>>> sequence_template = 'GCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCCTACATTTTAGCATCAGTGAGTACAGCATGCTTACTGGAAGAGAGGGTCATGCAACAGATTAGGAGGTAAGTTTGCAAAGGCAGGCTAAGGAGGAGACGCACTGAATGCCATGGTAAGAACTCTGGACATAAAAATATTGGAAGTTGTTGAGCAAGTNAAAAAAATGTTTGGAAGTGTTACTTTAGCAATGGCAAGAATGATAGTATGGAATAGATTGGCAGAATGAAGGCAAAATGATTAGACATATTGCATTAAGGTAAAAAATGATAACTGAAGAATTATGTGCCACACTTATTAATAAGAAAGAATATGTGAACCTTGCAGATGTTTCCCTCTAGTAG'
>>> seq_args = {'SEQUENCE_ID': 'MH1000', 'SEQUENCE_TEMPLATE': sequence_template}
>>> global_args = {
...     'PRIMER_OPT_SIZE': 20,
...     'PRIMER_PICK_INTERNAL_OLIGO': 1,
...     'PRIMER_INTERNAL_MAX_SELF_END': 8,
...     'PRIMER_MIN_SIZE': 18,
...     'PRIMER_MAX_SIZE': 25,
...     'PRIMER_OPT_TM': 60.0,
...     'PRIMER_MIN_TM': 57.0,
...     'PRIMER_MAX_TM': 63.0,
...     'PRIMER_MIN_GC': 20.0,
...     'PRIMER_MAX_GC': 80.0,
...     'PRIMER_MAX_POLY_X': 100,
...     'PRIMER_INTERNAL_MAX_POLY_X': 100,
...     'PRIMER_SALT_MONOVALENT': 50.0,
...     'PRIMER_DNA_CONC': 50.0,
...     'PRIMER_MAX_NS_ACCEPTED': 0,
...     'PRIMER_MAX_SELF_ANY': 12,
...     'PRIMER_MAX_SELF_END': 8,
...     'PRIMER_PAIR_MAX_COMPL_ANY': 12,
...     'PRIMER_PAIR_MAX_COMPL_END': 8,
...     'PRIMER_PRODUCT_SIZE_RANGE': [[75, 100], [100, 125], [125, 150], [150, 175], [175, 200], [200, 225]],
... }
>>> from primer3 import bindings  # primer3-py
>>> binding_res = bindings.designPrimers(seq_args, global_args)
>>> type(binding_res)
<class 'dict'>
>>> print(repr(binding_res).replace(", '", ",\n'"))
{'PRIMER_LEFT_EXPLAIN': 'considered 2285, too many Ns 25, GC content failed 32, low tm 1366, high tm 189, ok 673',
'PRIMER_RIGHT_EXPLAIN': 'considered 2285, too many Ns 25, GC content failed 80, low tm 1484, high tm 126, high hairpin stability 5, ok 565',
'PRIMER_INTERNAL_EXPLAIN': 'considered 3367, too many Ns 27, GC content failed 92, low tm 2862, high tm 17, high hairpin stability 15, ok 354',
'PRIMER_PAIR_EXPLAIN': 'considered 671, unacceptable product size 659, no internal oligo 5, ok 7',
'PRIMER_LEFT_NUM_RETURNED': 5,
'PRIMER_RIGHT_NUM_RETURNED': 5,
'PRIMER_INTERNAL_NUM_RETURNED': 5,
'PRIMER_PAIR_NUM_RETURNED': 5,
'PRIMER_PAIR_0_PENALTY': 1.373239688566116,
'PRIMER_LEFT_0_PENALTY': 1.3299057711502655,
'PRIMER_RIGHT_0_PENALTY': 0.043333917415850465,
'PRIMER_INTERNAL_0_PENALTY': 6.224608874676505,
'PRIMER_LEFT_0_SEQUENCE': 'GCATCAGTGAGTACAGCATGC',
'PRIMER_RIGHT_0_SEQUENCE': 'TCTCCTCCTTAGCCTGCCTT',
'PRIMER_INTERNAL_0_SEQUENCE': 'ACTGGAAGAGAGGGTCATGCAACA',
'PRIMER_LEFT_0': (46, 21),
'PRIMER_RIGHT_0': (132, 20),
'PRIMER_INTERNAL_0': (69, 24),
'PRIMER_LEFT_0_TM': 59.670094228849734,
'PRIMER_RIGHT_0_TM': 59.95666608258415,
'PRIMER_INTERNAL_0_TM': 57.775391125323495,
'PRIMER_LEFT_0_GC_PERCENT': 52.38095238095238,
'PRIMER_RIGHT_0_GC_PERCENT': 55.0,
'PRIMER_INTERNAL_0_GC_PERCENT': 50.0,
'PRIMER_LEFT_0_SELF_ANY_TH': 10.513588697583486,
'PRIMER_RIGHT_0_SELF_ANY_TH': 0.0,
'PRIMER_INTERNAL_0_SELF_ANY_TH': 0.0,
'PRIMER_LEFT_0_SELF_END_TH': 10.513588697583486,
'PRIMER_RIGHT_0_SELF_END_TH': 0.0,
'PRIMER_INTERNAL_0_SELF_END_TH': 0.0,
'PRIMER_LEFT_0_HAIRPIN_TH': 42.52778282883122,
'PRIMER_RIGHT_0_HAIRPIN_TH': 0.0,
'PRIMER_INTERNAL_0_HAIRPIN_TH': 34.31335532251251,
'PRIMER_LEFT_0_END_STABILITY': 4.06,
'PRIMER_RIGHT_0_END_STABILITY': 4.35,
'PRIMER_PAIR_0_COMPL_ANY_TH': 0.0,
'PRIMER_PAIR_0_COMPL_END_TH': 0.0,
'PRIMER_PAIR_0_PRODUCT_SIZE': 87,
'PRIMER_PAIR_1_PENALTY': 1.5090296435631672,
'PRIMER_LEFT_1_PENALTY': 1.3299057711502655,
'PRIMER_RIGHT_1_PENALTY': 0.17912387241290162,
'PRIMER_INTERNAL_1_PENALTY': 6.224608874676505,
'PRIMER_LEFT_1_SEQUENCE': 'GCATCAGTGAGTACAGCATGC',
'PRIMER_RIGHT_1_SEQUENCE': 'CAGTGCGTCTCCTCCTTAGC',
'PRIMER_INTERNAL_1_SEQUENCE': 'ACTGGAAGAGAGGGTCATGCAACA',
'PRIMER_LEFT_1': (46, 21),
'PRIMER_RIGHT_1': (139, 20),
...
'PRIMER_PAIR_4_COMPL_ANY_TH': 0.0,
'PRIMER_PAIR_4_COMPL_END_TH': 0.0,
'PRIMER_PAIR_4_PRODUCT_SIZE': 84}

This is a single flat dictionary, but there is an obvious nested structure here, with the five primer sets 0 to 4. Could we not have an (optional) nested dict?

These make sense as top-level entries:

'PRIMER_LEFT_EXPLAIN': 'considered 2285, too many Ns 25, GC content failed 32, low tm 1366, high tm 189, ok 673',
'PRIMER_RIGHT_EXPLAIN': 'considered 2285, too many Ns 25, GC content failed 80, low tm 1484, high tm 126, high hairpin stability 5, ok 565',
'PRIMER_INTERNAL_EXPLAIN': 'considered 3367, too many Ns 27, GC content failed 92, low tm 2862, high tm 17, high hairpin stability 15, ok 354',
'PRIMER_PAIR_EXPLAIN': 'considered 671, unacceptable product size 659, no internal oligo 5, ok 7',

These would be redundant under my idea:

'PRIMER_LEFT_NUM_RETURNED': 5,
'PRIMER_RIGHT_NUM_RETURNED': 5,
'PRIMER_INTERNAL_NUM_RETURNED': 5,
'PRIMER_PAIR_NUM_RETURNED': 5,

All the rest have an index and would be better as a list of dicts or named tuples:

'PRIMER_PAIR': [5 entry list],
'PRIMER_LEFT': [5 entry list],
'PRIMER_RIGHT': [5 entry list],
'PRIMER_INTERNAL': [5 entry list],

Here the PRIMER_PAIR entry could be:

[{'PENALTY': 1.373239688566116, 'COMPL_ANY_TH': 0.0, 'COMPL_END_TH': 0.0, 'PRODUCT_SIZE': 87}, ...]

And the PRIMER_LEFT entry could be:

[{'PENALTY': 1.3299057711502655, 'SEQUENCE': 'GCATCAGTGAGTACAGCATGC', 'COORDS': (46, 21), 'TM': 59.670094228849734, 'GC_PERCENT': 52.38095238095238, 'SELF_ANY_TH': 10.513588697583486, 'SELF_END_TH': 10.513588697583486, 'HAIRPIN_TH': 42.52778282883122, 'END_STABILITY': 4.06}, ...]

(You'd need a key for 'PRIMER_LEFT_0': (46, 21), though - maybe COORDS?)

etc.
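
To make the idea concrete, here is a rough sketch of the regrouping as a post-processing step on the existing flat dict. The function name nest_primer_results and the regular expression are my own invention for illustration, not part of any existing API, and COORDS is only the placeholder key floated above. It assumes the indexed keys always look like PRIMER_<LEFT|RIGHT|INTERNAL|PAIR>_<n>[_<FIELD>].

```python
import re
from typing import Any, Dict

# Matches indexed keys such as 'PRIMER_LEFT_0' or 'PRIMER_PAIR_4_PRODUCT_SIZE'.
# Assumption: only these four kinds carry an index.
_INDEXED_KEY = re.compile(r'^(PRIMER_(?:LEFT|RIGHT|INTERNAL|PAIR))_(\d+)(?:_(.+))?$')
_NUM_SUFFIX = '_NUM_RETURNED'


def nest_primer_results(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Regroup a flat result dict into per-kind lists of dicts (hypothetical helper)."""
    nested: Dict[str, Any] = {}
    grouped: Dict[str, Dict[int, Dict[str, Any]]] = {}
    for key, value in flat.items():
        if key.endswith(_NUM_SUFFIX):
            # Redundant once nested (it is the list length), but remember the
            # kind so that a count of zero still yields an empty list.
            grouped.setdefault(key[:-len(_NUM_SUFFIX)], {})
            continue
        match = _INDEXED_KEY.match(key)
        if match is None:
            nested[key] = value  # e.g. the *_EXPLAIN strings stay top level
            continue
        kind, index, field = match.group(1), int(match.group(2)), match.group(3)
        # A bare key like 'PRIMER_LEFT_0' holds (start, length); store it as 'COORDS'.
        grouped.setdefault(kind, {}).setdefault(index, {})[field or 'COORDS'] = value
    for kind, by_index in grouped.items():
        nested[kind] = [by_index[i] for i in sorted(by_index)]
    return nested


# Trimmed excerpt of the output above (one primer set only, so the counts are 1).
flat = {
    'PRIMER_PAIR_EXPLAIN': 'considered 671, unacceptable product size 659, no internal oligo 5, ok 7',
    'PRIMER_LEFT_NUM_RETURNED': 1,
    'PRIMER_PAIR_NUM_RETURNED': 1,
    'PRIMER_PAIR_0_PENALTY': 1.373239688566116,
    'PRIMER_LEFT_0_SEQUENCE': 'GCATCAGTGAGTACAGCATGC',
    'PRIMER_LEFT_0': (46, 21),
    'PRIMER_PAIR_0_PRODUCT_SIZE': 87,
}
nested = nest_primer_results(flat)
```

With that, nested['PRIMER_LEFT'][0]['SEQUENCE'] replaces binding_res['PRIMER_LEFT_0_SEQUENCE'], and len(nested['PRIMER_PAIR']) replaces the PRIMER_PAIR_NUM_RETURNED entry.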

This could be requested by a keyword argument to preserve backward compatibility?
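
As a sketch of what the opt-in might look like, the wrapper below adds a nested keyword that defaults to False, so existing callers keep getting the flat dict. Everything here is hypothetical: with_nested_option, fake_design and toy_convert are stand-ins I made up so the snippet runs without the library installed, and the real function could just as well grow the keyword directly.

```python
import functools
from typing import Any, Callable, Dict

Result = Dict[str, Any]


def with_nested_option(design_func: Callable[..., Result],
                       convert: Callable[[Result], Result]) -> Callable[..., Result]:
    """Wrap a flat-dict-returning function so callers can pass nested=True."""
    @functools.wraps(design_func)
    def wrapper(*args: Any, nested: bool = False, **kwargs: Any) -> Result:
        flat = design_func(*args, **kwargs)
        # Default stays the flat dict, so existing code is unaffected.
        return convert(flat) if nested else flat
    return wrapper


def fake_design(seq_args: Result, global_args: Result) -> Result:
    """Stand-in for the real design call; returns a tiny flat result."""
    return {'PRIMER_PAIR_NUM_RETURNED': 1, 'PRIMER_PAIR_0_PRODUCT_SIZE': 87}


def toy_convert(flat: Result) -> Result:
    """Stand-in converter that handles only the single key used in this demo."""
    return {'PRIMER_PAIR': [{'PRODUCT_SIZE': flat['PRIMER_PAIR_0_PRODUCT_SIZE']}]}


design = with_nested_option(fake_design, toy_convert)
old_style = design({}, {})               # flat dict, as today
new_style = design({}, {}, nested=True)  # nested structure, on request
```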

peterjc commented 3 years ago

Apologies, wrong repository - I meant to use https://github.com/libnano/primer3-py/