martinfthomsen / rucs2

Make reported positions relative to reference #13

Closed martinfthomsen closed 1 year ago

martinfthomsen commented 1 year ago

From an anonymous user: Maybe it is a feature, but I would think it is a bug. At the least, it is behaviour that I did not expect. I provide 5000 bp of sequence as my “positive” sequence to rucs and I am running “full” through docker (the negative is the rest of a eukaryotic genome). The output file I am mostly interested in is “results_best.tsv”, which, so far so good, contains good primer candidates. I like that the table contains the positions in the sequence where the forward and reverse primers match. The problem is that these positions are not correct. It seems that they refer to the positions of the oligos in the “pcr_products_with_skirts.fa” file, which do not correspond to the positions in my original input file. It seems that “pcr_products_with_skirts.fa” is missing leading and trailing sequence, probably “nnnnn..”, and so the positions no longer match. Please check the example below: “0_0” is the “pcr_products_with_skirts.fa” entry as returned by rucs. Positions are shifted by 280 bp when compared to my “original sequence”. Note also that about 20 bp are missing at the end as well.

(I am using rucs to generate oligos for genotyping variants in highly repetitive regions. My variant is in the middle, at exactly 2500 bp. If the positions of the oligos are not correct, I have a hard time picking flanking primers.) Correcting this for a handful is not a big deal, but I am scaling up to the order of dozens and I would prefer to have a proper fix.

0_0 
CTCTCTGTCTCCAATGTGAATTTATCTCAGGCCTTCCCAGGTTGATATTGAnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnGCTTGAATTTTGCAAAATGGATGAGTTTAATTCAATnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnAGTGAAATCCATGTGGTGAAAATGGTAAGnnnnGTGTGGTAAAAAAGCAAAATAAAAAGGATACACGGACGACTTGGTCCGAAACTAGCACAACTACCnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTCGACCGATAGGGCTGTCAATACCCTAnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnGAAGAACGAACCAGTTGATACAAAAGTACAATGTTGAnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnAACTAACTAGAACAAGAACAAGGGGAACTAGAACTAGAAnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnTCCGCGAAGTTGAAGTTGAATGGTTGCTGTAGCTGGGCGnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnCCCCATCGGCCTCCCGTTCGTTCnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnCGTCAACAGCAGGCGGGCATGGCGATCGAGGACCnnnnnnnnnnnnnnnnnnACCTGGCGCGCTCCTACAATGGGATCGAGAGGGCCAACAnnnACCTCGTTGGTGAGAACACCGCTCnnnnnnnnAAGATTCGCGGTACGTTTCCACTATAATGTTCTCTTCCAAGGGCATGACTTGTGTTnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnGTCGAGGCCCGCTCCACCACGGCTCAGCTCGAGGGGGAGnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnCCTCTTGGGAGCTCGAACCAATGGACGAGTCGAGGGnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTGATATGGAGGTGGTGTCTGTGGCAGAACCTCnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnCATCTCAGGATGCTCATTCGACCGCCACCTGTGGCGTATCAAACAnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnCCTTAACAACAGGATCACTCGTTGCGAAnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTGGGTCACGCTCACACTTCCTCGTCGTCAACCTCnnCGACGATCACACAACAGGGCCCGAAnnnnnnnnnnnnnnnnnnTTCACCTGCAACAGGGGTAATAAAACCCTAAGTACGGGnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnGACAAGGAGTGGCTGAGCATGGAGCCAAGTTGTTTTGCTnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTTGGAATCCGAACCTCATGCGACATTTCCCCTTTCTCATTTTACCCCTTAGTTGCATATTTAACnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnGAATCGCTTAGCAATCCAGGTGGnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnACCCTTCAAACTTTGTTGAAAACCTATCTnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTTGCCTTGGTTGCAGAGATAAGAGGGACCCTCGCAGACTTCnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnATACGCAAAATGATGGATGATCATGATATGCGTATGACAnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnCAAAACAGGCTAAACAGTAGTTAAAGTCGTTTATTGTTT

original sequence
GGACTAAACATGTTGAATCGATTGGGTTGGTTAATCTAGAGTTGTAAATCATATATGTTTGTCTCTTTTATTCATGTGAACGCTAGAACCAAGGAGAAACTTATAAAAGTGATAGTCAAGTACCTTAAGATGTTACCTTACTTGAAGAGTGATGGTACAGTATCACCTTGTGGAAGCTTTCCTTTCTTAGTGGTTGTGCATCGTGTTTATGGCACCCTCCATGGATCATCTCTTGTGAGCTATGCCTTCCTTATGTAAATTGTTAAACTTCATGTGTCTCTCTCTGTCTCCAATGTGAATTTATCTCAGGCCTTCCCAGGTTGATATTGAAAGTTTTCCTGCAGGGGTTAGTCCAACAAAATATCCAATATCTACTATAGCATACTATCGGCTCAAAAATCAACCTAGACACCATGTCTAATTTGATTCAGGAGGGCATTTTAACCTAAGCATCATGCTTAATTTAAAATCCTTTGGAAGTAAGATCATCATTCCTAAACTAAGCACCATGCTTAATTAAGAGATAAATGAAACTACTCCAAACCTGGACATATACTAAAGCAAATAAAGGTGGTATGAGAGCATTTATAGAGATCTACTCAAGTGGAGATGAAGTAGTGTGATTAGTATTTCACAAATTGAGCATGTCTCAAAGGTAGGGTCAACCAACATAGCAACATGGCAAAGGGTGTTTTTATGTAAAGTACTCCCCCAAGCTTGAATTTTGCAAAATGGATGAGTTTAATTCAATGTTGTATGATGATTGGTTGAACATACCTTGTGCTTGTCATTCATCTGATCTTCTTGCTCCAATCCTGGATAGGTTAGTGACAAGAATACCCGAAGGAATATTTTTACAATTGTCCTTATATGCTCAACACACAAGGTAATGTTGCAACTAATTAAAAGCTCATGTTACAATCTGATCAGTGCTTGTTTTAGGACACTAAGCTTGTCCTTGGGAAACCATTAATTTATGTCAGCGAAGTGGTTTCCCCTCCAATGCTGACCTATATCAAGATCAACTCAACGAGATCTACAATATTTAATATAGACATATGCATCGACAACCGCCCTTTTAAAGTTTTTATAAATAGAAAGGAAGTGGGGGTTGGAGATCTCGAATGTGGTGATGTTGGAGAGACCAATGGATTGTGAAGATGTTGCATATGAGCATGGAGTAAATGAAGTGGCTATGAAATAAATTCCATGCTCATTAATGCTTAGAGTGAAATCCATGTGGTGAAAATGGTAAGGTGAGTGTGGTAAAAAAGCAAAATAAAAAGGATACACGGACGACTTGGTCCGAAACTAGCACAACTACCGTTCCCTAGCAACGGCGCCAGAAAGCTTGTTGGGTATTTTAAACGCAAACAGAAAAATCCGCAAGCGCACGGATACCAATGTAGCCTTCACCCGGGAGTATTCCAGAGTATCGATTTTCCACAGGGAACGTGAGTGTACTAATTAAGATCCAAGATCGCCCAATGATAACTATATGATTATTTTTGGTGGGAAGAGAGGAAGTTTCCTGAGAGTTCTCTAGTTGATTTAAGAATCAAGAATTACTTCAAGATTATCTATTTCGGGACATCAGAACACTAACCACAGAAAGAGATGAGAAGGGGGCTAGGAAGCTCTGACTACGGTCCTACAAACACCGATCCGAACGAGGTGGAATACGTCGACCGATAGGGCTGTCAATACCCTAGGGCTACCACAACAATCCGTAGGATTGGGTGCAATTCCAGGTAATTGCAAGACTAAACACCACGTCTAATCTATTAATTACTACTCTAATGTTTCAAAGATTAGAGCACTTGATGCAAGCGGGAATCCAATAAATAACTTGAATATAAATAAAAGTAATTAAAGAACTCAGGATTATGAATTGAAGAACCTGGAGAACGATGAAGAACGAACCAGTTGATACAAAAGTACAATGTTGAGGAAGATCTGACAGATCCGGCTCCTCCTCCGCTCTCCTCTTCTCTCTCCCT
ATTTTCTAGATTACAACTAGAACTAACTAGAACAAGAACAAGGGGAACTAGAACTAGAACTAGATGAACTAGAACTAGATCTAGATGAACTAGAGGTAGAACTAGAAGAACTAGAGGGATCCTCTCTACTTGGATGAAGAATTGAAACCCTAACTTTGATTCTGTAGAAGAGGTATGATCTCCAGGGGCCAGGGGGTCTAGTTTTATAGTCCCTTCAAGTGAATCTGGGCCGTTGGATCAAACTGACATTAATCGCACGGTTTTCCTTGGTCCCTTAGGTCGGTGGAACACGATCCGCGAAGTTGAAGTTGAATGGTTGCTGTAGCTGGGCGGGCACCCAAGGGGGGAGGGCGGGCGCCCTGCCCCCGGGCCCCATCGGCCTCCCGTTCGTTCCCGTCGCTTCTGGAGTCTTCTAGATGGTAGAAAATTGCGCGGCATGTTAATATCTCTATGTAAACCCGACGTGTGGGCCTTTCTTCGGTATTTCCTGATAACCCCCTACAGAAATAGACAAACACCAAAACTTGTGGAATTCTATCAGATAAAACCCTAAGTCTAGGTGTTGGTTGCATTTGGATCCTTTTCCATGATTAGTTGATGGTTAAATATGAGCATTAAGGACCGTCAACAGCAGGCGGGCATGGCGATCGAGGACCTGCGCGTCACCAACACCGACCTGGCGCGCTCCTACAATGGGATCGAGAGGGCCAACACCGACCTCGTTGGTGAGAACACCGCTCTAGAAGAGAAGATTCGCGGTACGTTTCCACTATAATGTTCTCTTCCAAGGGCATGACTTGTGTTATCTAACTTCTCCGTGTCGGTCCTTGTAGGGCTTAAAGATGAGCTGCTTGCTGCTCAAGTCGAGGCCCGCTCCACCACGGCTCAGCTCGAGGGGGAGGTCGCTCTGAACCGTCGGCTGCGGACGGCGATAAGCGATCTCTCGGCCTCTTGGGAGCTCGAACCAATGGACGAGTCGAGGGAGGAGGCTCGAGGCGACGCACTGGTCGACCAGCTGTGCTATCTAGGTGCCACGCTGAGGGATCGAGTGCGGGACGCTCTTCATATCGGAGTGAAGCGGGCAATGGCGGTAGTCTGCTCAGGCTTTTCTTATGATATGGAGGTGGTGTCTGTGGCAGAACCTCCTAAGTGTTTGGGCCCACCGACACCTGTCATTGTCCTAAGGACCTCTAACGGTGATGTCGGGGATCCTCCCACTCTAGAAGTCCGATGAGCATCTCAGGATGCTCATTCGACCGCCACCTGTGGCGTATCAAACAAGCTCGTCTTGACCACTAGCAATGGTCAATTAACGAAAACTTCTTCTCCTCGCCCTTACAGGATAGCAAGTGGCCACCTTAACAACAGGATCACTCGTTGCGAAGACATTGCAGTATCACAAATGACATAAGACCAAAATAATATTTCAGAGTAAAACAGTGGAAGCATTACATGACTTAATTTACAAAAGTAGTTCTTCCAAATAGTAGAGTGTTATTACAAGCCAGGTCCAATGGAGCAACTTAAACTTTAGTACACAGGTTACAAATTTATAGACCCTGCCCATGGGTCACGCTCACACTTCCTCGTCGTCAACCTCGACGACGATCACACAACAGGGCCCGAAACATGTCGGCTCCTGAGCTTCACCTGCAACAGGGGTAATAAAACCCTAAGTACGGGAGTACTCAACAAGACTTACCCGGCGGAAAAAGAGGAGAAATTAGGGATGCAGGCTTGGGGTAGACAAGGAGTGGCTGAGCATGGAGCCAAGTTGTTTTGCTAAAAAGCTTACTAATAGTGCATCCTTACTTTCAAGTTTTACCCGCAAATTCCTCTCCTTAAGAGGTCGAGGAGTTCGAGGACAGTCTATCTAGTTGCTTCTCACAGACAAGTCTGTGAGCTCCTAGTCCTTAATCAAAGTATTTTCTATCCAACGGCCATAGCTTCATGTCACTCTGAGTCCGGGAATTGGAATCCGAACCTCATGCGACATTTCC
CCTTTCTCATTTTACCCCTTAGTTGCATATTTAACTACGATGAGGGTCGAAGGAATTGAGTCTCCCAATCGGGGAGCAACGACGATTCGAATCGCTTAGCAATCCAGGTGGAGTTCCTAACCACCCGACATATGTAGAACCAAATCTTGCATATGTCAACCCAAATCGAGCTCTCCCCAAATTTGACTAGGTTCGCGGCACCCGAGAGCACAGTACTCCACCATCCTACAGCCGATCTAGATGTTTTCCGGTCATCTCAGATCTGTAAGGTGGGTACACGCTACTCTCGTCATCGCTCCACGCCCAGTGTGCGAGTAGCTGTTCGCGTCGAGGGATTACAAGATCGGGCTTACCGAGGGCAAGTGGCTAGTACTATAAATTCTCAACCCAGGAGGCCATCAACGGACCGGTCCTTAATCGACACAGTTGGAGACACTACTTTAAGACTCCATTCTTAAATCAAGTCCACCGACCGGTCTCAAATTGAAACATTATTGATCATAAGTGTATTCCACAACAACCCTTCAAACTTTGTTGAAAACCTATCTAGTAGCAGGGCTAAGCATCACTACGCAATTTTAAAATAAAACAGGTGCCAAGGACAAGATAACAAATATCAAGGTAGTAAATGCAGCAAGTAGGTTAACCCAATTCTCAACTACCTAATGCAGCATTTTCAATTCATAAGTGATAAAAGATTTCAAACATTCAAGGAGGGGTTAATGCATCCGGGGCTTGCCTTGGTTGCAGAGATAAGAGGGACCCTCGCAGACTTCCCAAGTCTCGGATCCTACTTCTTCGAACGGAGCACCGACCTCGGGGTTCGGTTCAATGGTCGCTCCTTCGGTCACGTGCACTATACGCAAAATGATGGATGATCATGATATGCGTATGACAAAATTTAAGAAGGACCGAGTTGAGTACAACTCTCCTTCTCGGTGCAAAACAGGCTAAACAGTAGTTAAAGTCGTTTATTGTTTTAGGGTCTTTACCCAAGAGG
martinfthomsen commented 1 year ago

Dear anonymous user,

Thank you for your feedback. It is not a bug, and there is a solution for your wish to scale up.

The primer positions come from the underlying Primer3 software, which generates the primer candidates. Each position is relative to the sequence that was given to Primer3 as input, and that input sequence comes from the dissected scaffolds of the computed unique core sequences.

On the left side of the file “results_best.tsv”, you will find the sequence ID of the sequence from which the primer pair was created. This ID consists of numbers that help you locate the primer in the original sequence. The first number is the index of the sequence in the reference file. In your case you only have one sequence in the reference, thus the index is 0. The second number is the position of the dissected scaffold in the original sequence.

Thus, by identifying the original sequence and adding the position of the dissected scaffold to the position of the primer within the scaffold, you can find where your primer maps in the original sequence.

Here is an example. Below are the first two lines of a results_best.tsv file:

#sequence_id    product_size    unique_flags    sensitivity specificity noise   penalty p3_penalty  forward_primer  forward_tm  forward_length  forward_gc% forward_position    reverse_primer  reverse_tm  reverse_length  reverse_gc% reverse_position    probe   probe_tm    probe_length    probe_gc%   probe_position  annotation
0_1375921_0 635 3   1.0 1.0 0.0 0   2.42    CACCCAGTAGAGCACACTTTG   58.9    21  52.4    2525    GCGATTAGCTTCTCTTGCAGT   58.7    21  47.6    3159                        50S ribosomal protein L6; 50S ribosomal protein L18; 30S ribosomal protein S5

Below is some Python code, run inside RUCS, that shows how to find the forward_primer location in my example:

>>> my_seqs = [seq for seq, name, desc in seqs_from_file('CP000672.1_GCA_000016485.1_ASM1648v1_genomic.fna.gz')]
>>> sequence_id = "0_1375921_0"
>>> forward_primer = "CACCCAGTAGAGCACACTTTG"
>>> forward_position = "2525"
>>> sid = int(sequence_id.split('_')[0])
>>> spos = int(sequence_id.split('_')[1])
>>> ppos = int(forward_position)
>>> length = len(forward_primer)
>>> my_seqs[sid][spos+ppos:spos+ppos+length] == forward_primer
True
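
For the scale-up case, the same offset arithmetic can be applied to every row of a results_best.tsv file. The following is only a minimal sketch and is not part of RUCS: the function name `add_reference_positions` and the `_in_reference` column suffix are made up for illustration, and the example row is trimmed to the three columns that are used. It assumes that the second underscore-separated field of the sequence ID is always the scaffold position, as described above.

```python
import csv
import io


def add_reference_positions(tsv_text):
    """Return the rows of a results_best.tsv-style table with two extra
    (hypothetical) columns holding positions relative to the reference."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        # Second field of the sequence ID: scaffold position in the reference.
        scaffold_start = int(row["#sequence_id"].split("_")[1])
        for col in ("forward_position", "reverse_position"):
            # Shift the scaffold-relative position by the scaffold offset.
            row[col + "_in_reference"] = scaffold_start + int(row[col])
        rows.append(row)
    return rows


# Example row from above, trimmed to the columns used here (illustration only).
example = (
    "#sequence_id\tforward_position\treverse_position\n"
    "0_1375921_0\t2525\t3159\n"
)
rows = add_reference_positions(example)
print(rows[0]["forward_position_in_reference"],
      rows[0]["reverse_position_in_reference"])
```

The shifted values keep whatever convention the reported positions have; only the scaffold offset is added. To process a real file, read its contents into a string first and pass that to the function.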

I have added this example to the readme file, to hopefully help others with similar thoughts, wishes, or use cases.

Kind regards, Martin