Closed: nh13 closed this 1 year ago.
@nh13 Some questions about this PR...
1) This change doesn't seem necessary. Do you have an example where it is? In the webserver version, I see predictions from Infernal shown on both strands. For example, I am using Addgene #85494 as an example input file for testing.
2) I get a crash when I try to run Addgene #193140 with your code.
Please advise!
My apologies, I should have included the test case from the start.
Command:

```
plannotate batch -i test.fasta -y debug.yaml -o tmp/ -f test -s '' -c -d
```
test.fasta:
>SRR11171709.78.1 78 length=7306
CATAGAGAGATAGTATTGTAACCTGACAAAGGTTAAATTAGAAAAGAGCTTTATGAAAATAATCTCCGAGATAAAAACAAACCCGCTAGAAACATATTACGATGACATGCTTGTTTCATCAGGTGAAAAATTCCATTTTGTGGTGATATAGGACAGACAAAAAATATAGATCATTTTTCTTCCTATTGCACATTATCCTGATTTTCGGTGATGCCTATTATTTAGTTCCATCGTGCCGCGACTGCAATATGGGAGAGAAAGGTCAAGTTTTCGCAGTAGATGAGGTAACACCAGCGATTCATCCCTATATCGACAAGGACATTTTTTTTCGTGAGCAATGGGTATATGCAAATTTCGTTTCCGGAACTCCGGGTGCTATCAGTTTTTATGTTGAATGCCCGGCGAACTGGAGGCAGGAAGACAAACACAGAGCTCTTCATCATTTCAAGCTATTAAATATTGCTAACAGGTATCGTTTGGAGGCAGGGAAGCACTTGAGTGAAGTGATTACTCAAAGAAACTCTTTCGTAAAAGTTATAAGGAAATATAGTTCAACCGCAACGTTTCAGCAGCTACAGTCAGAATTTATTGAAGCAAATCTGAAACCTATTATAGATTTGAATGACTTCCCAATTATTGGAAAAGAGTTATGTATCAGTGCCTAGCAAACTCGGAAGATTTTTTCAGAGGGATCTAGAATATGATGAAAGATAGAAAATTACGACGCTTATCGGAAGTGAACGAATACTTTTTATATGAGGAGGGCTGTTTTTACAAAATCCGGTAGTAAACTTGCTAACCAATTCCTAGGCAGGTCATTGGGCCAACAGTGGCATGCACCGAGAAGGACGTTTGTAATGTCCGCTCCGGCACATAGCAGTCCTAGGGACAGTGGCGTACAGTCATAGATGGTCGGTGGGAGGTGGTACAAATTCTCTCATGCAAAAAATATGTAAAAATCGGTAGGCAAACTGGAAAATCATGCAACACCCGCACTATCGGAAGTTCACCAGCCAGCCCGCAGCACGTTCCTGCATACGACGTGGTCTGCGGCTCTACCATATCTCCTATGAGCAACGTGTTAGCAGAGCCAAGCCACAACTCTAATTTTAATACATAATGAATGATAATAATAATATTAAAAATTTCCTGTGTAACTAATTTACTATATGGTTTCTGATAAGAATCATTGCAAAGATCAAACAACTTGTATTACATTGACAGTTAAGCAGTTAATTTTATCACCTCTAAAAATATATCAGCATCTAGCAATGCAACCTATCAAAATGGAGAGTTTTATGACTAAAAAAACCATGGGAAAGAAGACTTAAAGATTTATTCGCACTTGCTCAAAATGCTGCATTGATACATATTTTGACCCTGAATTATTTCGCTTGAATTTGAATCAATTCCTCCAAACCGCAAGAACAGTAACATTTTATTATTCAAAAAAACAAAAACCAGATTATAGATATGACATTTGGTATAACAATAATGTTATTGAAAAAATGGAAAAATGATCCATTAATGGCTTGGGCTAAAAATTTCTCGCAATACGAGTAGAAAAACAAGGCGATTTAGAAATGTATAGCGAGGCAAAGGCTACTCTTATTTCATCTTACATTGAAGAAAATGACATTGAGTTTATTACAAATGAAAGTATGTTAAACATTGGTATAAAAAAGTTAGTCAGACTTGCACAAAAGAAATTACCTTCATATTTAACTGAATCATCTCTTATTAATCAGAAAGACGATGGGTCGCTAATACGCTAAAAGATTACGAATTATTACATGCCTTACTATAATCTATGGCAGAATGTATAACTGCTGTAACTCTCTTGGCATACAAATACAATCCAATGGGTGACGATGTGATTTCGCCAACATCATTCGACTCTTTATTTGATGAAGCCAGGAGAATAACTTATTTTAAAATTAAAAGATTACTCCATAAGCAAATTGTCATTTAGCATGATACAATATGACAATAAAATAATT
CCTGAAGATATTAAAGAGCGTCTAAAACTGGTAGATAAGCCTAAAAATATCACTTCGACAGAAGAGTTAGTTGACTATACAGCCAAGCTTGCAGAAACGACTTTTTTAAAGGACGGTTATCACATTCAAACATTAATTTTTTATGATAAACAATTCCATCCAATTGATTTAATCATACACATTTGAAGATCAAGCAGATAAATATATTTTTTGGCGTTATGCAGCTGACAGAGCCAAAATAACAATGCCTATGGCTTCATTTGGATATCAGAGCTATGGCTCAGAAAAGCAAGCATCTACTCCAATAAACCAATACATACAATGCCAATTATAGATGAAAGACTTCAGGTAATTGGAAGTGATTCAATAATAATCAAACATGTATTTCCATGGAAAATAGTTAGAGAAAACGAAGAAAAAAAAACCGACTTTAGAAATATCAAACAGCAGACCTCAAAACATGGACGAAAAACCATATTTCATGCGTTCAGTCTTAAAAGCAATTGGCGGTGATGTAAACACTATGAACAATTGAGTCATAGAACTTCCATTATTCTCCTGAAGATAATAATCGCCAAATAAACCAATACTCAGCTTTACAATATACTAACTAACCGCAGAACGTTATTTCATACAACGTTTTCTGCGGCATATCACAAAACGATTACTCCATAACAGGGACAGCAGGCCACTCAATATCAGGTGCAGTGATGTATCACACGGTTCAGCAACACCCCGATACTTCTTCCAGGCTTCCAGCAACGAGGTTTCTTCCTTCGTTGCAATTTCCAGATCTGCAGCATCCTGAAGCGGCGCAATATGCTCACTGGCTACCTGCATCAGGCTTTTTTTTTGTTTCTTCCGCCTCCCGGATCCGGAACAGTTTTTCTGCTTCCGTATCCTTCACCCAGGCTGTGCCGTTCCACTTCTGATATTCCCCTCCCGGCGATAACCAGGTAAAATTTTCCGGTACGGACCGAGTTCAGAAATAAATAACGCGTCGCCGGAAGCCACGTCATAGACGGTTTTACCCCGATGGTCTTCAACGAGATGCCACGATGCCTCATCACTGTTGAAAAACAGCCACAAAGCCAGCGGAATATCTGGCGGTGCAATATCGGTACTGTTTGCAGGCAGACCGGTATGAGGCGGAATATATGCGTCACCTTCACCATAAATTCATTAGTTCCGGCCAGCAGATTATAAATTTTTATGGGTCCGTGGTTGTTCACTCATTCTGAATGCCATTATGCAAGCCTCACAAATAGTGTAAATGCAATGTTTTTGACGGTGTTTTCCGCGTTACCCGCAGCGTTAACGGTGATGGTGTGTCCGTGTGAACCAATACTGAAAGAATGGGCATGAGCACCGATAACAACGCGGATGCTGGTTGCGCACCCAATACCAACTGTATGCGCATGTGCACCGGCACTCACGGCTGTACCGGACAATGAGTGACTGTGGCTGCCCTGACTGTCCGTTTTCGATAAATAAGCAATACCTGTGTGGCTGGTTCCTTTAACTGTGGATAAACTTCCTGTAATGGTTGCTGTTCCATACTGACTCCAGCCAGAACTGTTCATCCTTAACCACTTGTGTGGGCATGGCACCCGCGGCCCCTGTTGAACCGCTCAGACTGTGAGCATGAGCCCCCGTGTTATTCGTCGATTTTGGTGCCGTAATCGAAACTGCCTGTTGTTTTCGTCCCGTAATCAAACGACGATGTGGTTTTCGTCCCCAAATCCGTACCGGATGCACTGGCACTGTGGGTGTGCGACTTAATTCCATCCTGTTCCTGAGACAATACAGCACGACCGCTGGCGGGTTTCCCTTGATTGTCGCAGCCTCGCATATCAGGAAGCACACCCGATGGATACGCGACAGCAAGTTTTGGGGTAGGCTGATTTGTCAAACGCCTGCCCCTGCATCAGGACGTAAGCCAGACGGAACGATATCTGATGGCCACGGATCGGCGCACCTGCCGGAAAGGCTCGAATTCT
CACCGGCCCCAAGGTATTCAAGAACATCTGCAACGGAATTTTTGCCCAGAATATCCCTGCCAACCTGAGTCAGTTCAGTCAGGCTGGCGGCATCATTTTCCGCAAAATACGGTAATTTATTTTTCGCCGTGGAAAGCCCTGCCAGCGCCGTCAGTGTCGCATTCTTCGGTTGTTTACCCGCAAGCGCGTTAGTCATGGTGGTAGCAAAATCTGGATCATTCCCGAGCGCTGCGGCCAGTTCATTCAGCGTATTCAGTGCGTCAGGTGACGCGTCGATAACATCTGCAATCGCGGCCAGTACAAAAGCGGTGTCGCAATCTGGGTATTGTTTGTTCCCCTGAGCGCGGTTGGTGCTGTTGGCGTTCCGGTCAGTGCCGGACTGTCCAGTGGGCTTTTCTGTTCGTTTCATCCATTACCACCTTAACCGCCTTTGCGTTGCAGCAAGCGTTTCAGACGTGCTGTTGGTTGCACTGCTGAGCTGCACTATCCCCTTTCTCGTTGTGTCCGCATCCTCAAGCGCGACAGCTGAAGCTATATCTTCTGCACGTTTTGCCGAATTTTTTTGCACGTATTGCCGCCGCTTCTGCCGCACTCTTGCTCTGCGATGCTGATACCGCACTTCCGCAGCCTCTGTCGCCTTCGTGATGCCGTTGACGCACTCCCCGCCGCCGCTGTTTTTGCGTCTGCCGCGGCAGAGGCGCTCCGTTCCGCTGCTGTTTCAGATGACCTGGCATTCGTCCTCGGACGTTTTTTGCCGCCCTGGCAGAATTTTCTGCCGCCGTTGCCGAGGGAAGCTGCACGACCGGCACTTGATGATGCGTTTCGTTTCTGATGATTTTGCTGCCTCTTTTGAGGCCACCGCATCTCGTGCTGAAGTGGCGGCCTCTGACGCTTTCGTGGCCGCGGTGGAGGCAGACGTTGGCGGCTGATTGTTGTGACGCTGCAGCATTCGTTTCTGACGTTTTTCGCCGCACCGGCACTGGTGGCCGCCGCGTTTTTTGAGGACTCTGCGGCTGCGGCACTTTTTTTCCGCTTCAGTGGCCTTTGCTGATGCCGCTTCTGCGCCGGAGGACGCTTCCTGAGCTGACGATGCAGCCTGTCCGGCGGACGTGCTGGCGGCGCGTGCTGAGTCAGTTGCATCAGTCACAAGGGCCGCGACCTGAGGCAGCTGATGCACTGGCATCGCCGGCTGATTTCTTCGCGTCTGCCGTACTCTGTGCCACCACGGACGCGTTACGCGCCACCTCTTCCACCATCAGTTCAGACGACGCAGCACCTCCGGCCGGGCATCATCCTCCGTCATGGCACAGAGAAAATCATTCAGCGTCCCCGGTTGTGAATCTTCATACAGGTGATGGTCCCGGCGTGCGATGGTGGAAAACCGTCAACCTGCAGGATGACACTGTACTGACCGTACTCCACATCCATGCTGTACGCCCGCTTCATCCGGATTCTCTGAGCCCACCCGTGTTCACCACCACCGTGGTGCTGTTACGTCTGGCTTTCAGCTGAATGTGCAGTTCTGTACCGGTTTTCCTGTGCCGTCTTTTCAGGACTCCTGAAATCTTTACTGCCATATTCACCACACAAAAAAAAGCCCACCGTTTCCGGCGGGCTGTCATAACACTGTGTTTACCTGGCTAATCAGAATTTATAACCGACCCCAACGATGAATCCGTCAGTACGCCAGTCGCCACTGCCGGAGCCTTCATAAGCAATATCAACACGACGGACGCTGGCGGATAATCTGTATACCTGCACTCCACGCCACTGAGGTATGCCGCATTGCACTTTCGTCCCTGGCAGTGGTCGTCTCTTTCATATACCCGGGAGTGATTTCCGTCTTACGGTAATCCATTGTACTGCGGACCACCGACTGTGAGCCACTCCGGCCATGGCGTACGCACTGACCTGCTTACTGATTTGTAAAACCGGTCCGGCCATCACGCTCACATAACGTCCACGCAGGCTCTCATAGTGAAACGTATCCTCCCGGTCAA
TCACTGTGCTGCTCTTTTTCGACGCGGCGAACCCCAGGGAAGCCATCACCCCCACACTGTCCCGTCAGCTCATAACGGTACTTCACGTTAATCCCTTTCAGATGACTCACACCGGTATCCCCGCCCGACAACGACGGCAATGTACCCGGTTTCCACTTGAAAATAGCCACCGTAAACGTACCATGTCCACCTTCCGCACGGGCCGGAGTGACTGTCACCGCAAGTGCGGCAAAGACAGCAACGGCAATACACACATTACGCATCGTTCACCTCTCACTGTTTTATAATAAAACGCCCGTTCCCGACGAACCTCTGTAACACACTCAGACCACGCTGATGCCCAGCGCCTGTTTCTTAATCACCATAACCTGCACATCGCTGGCAAACGTATACGGCGGAATATCTGCCGAATGCCGTGTGGACGTAAGCGTGAACGTCAGGATCACGTTTCCCCGACCCGCTGGCATGTCAACATACGGGAGAACACCTGTACCGCCTCGTTCGCCGCGCCATCATAAATCACCGCACCGTTCATCAGTACTTTCAGATAACACATCGAATACGTTGTCCTGCCGCTGACAGTACGCTTACTTTCCGCGAAACGTCAGCGGAAGCACCACTATCTGGCGATCAAAAAGGATGGTCATCGGTCACGGTGACAGTACGGGTACCTGACGGCCAGTCCACACTGCTTCACGCTGGCGCGGAAAAGCCGCGCTCGCCGCCTTTACAATGTCCCCGACGATTTTTTCCGCCCTCAGCGTACCGTTTATCGTAGCAGTTTTCAGCTATCGTCACATTACTGAGCGTCCGGAGTTCGCATTCACACTGCCACTGATATCCGCATTTTTAGCGGTCAGCTTTCCGTCCGGTGTCATGGAAAAGGCCGGAGGGATTGCCGCCGCTGGTAATGGTGGGGGCCGTCAGGCGCTTCAGGAACACGTCGTTCATGAATATCTGGTTGCCCTGCGCCACAAACATCGGCGTTTCATTCCCGTTTGCCGGGTCAATAAATGCGATACGATTGGCGGCAACCAGAAACTGGCTCAGTTTTGCCTTCCTCCGTGTCCTCCATGCTGAGGCCAATACCCGCGACATAATGTTTGCCGGTCTTTGGTCTGCTCAATTTTGACAGCCCACATGGCATTCCACTTATCACTGGCATCCTTCCACTCTTTCGAAAACTCCTCCAGTCTGCTGGCGTTATCCTCCGTCAGCTCGACTTTTTCCAGCAGCTCCTTGCAGAGATGGGATTCGGTTATCTTGCCTTTGAAAAAAATCCAGGTAACAATACTATCTCTCTATG
debug.yaml:

```yaml
Rfam:
  details:
    compressed: false
    default_type: ncRNA
    location: None
  location: Default
  method: infernal
  priority: 3
  version: release 14.5
```
The reason this fails is that `infernal['qend'] < infernal['qstart']` here (Infernal reports minus-strand hits with the start coordinate greater than the end coordinate), so when we extract the `qseq`, we get an empty sequence. If that's the only result returned across all databases, then the `seq` column in the data frame ends up with a float dtype.
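To illustrate the failure mode, here is a minimal Python sketch. It assumes the coordinates are used directly as 0-based, end-exclusive slice bounds, which may not match pLannotate's actual convention; `extract_hit` is a hypothetical helper, not a pLannotate function.

```python
def extract_hit(seq: str, qstart: int, qend: int) -> str:
    """Hypothetical helper: return the slice covered by a hit in either coordinate order.

    Treats the coordinates as 0-based, end-exclusive slice bounds (an assumption).
    """
    # Order the bounds so a reversed (minus-strand style) hit still yields a slice.
    start, end = sorted((qstart, qend))
    return seq[start:end]


seq = "ACGTACGTAC"
print(repr(seq[7:2]))          # reversed bounds give an empty slice: ''
print(extract_hit(seq, 7, 2))  # GTACG
```

Python never raises on a slice whose start is past its end; it silently returns an empty string, which is why the problem only surfaces later as a dtype issue.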
I added one more commit to handle this read, on which the original commits failed:
>SRR11171709.130.1 130 length=6466
CATAGAGAGATAGTATTGTCACCAATGCTGAGATAGCTGAGAGATGGCATATTGCTACGCAAGAATGAAAAGTGATATACTGGAATGTTTTAAAAAGGCAGGTGGGCAAAGTTAAGGATTAATTATCAGGAGTAATTATGCGGAACAGATCATGCCTGGTGTTTACATAGTAATAATTCCTTACGTTATCGTAAGCATTTGCTATCTCCTTTTCCGCCACTACATTCCCTGGTGTTTCTTTTTCAGCTCATAGAGATGGTCTTGGGGCGACATTGTCATCATATGCAGGAACCATGATTGCAATCCTGATTGCTGCCTTGACGTTTCTAATCGGAAGCAGAACGCGCCGACTGGCCAAGATTAGAGAGTATGGGTATATGACATCGGTAGTTATTGTCTATGCCCTTAGTTTTGTTGAGCTTGGAGCTTTGTTTTTCTGCGGGTTATTGCTTCTTTCCAGCATAAGCGGCTACATGATACCCACTATCGCCATCGGCATTGCCTCTGCATCGTTCATTCATATATGCATCCTTGTTTTCCAACTATATAATTTGACCAGAGAACAAGAATAACCCGGCCTCAGCGCCGGGTTTTCTTTGCCTCAACGATCGCCCCCAAAAACACATAACCAATTGTATTTATTGAAAAAATAAATAGATACAACTCACTAAACATAGCAATTCAGATCTCTCACCTACCAAACAATGCCCCCCCTGCAAAAAATAAATTCATATAAAAAACATACAGATAACCATCTGCGGTGATAATTATCTCTGGCGGTGTTGACATAAATACCACTGGCGGTGATACTGAGCACATCAGCAGGACGCACTGACCACCATGAAGGTGACGCTCTTAAAAATTAAGCCCTGAAGAAGGGCAGCATTCAAAGCAGAAGGCTTTGGGGTGTGTGATACGAAACGAAAGCATTGGCCGTAAGTGCGATTCCGGATTAGCTGCCAATGTGCCAATCGCGGGGGGTTTTCGTTCAGGACTACAACTGCCACACACCACCAAAGCTAACTGACAGGAGAATCCAGATGGATGCACCTAAACACGCCGCCGCGAACGTCGCGCAGAGAAACAGTCTCAATGGAAAGCAGCAAATCCCCTGTTGGTTGGGGTAAGCGCAAAACCAGTTAACCGCCCTATTCTCTCGCTGAAATCGCAAACCGAAATCACGAGTAGAAAGCGCACTAAATCCGATAGACCTTACAGTGCTGGCTGAAATACCACAAACGAATTGAAAGCAACCTGCAACGTATTGAGCGCAAGAATCAGCGCACATGGTACAGCAAGCCTGGCGAACGCGGCATAACATGCAGTGGACGCCAGAAAATTAAGGGAAAATCGATTCCTCTTATCTAGTTACTTAGATATTGGCCTTGGCTTTATCTCAATATTATATGGATCATAGCTGGCAACTAATTCAGTCCAGTAAATATCCTCAATAGGGAATAATATATGCTTTCCATTCCATCGGGAAAAAGTTTGTTCAACACACCAAGCTCAATCAACTCACTAATGTATGGGAATTTGTTTTGATGTAACCACATACTTCCTGCCTTCATTAAGGGCTGCGCACAAAACCATAAGATTGCTCTTCTGTAAGGTTTTGAATTACTGATGCGCACTTTATCGTTTTGCATCTTAATGCGTTTCTTAGCTTAAATCGCTTATATCTGGCGCTGGCAATAGCTGATAATCGATGCACATTAATTGCTAGCGAAAATGCAAGAGCAAAGACGAAAACATGCCACACATGAGGAATACCGATTCTCTCATTAACATATTCAGGCCAGTTATCTGGGCTTAAAAGCAGAAGTCCAACCCAGATAACGATCATATACATGGTTCTCTCCAGAGGTTCTTACTGAACACTCGTCCGAGAATAACGAGTGGAGTCCATTTCTATACTCATCAAACTGTAGGGGTTGTAATAGTTTATCCGATTTCTCGCTGTAGGGTACACGAGAACCACCGAGCCTGATGTGGTTAAA
AAGACAAGGCAACAATCTTTACTACCGCAATCCACTATTTAAGGTGATATATGGGAAGAAGGAATTTGAAAGAGTTCGAAGAGCATCCTCAGGATGTGATGGAACAATACCAGGACTATCCGTATGACTACGACTATTGATAAAAATCAATGGTGTGGACAATTCAAGCGATGCAATGGATGCAAGCTTGCAATCGAATGCATGGTTAGCCTGAGAAATGTTTCCTGTAAATGGAAGATGGGAAATATGTCGATAAAGGGGCAATACTAACGACGGCAAATGATTGCCAGAGAACTTGGTAAACAGAACAACAAAGCTGCCTGATAGTGGCCTTTATTTTTGGCATAAATAACAGAATAAACACTGCACTGTGTATTCATTCCAACGAGTGAATACACGGGAGCAATGTCGCTCGTAACTAAACAGGAGCCGACTTGTTCTGATTATTGGAAAATCTTCTTTGCCCTCCAGTGTGAGGGCGATTTTTATCTGTGAGGATATGAACAGATGTCAAACATCAAAAAAATACATCATTGATTACGACTGGAAAGCATCAATAGAATTGAAATCGACCATGACGTAATGACAGAGGAAAAACTTCACCAGATTAATAATTTCTGGTCAGACTCTGAATACCGACTCAATAAACACGGCTCTGTATTAAAATGCTGTATTAATCATGCTGGCGCAACATGCTCTGCTTATAGCAATTTCAAGCGACTTAAATGCATATGGTGTTGTGTGTGATGTTCGACTGGAATGATGGAAATGGTCAGGAAGGATGGCCCTCCAATGGATGGTACGAAGGATAGAGAATTACGCGATATCGATACATCAGGAATATTTGATTCAGATGATGATGACTATCAAGGCCGCCTGAGTGCGGTTTTACCGCATACCAATAACGCTTCACTCGAGGCGTTTTTCGTTATGTATAAATAAGGAGCACACCATGCAATATGCCATTGCAGGGTGGCCTGTGTGCTGGCTGCCCTTCCGAATTCTTTACTTAACGAATCACCCGTAAATTACGTGACGGATGGAAACGCCTTATCGACATACTATCAGCAGGAGTACCCAAAGAATGGATCAAACACTTATGGCTATCCAGACTAAATTCACTATCGCCACTTTTATTGGCGATGAAAAAGATGTTTCGTGAAGCCGTCGACGCTTATAAAAAATGGATATTAATACTGAAACTGAGATCAAGCAAAGCATTCACTACCCCCTTTCCTGTTTTCCTAATCAGCCCGGCATTTCGCGCGGCGATATTTTCACAGCTATTTCGGAGTTCAGCCATGAACGCTTATTACAGTCAGGAATCGTGCTTGAGGCTCAGAAGCTGGGCGCGTCACTACCAGCAGCTCGCCCGTGAAGAGAAAGAGGCAAGAACTGGCAGACGACATGGAAAAAGGCCTGCCCCAGCACCCTGTTTGAATCGGCTATGCATCGATCATTTGCAAACGCCACGGGCCATCAAAAAATCAATTACCCGTGCGTTTGATGACGATGTTGAGTTTCAGGAGCGCATGGCAGAACACATCCGGTACATGGTTAGAAACCATTGCTCACCACCAGGTTGATATTGATTCAGTAGGTATAAAAACGAATGAGTACTGCACTCGCAACGCTGGCTGGGAAGCTGGCTGAACGTGTCGGCATGGATTCTGTCGACCCACAGGAAACTGATCACCACTCTTCGCCAGACGGCATTTAAAGGTGATGCCAGCGATGCGCAGTTCATCGCATTACTGATCCGTTGCCAACCAGTACGGCCGTATCCGTGGACGAAAAGTAATTTACGCCTTTCCTGATAAGCGAATGGCATCGTTCCGGTGGGTGGGCGTTTGATGGCTGGTCCCCGCATCATCAATGAAAACCAGCAGTTTGATGGCATGGACTTTGAGCAGGACAATGAATCCTGTACATGCCGGATTTACCGCAAGGACCGTATCATCCGATCTGCGTTGACCGAATGGATGGATGAATGCCGCCGC
GAACCATTCAAAACTCGCGAAGGCCAGAGAAATCACGGGGCCGTGGCAGTCCGCATCCCAAACGGATGTTTACGTCATAAAGCCATGATTCAGTGTGCCCGTCTGGCCTTCGAGTTGCTGGTATCTATGACAAGGATGAAGCCGAGCGCATTGTCGAAAATACTGCATACACTGCAGAAACGTCAGCCGGAACGCGACATCACTCCGGTTAACGATGAAACCATGCAGGAGATTAACACTCTGCTGATCGCCCTGGATAAAACATGGGATGACGACTTATTGCCGCTCTGTTTCCCAGATATTTCGCCGCGACATTCGTGCATCGTCAGAACTGACACAGGCCGAAGCAGTAAAAAGCTCTTTGGATTCCTGAAACGAAAGCCGCAGAGCAGAAGGTGCAGCATGACACCGACATTTCCTGCACGTACCGGGATCGATGTGAGAGCTGTCGAACAGGGGGATGATGCGTGGCACAAATTACGGCTCGGCGTCATCACCGCTTCAGAAGTTCACAACGTTATAGCAAAAACCCCGCTCCGGAAAGAAGTGGCCTGACATGAAAAATGTCCTACTTCCACACCCTGCTTGCTGAGGTTTGCACCGGTGTGGCTCCGGAGTTAACGCTAAAGCACTGGCCTGGGGAAAACAGTACGAGAGACGACGCCAGAACCCTTTTTGAATTCAACTTCGGCGTTGAATGTTACTGAATCCCCGATCATCTATCGCGACGAAAGTATGCGTACCGCCTGCTCTCCCGATGGTTTAATGCAGTGACGGCAACGGCCTTGAACTGAAATGCCCGTTTACCTCCCGGGATTTCATGAAGTTCCGGCTCGGTGGTTTCGAGGCCATAAAGTCAGCTTACATGGCCCAGGTGCAGTACAGCATGTTGGGTGACGCGAAAAAATGCCTGGTACTTTGCCAACTATGACCCGCGTATGAAGCGTGAAGGCCTGCATTATGTCGTGATTGAGCGGGATGAAAAGTACATGGCGAGTTTTGGACGAGAATCGTGCCGGAGTTCATCGAAAAAAATGGACGAGGCACTGGCTGAAATTGGTTTTGTATTTGGGGAGCAATGGCGATGACGCATCCTCACGATAATATCCGGGTAGGCCGCAATCACTTTCGTCTACTCCGTTACAAAAGCGAGGCTGGTATTTCCCGGCCTTTCTGTTATCCGAAAATCCACTGAAAGCACAGCGGCTGGCTGAGGATAAATAATAAACGAGGGGCTGTATGCACAAAGCATCTTCTGTGAGTTAAGAACGAGTATCGAGAATGGCCATAGCCTTGCTCATATTGGAATCAGGTTGTGCCAATACCAGTAGAAACAGACGAAGAAATTTCATACGTTAGCCGCATCCCTTTCACAAAAAGCTGGAAAATGATGGTGGCGAAAGCAGAAGCAGATGAGAGAAACCAGGTATGACAACCACGGAATGCATTTTCTGGCAGCGGGCCTTTCATATTCTGTGTGCTTATGCTTGCCGACATGGGACTTGTTCAATGACAACCTCAGCAGGAAAACGCCTTCGCAGCATTGCCCGTCAGGCTAATTCTGAAATCAAAAAAGCAGACAGCAGTTTCCGGATAAAAACGTCGATTGACATTTGCCGTAGCGTACTGAAAGAAGCACCGCGAACGGTAACGCTGATGGGATTCACACCGACTCATTTAAGCCTGGCAATCGGCATGTTAAACTGCGTCTTTAAGGAAACGATGAACATGAAAGCAAAAATCATACAGGGAGCTACAGGCTCCTTTTTTATTTTCGCATTCACCCTCAAGCGTATTAACCAACAGTTCAGGGCTTAATGAAAGATGGCAGACATCATTGATTCAGCATCAGAAAATAGAAGAATTACAGCGCAACACAGCAATAAAAAATGCGCCGCCTGAACCACCAGGCTATAATCTGCCACTCATTGTTGTGAGTGTGGCGATCCGATAGATGAACGAAGAACGCCTGTCGTTCAGGGTTGTCGGACTTGTGCAAG
TTGCCAGGGGAGGATCTGGAACTTATCAGTAAACAGAGAGGTTCGAAGTGTAGCGAAATTAACTCTCAGGCACTGCGTGAAGCGGCAGAGCAGGCAATGCATGACGACTGGGGATTTGACGCAGGACCTTTTCCATGATTGGTAACAACATCGATTGTGCTGGAACTGCTGGATGACGGGAAAGAACCAGCAATACAGATCAAACGCCGCGACCAGGAGAACGAGGATATTGCGCTAAACAGTAGGGAAACTGCGTGTTGAGCTTGAAACAGCAAAAAAATCAAAACTCAACGAGCAGCGGTGAGTATTACGAAGGTGTTATCTCGGATGGGAGTAAGCGTATTGCTAAACTGAAAAGGCAACGAAGTCCGTGAAGACGGAAACCAGTTTCTTGTTGTTCGCCATCCCTGGGGAAAAGACTCCTGTTATCAAGCACATGCACTGGTACAATACTATCTCTCTATGG
Weirdly, after swapping the columns, pandas changes the dtypes of the `qstart` and `qend` columns from integer to float!
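A minimal sketch of one dtype-preserving way to order the coordinates, assuming a DataFrame with integer `qstart`/`qend` columns. `normalize_coords` and the `reversed` column are hypothetical names; this is not the code merged in this PR. Taking element-wise `np.minimum`/`np.maximum` avoids label-aligned assignment entirely.

```python
import numpy as np
import pandas as pd


def normalize_coords(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: order qstart/qend so qstart <= qend, keeping integer dtypes."""
    out = df.copy()
    # Record the original orientation before swapping so it is not lost.
    out["reversed"] = out["qstart"] > out["qend"]
    # Element-wise min/max of two integer Series stays integer; no NaN is introduced.
    lo = np.minimum(out["qstart"], out["qend"])
    hi = np.maximum(out["qstart"], out["qend"])
    out["qstart"] = lo
    out["qend"] = hi
    return out


hits = pd.DataFrame({"qstart": [10, 500], "qend": [200, 320]})
fixed = normalize_coords(hits)
print(fixed)
print(fixed.dtypes)  # qstart/qend should keep the input's integer dtype
```

One common cause of an unexpected int-to-float change in pandas is an intermediate step that introduces `NaN` (for example, label alignment during a `.loc` assignment), since NumPy integer columns cannot hold `NaN`. This sketch has not been checked against what actually happened in the PR.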
This now works in my testing, including on the case that caused a crash before, so I'm merging this.
I am leaving a note here that I don't understand the downstream processing of `qstart` and `qend` well enough to know whether swapping them here causes a problem in how hits are displayed and output. My testing suggests that it doesn't change anything, so I think the downstream processing makes the coordinates safe whether or not they are reversed, because it uses `sframe` to determine the strand.
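The reasoning in the note (coordinate order doesn't matter downstream as long as the strand comes from a separate field) can be sketched as follows. This assumes normalized 0-based, end-exclusive coordinates and a strand value of `1` or `-1`; `hit_sequence` is hypothetical, and the real encoding of `sframe` in pLannotate may differ.

```python
# Complement table for DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def hit_sequence(seq: str, start: int, end: int, strand: int) -> str:
    """Hypothetical helper: extract a hit from normalized coordinates.

    Assumes start <= end (0-based, end-exclusive) and strand in {1, -1}.
    """
    if start > end:
        raise ValueError("coordinates must be normalized (start <= end)")
    sub = seq[start:end]
    if strand == -1:
        # Minus-strand hit: reverse-complement the forward-strand slice.
        return sub.translate(COMPLEMENT)[::-1]
    return sub


plasmid = "AACCGGTT"
print(hit_sequence(plasmid, 0, 4, 1))   # AACC
print(hit_sequence(plasmid, 0, 4, -1))  # GGTT
```

Keeping orientation in its own field means the coordinates can always be stored in ascending order, so slicing never comes back empty.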
This makes sure we don't get empty values here: https://github.com/barricklab/pLannotate/blob/daabf1a63be79c43ff6166c76403d358b1d53da8/plannotate/annotate.py#L385