mmcguffi / pLannotate

Webserver and command line tool for annotating engineered plasmids
GNU General Public License v3.0

Infernal output could align to the negative strand #27

Closed nh13 closed 1 year ago

nh13 commented 1 year ago

This makes sure we don't get empty values here: https://github.com/barricklab/pLannotate/blob/daabf1a63be79c43ff6166c76403d358b1d53da8/plannotate/annotate.py#L385

jeffreybarrick commented 1 year ago

@nh13 Some questions about this PR...

1) This change doesn't seem necessary. Do you have an example? I see predictions from Infernal being shown on both strands in the webserver version. For example, I am using Addgene #85494 as an example input file for testing.

2) I get a crash when I try to run Addgene #193140 with your code.

Please advise!

nh13 commented 1 year ago

My apologies, I should have included the test case from the start.

command

plannotate batch -i test.fasta -y debug.yaml -o tmp/ -f test -s '' -c -d

test.fasta

>SRR11171709.78.1 78 length=7306
CATAGAGAGATAGTATTGTAACCTGACAAAGGTTAAATTAGAAAAGAGCTTTATGAAAATAATCTCCGAGATAAAAACAAACCCGCTAGAAACATATTACGATGACATGCTTGTTTCATCAGGTGAAAAATTCCATTTTGTGGTGATATAGGACAGACAAAAAATATAGATCATTTTTCTTCCTATTGCACATTATCCTGATTTTCGGTGATGCCTATTATTTAGTTCCATCGTGCCGCGACTGCAATATGGGAGAGAAAGGTCAAGTTTTCGCAGTAGATGAGGTAACACCAGCGATTCATCCCTATATCGACAAGGACATTTTTTTTCGTGAGCAATGGGTATATGCAAATTTCGTTTCCGGAACTCCGGGTGCTATCAGTTTTTATGTTGAATGCCCGGCGAACTGGAGGCAGGAAGACAAACACAGAGCTCTTCATCATTTCAAGCTATTAAATATTGCTAACAGGTATCGTTTGGAGGCAGGGAAGCACTTGAGTGAAGTGATTACTCAAAGAAACTCTTTCGTAAAAGTTATAAGGAAATATAGTTCAACCGCAACGTTTCAGCAGCTACAGTCAGAATTTATTGAAGCAAATCTGAAACCTATTATAGATTTGAATGACTTCCCAATTATTGGAAAAGAGTTATGTATCAGTGCCTAGCAAACTCGGAAGATTTTTTCAGAGGGATCTAGAATATGATGAAAGATAGAAAATTACGACGCTTATCGGAAGTGAACGAATACTTTTTATATGAGGAGGGCTGTTTTTACAAAATCCGGTAGTAAACTTGCTAACCAATTCCTAGGCAGGTCATTGGGCCAACAGTGGCATGCACCGAGAAGGACGTTTGTAATGTCCGCTCCGGCACATAGCAGTCCTAGGGACAGTGGCGTACAGTCATAGATGGTCGGTGGGAGGTGGTACAAATTCTCTCATGCAAAAAATATGTAAAAATCGGTAGGCAAACTGGAAAATCATGCAACACCCGCACTATCGGAAGTTCACCAGCCAGCCCGCAGCACGTTCCTGCATACGACGTGGTCTGCGGCTCTACCATATCTCCTATGAGCAACGTGTTAGCAGAGCCAAGCCACAACTCTAATTTTAATACATAATGAATGATAATAATAATATTAAAAATTTCCTGTGTAACTAATTTACTATATGGTTTCTGATAAGAATCATTGCAAAGATCAAACAACTTGTATTACATTGACAGTTAAGCAGTTAATTTTATCACCTCTAAAAATATATCAGCATCTAGCAATGCAACCTATCAAAATGGAGAGTTTTATGACTAAAAAAACCATGGGAAAGAAGACTTAAAGATTTATTCGCACTTGCTCAAAATGCTGCATTGATACATATTTTGACCCTGAATTATTTCGCTTGAATTTGAATCAATTCCTCCAAACCGCAAGAACAGTAACATTTTATTATTCAAAAAAACAAAAACCAGATTATAGATATGACATTTGGTATAACAATAATGTTATTGAAAAAATGGAAAAATGATCCATTAATGGCTTGGGCTAAAAATTTCTCGCAATACGAGTAGAAAAACAAGGCGATTTAGAAATGTATAGCGAGGCAAAGGCTACTCTTATTTCATCTTACATTGAAGAAAATGACATTGAGTTTATTACAAATGAAAGTATGTTAAACATTGGTATAAAAAAGTTAGTCAGACTTGCACAAAAGAAATTACCTTCATATTTAACTGAATCATCTCTTATTAATCAGAAAGACGATGGGTCGCTAATACGCTAAAAGATTACGAATTATTACATGCCTTACTATAATCTATGGCAGAATGTATAACTGCTGTAACTCTCTTGGCATACAAATACAATCCAATGGGTGACGATGTGATTTCGCCAACATCATTCGACTCTTTATTTGATGAAGCCAGGAGAATAACTTATTTTAAAATTAAAAGATTACTCCATAAGCAAATTGTCATTTAGCATGATACAATATGACAATAAAATAATT
CCTGAAGATATTAAAGAGCGTCTAAAACTGGTAGATAAGCCTAAAAATATCACTTCGACAGAAGAGTTAGTTGACTATACAGCCAAGCTTGCAGAAACGACTTTTTTAAAGGACGGTTATCACATTCAAACATTAATTTTTTATGATAAACAATTCCATCCAATTGATTTAATCATACACATTTGAAGATCAAGCAGATAAATATATTTTTTGGCGTTATGCAGCTGACAGAGCCAAAATAACAATGCCTATGGCTTCATTTGGATATCAGAGCTATGGCTCAGAAAAGCAAGCATCTACTCCAATAAACCAATACATACAATGCCAATTATAGATGAAAGACTTCAGGTAATTGGAAGTGATTCAATAATAATCAAACATGTATTTCCATGGAAAATAGTTAGAGAAAACGAAGAAAAAAAAACCGACTTTAGAAATATCAAACAGCAGACCTCAAAACATGGACGAAAAACCATATTTCATGCGTTCAGTCTTAAAAGCAATTGGCGGTGATGTAAACACTATGAACAATTGAGTCATAGAACTTCCATTATTCTCCTGAAGATAATAATCGCCAAATAAACCAATACTCAGCTTTACAATATACTAACTAACCGCAGAACGTTATTTCATACAACGTTTTCTGCGGCATATCACAAAACGATTACTCCATAACAGGGACAGCAGGCCACTCAATATCAGGTGCAGTGATGTATCACACGGTTCAGCAACACCCCGATACTTCTTCCAGGCTTCCAGCAACGAGGTTTCTTCCTTCGTTGCAATTTCCAGATCTGCAGCATCCTGAAGCGGCGCAATATGCTCACTGGCTACCTGCATCAGGCTTTTTTTTTGTTTCTTCCGCCTCCCGGATCCGGAACAGTTTTTCTGCTTCCGTATCCTTCACCCAGGCTGTGCCGTTCCACTTCTGATATTCCCCTCCCGGCGATAACCAGGTAAAATTTTCCGGTACGGACCGAGTTCAGAAATAAATAACGCGTCGCCGGAAGCCACGTCATAGACGGTTTTACCCCGATGGTCTTCAACGAGATGCCACGATGCCTCATCACTGTTGAAAAACAGCCACAAAGCCAGCGGAATATCTGGCGGTGCAATATCGGTACTGTTTGCAGGCAGACCGGTATGAGGCGGAATATATGCGTCACCTTCACCATAAATTCATTAGTTCCGGCCAGCAGATTATAAATTTTTATGGGTCCGTGGTTGTTCACTCATTCTGAATGCCATTATGCAAGCCTCACAAATAGTGTAAATGCAATGTTTTTGACGGTGTTTTCCGCGTTACCCGCAGCGTTAACGGTGATGGTGTGTCCGTGTGAACCAATACTGAAAGAATGGGCATGAGCACCGATAACAACGCGGATGCTGGTTGCGCACCCAATACCAACTGTATGCGCATGTGCACCGGCACTCACGGCTGTACCGGACAATGAGTGACTGTGGCTGCCCTGACTGTCCGTTTTCGATAAATAAGCAATACCTGTGTGGCTGGTTCCTTTAACTGTGGATAAACTTCCTGTAATGGTTGCTGTTCCATACTGACTCCAGCCAGAACTGTTCATCCTTAACCACTTGTGTGGGCATGGCACCCGCGGCCCCTGTTGAACCGCTCAGACTGTGAGCATGAGCCCCCGTGTTATTCGTCGATTTTGGTGCCGTAATCGAAACTGCCTGTTGTTTTCGTCCCGTAATCAAACGACGATGTGGTTTTCGTCCCCAAATCCGTACCGGATGCACTGGCACTGTGGGTGTGCGACTTAATTCCATCCTGTTCCTGAGACAATACAGCACGACCGCTGGCGGGTTTCCCTTGATTGTCGCAGCCTCGCATATCAGGAAGCACACCCGATGGATACGCGACAGCAAGTTTTGGGGTAGGCTGATTTGTCAAACGCCTGCCCCTGCATCAGGACGTAAGCCAGACGGAACGATATCTGATGGCCACGGATCGGCGCACCTGCCGGAAAGGCTCGAATTCT
CACCGGCCCCAAGGTATTCAAGAACATCTGCAACGGAATTTTTGCCCAGAATATCCCTGCCAACCTGAGTCAGTTCAGTCAGGCTGGCGGCATCATTTTCCGCAAAATACGGTAATTTATTTTTCGCCGTGGAAAGCCCTGCCAGCGCCGTCAGTGTCGCATTCTTCGGTTGTTTACCCGCAAGCGCGTTAGTCATGGTGGTAGCAAAATCTGGATCATTCCCGAGCGCTGCGGCCAGTTCATTCAGCGTATTCAGTGCGTCAGGTGACGCGTCGATAACATCTGCAATCGCGGCCAGTACAAAAGCGGTGTCGCAATCTGGGTATTGTTTGTTCCCCTGAGCGCGGTTGGTGCTGTTGGCGTTCCGGTCAGTGCCGGACTGTCCAGTGGGCTTTTCTGTTCGTTTCATCCATTACCACCTTAACCGCCTTTGCGTTGCAGCAAGCGTTTCAGACGTGCTGTTGGTTGCACTGCTGAGCTGCACTATCCCCTTTCTCGTTGTGTCCGCATCCTCAAGCGCGACAGCTGAAGCTATATCTTCTGCACGTTTTGCCGAATTTTTTTGCACGTATTGCCGCCGCTTCTGCCGCACTCTTGCTCTGCGATGCTGATACCGCACTTCCGCAGCCTCTGTCGCCTTCGTGATGCCGTTGACGCACTCCCCGCCGCCGCTGTTTTTGCGTCTGCCGCGGCAGAGGCGCTCCGTTCCGCTGCTGTTTCAGATGACCTGGCATTCGTCCTCGGACGTTTTTTGCCGCCCTGGCAGAATTTTCTGCCGCCGTTGCCGAGGGAAGCTGCACGACCGGCACTTGATGATGCGTTTCGTTTCTGATGATTTTGCTGCCTCTTTTGAGGCCACCGCATCTCGTGCTGAAGTGGCGGCCTCTGACGCTTTCGTGGCCGCGGTGGAGGCAGACGTTGGCGGCTGATTGTTGTGACGCTGCAGCATTCGTTTCTGACGTTTTTCGCCGCACCGGCACTGGTGGCCGCCGCGTTTTTTGAGGACTCTGCGGCTGCGGCACTTTTTTTCCGCTTCAGTGGCCTTTGCTGATGCCGCTTCTGCGCCGGAGGACGCTTCCTGAGCTGACGATGCAGCCTGTCCGGCGGACGTGCTGGCGGCGCGTGCTGAGTCAGTTGCATCAGTCACAAGGGCCGCGACCTGAGGCAGCTGATGCACTGGCATCGCCGGCTGATTTCTTCGCGTCTGCCGTACTCTGTGCCACCACGGACGCGTTACGCGCCACCTCTTCCACCATCAGTTCAGACGACGCAGCACCTCCGGCCGGGCATCATCCTCCGTCATGGCACAGAGAAAATCATTCAGCGTCCCCGGTTGTGAATCTTCATACAGGTGATGGTCCCGGCGTGCGATGGTGGAAAACCGTCAACCTGCAGGATGACACTGTACTGACCGTACTCCACATCCATGCTGTACGCCCGCTTCATCCGGATTCTCTGAGCCCACCCGTGTTCACCACCACCGTGGTGCTGTTACGTCTGGCTTTCAGCTGAATGTGCAGTTCTGTACCGGTTTTCCTGTGCCGTCTTTTCAGGACTCCTGAAATCTTTACTGCCATATTCACCACACAAAAAAAAGCCCACCGTTTCCGGCGGGCTGTCATAACACTGTGTTTACCTGGCTAATCAGAATTTATAACCGACCCCAACGATGAATCCGTCAGTACGCCAGTCGCCACTGCCGGAGCCTTCATAAGCAATATCAACACGACGGACGCTGGCGGATAATCTGTATACCTGCACTCCACGCCACTGAGGTATGCCGCATTGCACTTTCGTCCCTGGCAGTGGTCGTCTCTTTCATATACCCGGGAGTGATTTCCGTCTTACGGTAATCCATTGTACTGCGGACCACCGACTGTGAGCCACTCCGGCCATGGCGTACGCACTGACCTGCTTACTGATTTGTAAAACCGGTCCGGCCATCACGCTCACATAACGTCCACGCAGGCTCTCATAGTGAAACGTATCCTCCCGGTCAA
TCACTGTGCTGCTCTTTTTCGACGCGGCGAACCCCAGGGAAGCCATCACCCCCACACTGTCCCGTCAGCTCATAACGGTACTTCACGTTAATCCCTTTCAGATGACTCACACCGGTATCCCCGCCCGACAACGACGGCAATGTACCCGGTTTCCACTTGAAAATAGCCACCGTAAACGTACCATGTCCACCTTCCGCACGGGCCGGAGTGACTGTCACCGCAAGTGCGGCAAAGACAGCAACGGCAATACACACATTACGCATCGTTCACCTCTCACTGTTTTATAATAAAACGCCCGTTCCCGACGAACCTCTGTAACACACTCAGACCACGCTGATGCCCAGCGCCTGTTTCTTAATCACCATAACCTGCACATCGCTGGCAAACGTATACGGCGGAATATCTGCCGAATGCCGTGTGGACGTAAGCGTGAACGTCAGGATCACGTTTCCCCGACCCGCTGGCATGTCAACATACGGGAGAACACCTGTACCGCCTCGTTCGCCGCGCCATCATAAATCACCGCACCGTTCATCAGTACTTTCAGATAACACATCGAATACGTTGTCCTGCCGCTGACAGTACGCTTACTTTCCGCGAAACGTCAGCGGAAGCACCACTATCTGGCGATCAAAAAGGATGGTCATCGGTCACGGTGACAGTACGGGTACCTGACGGCCAGTCCACACTGCTTCACGCTGGCGCGGAAAAGCCGCGCTCGCCGCCTTTACAATGTCCCCGACGATTTTTTCCGCCCTCAGCGTACCGTTTATCGTAGCAGTTTTCAGCTATCGTCACATTACTGAGCGTCCGGAGTTCGCATTCACACTGCCACTGATATCCGCATTTTTAGCGGTCAGCTTTCCGTCCGGTGTCATGGAAAAGGCCGGAGGGATTGCCGCCGCTGGTAATGGTGGGGGCCGTCAGGCGCTTCAGGAACACGTCGTTCATGAATATCTGGTTGCCCTGCGCCACAAACATCGGCGTTTCATTCCCGTTTGCCGGGTCAATAAATGCGATACGATTGGCGGCAACCAGAAACTGGCTCAGTTTTGCCTTCCTCCGTGTCCTCCATGCTGAGGCCAATACCCGCGACATAATGTTTGCCGGTCTTTGGTCTGCTCAATTTTGACAGCCCACATGGCATTCCACTTATCACTGGCATCCTTCCACTCTTTCGAAAACTCCTCCAGTCTGCTGGCGTTATCCTCCGTCAGCTCGACTTTTTCCAGCAGCTCCTTGCAGAGATGGGATTCGGTTATCTTGCCTTTGAAAAAAATCCAGGTAACAATACTATCTCTCTATG

debug.yaml:

Rfam:
  details:
    compressed: false
    default_type: ncRNA
    location: None
  location: Default
  method: infernal
  priority: 3
  version: release 14.5

nh13 commented 1 year ago

This fails because infernal['qend'] < infernal['qstart'] for this hit, so extracting the qseq yields an empty sequence. If that is the only result returned across all databases, the seq column of the data frame ends up with a float type.
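
A minimal sketch of this failure mode in plain Python, not the actual code in annotate.py. It assumes 0-based, half-open coordinates purely for illustration; the helper names extract_qseq and extract_qseq_safe and the toy sequence are hypothetical.

```python
def extract_qseq(seq: str, qstart: int, qend: int) -> str:
    """Naive extraction: slice with the coordinates exactly as reported."""
    return seq[qstart:qend]


def extract_qseq_safe(seq: str, qstart: int, qend: int) -> str:
    """Order the coordinates first so a reversed hit still yields bases."""
    start, end = min(qstart, qend), max(qstart, qend)
    return seq[start:end]


plasmid = "ACGTACGTAC"  # toy sequence, hypothetical

# Forward-strand style hit (qstart < qend): the slice is non-empty.
forward = extract_qseq(plasmid, 2, 6)

# Reversed hit (qend < qstart): Python returns an empty string, not an error.
reversed_naive = extract_qseq(plasmid, 6, 2)

# Ordering the coordinates recovers the same span as the forward case.
reversed_safe = extract_qseq_safe(plasmid, 6, 2)

print(repr(forward), repr(reversed_naive), repr(reversed_safe))
```

The empty result is silent, which is why the problem only surfaces later, once the data frame is built from it.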

nh13 commented 1 year ago

I added one more commit, because the original commits failed on this read:

>SRR11171709.130.1 130 length=6466
CATAGAGAGATAGTATTGTCACCAATGCTGAGATAGCTGAGAGATGGCATATTGCTACGCAAGAATGAAAAGTGATATACTGGAATGTTTTAAAAAGGCAGGTGGGCAAAGTTAAGGATTAATTATCAGGAGTAATTATGCGGAACAGATCATGCCTGGTGTTTACATAGTAATAATTCCTTACGTTATCGTAAGCATTTGCTATCTCCTTTTCCGCCACTACATTCCCTGGTGTTTCTTTTTCAGCTCATAGAGATGGTCTTGGGGCGACATTGTCATCATATGCAGGAACCATGATTGCAATCCTGATTGCTGCCTTGACGTTTCTAATCGGAAGCAGAACGCGCCGACTGGCCAAGATTAGAGAGTATGGGTATATGACATCGGTAGTTATTGTCTATGCCCTTAGTTTTGTTGAGCTTGGAGCTTTGTTTTTCTGCGGGTTATTGCTTCTTTCCAGCATAAGCGGCTACATGATACCCACTATCGCCATCGGCATTGCCTCTGCATCGTTCATTCATATATGCATCCTTGTTTTCCAACTATATAATTTGACCAGAGAACAAGAATAACCCGGCCTCAGCGCCGGGTTTTCTTTGCCTCAACGATCGCCCCCAAAAACACATAACCAATTGTATTTATTGAAAAAATAAATAGATACAACTCACTAAACATAGCAATTCAGATCTCTCACCTACCAAACAATGCCCCCCCTGCAAAAAATAAATTCATATAAAAAACATACAGATAACCATCTGCGGTGATAATTATCTCTGGCGGTGTTGACATAAATACCACTGGCGGTGATACTGAGCACATCAGCAGGACGCACTGACCACCATGAAGGTGACGCTCTTAAAAATTAAGCCCTGAAGAAGGGCAGCATTCAAAGCAGAAGGCTTTGGGGTGTGTGATACGAAACGAAAGCATTGGCCGTAAGTGCGATTCCGGATTAGCTGCCAATGTGCCAATCGCGGGGGGTTTTCGTTCAGGACTACAACTGCCACACACCACCAAAGCTAACTGACAGGAGAATCCAGATGGATGCACCTAAACACGCCGCCGCGAACGTCGCGCAGAGAAACAGTCTCAATGGAAAGCAGCAAATCCCCTGTTGGTTGGGGTAAGCGCAAAACCAGTTAACCGCCCTATTCTCTCGCTGAAATCGCAAACCGAAATCACGAGTAGAAAGCGCACTAAATCCGATAGACCTTACAGTGCTGGCTGAAATACCACAAACGAATTGAAAGCAACCTGCAACGTATTGAGCGCAAGAATCAGCGCACATGGTACAGCAAGCCTGGCGAACGCGGCATAACATGCAGTGGACGCCAGAAAATTAAGGGAAAATCGATTCCTCTTATCTAGTTACTTAGATATTGGCCTTGGCTTTATCTCAATATTATATGGATCATAGCTGGCAACTAATTCAGTCCAGTAAATATCCTCAATAGGGAATAATATATGCTTTCCATTCCATCGGGAAAAAGTTTGTTCAACACACCAAGCTCAATCAACTCACTAATGTATGGGAATTTGTTTTGATGTAACCACATACTTCCTGCCTTCATTAAGGGCTGCGCACAAAACCATAAGATTGCTCTTCTGTAAGGTTTTGAATTACTGATGCGCACTTTATCGTTTTGCATCTTAATGCGTTTCTTAGCTTAAATCGCTTATATCTGGCGCTGGCAATAGCTGATAATCGATGCACATTAATTGCTAGCGAAAATGCAAGAGCAAAGACGAAAACATGCCACACATGAGGAATACCGATTCTCTCATTAACATATTCAGGCCAGTTATCTGGGCTTAAAAGCAGAAGTCCAACCCAGATAACGATCATATACATGGTTCTCTCCAGAGGTTCTTACTGAACACTCGTCCGAGAATAACGAGTGGAGTCCATTTCTATACTCATCAAACTGTAGGGGTTGTAATAGTTTATCCGATTTCTCGCTGTAGGGTACACGAGAACCACCGAGCCTGATGTGGTTAAA
AAGACAAGGCAACAATCTTTACTACCGCAATCCACTATTTAAGGTGATATATGGGAAGAAGGAATTTGAAAGAGTTCGAAGAGCATCCTCAGGATGTGATGGAACAATACCAGGACTATCCGTATGACTACGACTATTGATAAAAATCAATGGTGTGGACAATTCAAGCGATGCAATGGATGCAAGCTTGCAATCGAATGCATGGTTAGCCTGAGAAATGTTTCCTGTAAATGGAAGATGGGAAATATGTCGATAAAGGGGCAATACTAACGACGGCAAATGATTGCCAGAGAACTTGGTAAACAGAACAACAAAGCTGCCTGATAGTGGCCTTTATTTTTGGCATAAATAACAGAATAAACACTGCACTGTGTATTCATTCCAACGAGTGAATACACGGGAGCAATGTCGCTCGTAACTAAACAGGAGCCGACTTGTTCTGATTATTGGAAAATCTTCTTTGCCCTCCAGTGTGAGGGCGATTTTTATCTGTGAGGATATGAACAGATGTCAAACATCAAAAAAATACATCATTGATTACGACTGGAAAGCATCAATAGAATTGAAATCGACCATGACGTAATGACAGAGGAAAAACTTCACCAGATTAATAATTTCTGGTCAGACTCTGAATACCGACTCAATAAACACGGCTCTGTATTAAAATGCTGTATTAATCATGCTGGCGCAACATGCTCTGCTTATAGCAATTTCAAGCGACTTAAATGCATATGGTGTTGTGTGTGATGTTCGACTGGAATGATGGAAATGGTCAGGAAGGATGGCCCTCCAATGGATGGTACGAAGGATAGAGAATTACGCGATATCGATACATCAGGAATATTTGATTCAGATGATGATGACTATCAAGGCCGCCTGAGTGCGGTTTTACCGCATACCAATAACGCTTCACTCGAGGCGTTTTTCGTTATGTATAAATAAGGAGCACACCATGCAATATGCCATTGCAGGGTGGCCTGTGTGCTGGCTGCCCTTCCGAATTCTTTACTTAACGAATCACCCGTAAATTACGTGACGGATGGAAACGCCTTATCGACATACTATCAGCAGGAGTACCCAAAGAATGGATCAAACACTTATGGCTATCCAGACTAAATTCACTATCGCCACTTTTATTGGCGATGAAAAAGATGTTTCGTGAAGCCGTCGACGCTTATAAAAAATGGATATTAATACTGAAACTGAGATCAAGCAAAGCATTCACTACCCCCTTTCCTGTTTTCCTAATCAGCCCGGCATTTCGCGCGGCGATATTTTCACAGCTATTTCGGAGTTCAGCCATGAACGCTTATTACAGTCAGGAATCGTGCTTGAGGCTCAGAAGCTGGGCGCGTCACTACCAGCAGCTCGCCCGTGAAGAGAAAGAGGCAAGAACTGGCAGACGACATGGAAAAAGGCCTGCCCCAGCACCCTGTTTGAATCGGCTATGCATCGATCATTTGCAAACGCCACGGGCCATCAAAAAATCAATTACCCGTGCGTTTGATGACGATGTTGAGTTTCAGGAGCGCATGGCAGAACACATCCGGTACATGGTTAGAAACCATTGCTCACCACCAGGTTGATATTGATTCAGTAGGTATAAAAACGAATGAGTACTGCACTCGCAACGCTGGCTGGGAAGCTGGCTGAACGTGTCGGCATGGATTCTGTCGACCCACAGGAAACTGATCACCACTCTTCGCCAGACGGCATTTAAAGGTGATGCCAGCGATGCGCAGTTCATCGCATTACTGATCCGTTGCCAACCAGTACGGCCGTATCCGTGGACGAAAAGTAATTTACGCCTTTCCTGATAAGCGAATGGCATCGTTCCGGTGGGTGGGCGTTTGATGGCTGGTCCCCGCATCATCAATGAAAACCAGCAGTTTGATGGCATGGACTTTGAGCAGGACAATGAATCCTGTACATGCCGGATTTACCGCAAGGACCGTATCATCCGATCTGCGTTGACCGAATGGATGGATGAATGCCGCCGC
GAACCATTCAAAACTCGCGAAGGCCAGAGAAATCACGGGGCCGTGGCAGTCCGCATCCCAAACGGATGTTTACGTCATAAAGCCATGATTCAGTGTGCCCGTCTGGCCTTCGAGTTGCTGGTATCTATGACAAGGATGAAGCCGAGCGCATTGTCGAAAATACTGCATACACTGCAGAAACGTCAGCCGGAACGCGACATCACTCCGGTTAACGATGAAACCATGCAGGAGATTAACACTCTGCTGATCGCCCTGGATAAAACATGGGATGACGACTTATTGCCGCTCTGTTTCCCAGATATTTCGCCGCGACATTCGTGCATCGTCAGAACTGACACAGGCCGAAGCAGTAAAAAGCTCTTTGGATTCCTGAAACGAAAGCCGCAGAGCAGAAGGTGCAGCATGACACCGACATTTCCTGCACGTACCGGGATCGATGTGAGAGCTGTCGAACAGGGGGATGATGCGTGGCACAAATTACGGCTCGGCGTCATCACCGCTTCAGAAGTTCACAACGTTATAGCAAAAACCCCGCTCCGGAAAGAAGTGGCCTGACATGAAAAATGTCCTACTTCCACACCCTGCTTGCTGAGGTTTGCACCGGTGTGGCTCCGGAGTTAACGCTAAAGCACTGGCCTGGGGAAAACAGTACGAGAGACGACGCCAGAACCCTTTTTGAATTCAACTTCGGCGTTGAATGTTACTGAATCCCCGATCATCTATCGCGACGAAAGTATGCGTACCGCCTGCTCTCCCGATGGTTTAATGCAGTGACGGCAACGGCCTTGAACTGAAATGCCCGTTTACCTCCCGGGATTTCATGAAGTTCCGGCTCGGTGGTTTCGAGGCCATAAAGTCAGCTTACATGGCCCAGGTGCAGTACAGCATGTTGGGTGACGCGAAAAAATGCCTGGTACTTTGCCAACTATGACCCGCGTATGAAGCGTGAAGGCCTGCATTATGTCGTGATTGAGCGGGATGAAAAGTACATGGCGAGTTTTGGACGAGAATCGTGCCGGAGTTCATCGAAAAAAATGGACGAGGCACTGGCTGAAATTGGTTTTGTATTTGGGGAGCAATGGCGATGACGCATCCTCACGATAATATCCGGGTAGGCCGCAATCACTTTCGTCTACTCCGTTACAAAAGCGAGGCTGGTATTTCCCGGCCTTTCTGTTATCCGAAAATCCACTGAAAGCACAGCGGCTGGCTGAGGATAAATAATAAACGAGGGGCTGTATGCACAAAGCATCTTCTGTGAGTTAAGAACGAGTATCGAGAATGGCCATAGCCTTGCTCATATTGGAATCAGGTTGTGCCAATACCAGTAGAAACAGACGAAGAAATTTCATACGTTAGCCGCATCCCTTTCACAAAAAGCTGGAAAATGATGGTGGCGAAAGCAGAAGCAGATGAGAGAAACCAGGTATGACAACCACGGAATGCATTTTCTGGCAGCGGGCCTTTCATATTCTGTGTGCTTATGCTTGCCGACATGGGACTTGTTCAATGACAACCTCAGCAGGAAAACGCCTTCGCAGCATTGCCCGTCAGGCTAATTCTGAAATCAAAAAAGCAGACAGCAGTTTCCGGATAAAAACGTCGATTGACATTTGCCGTAGCGTACTGAAAGAAGCACCGCGAACGGTAACGCTGATGGGATTCACACCGACTCATTTAAGCCTGGCAATCGGCATGTTAAACTGCGTCTTTAAGGAAACGATGAACATGAAAGCAAAAATCATACAGGGAGCTACAGGCTCCTTTTTTATTTTCGCATTCACCCTCAAGCGTATTAACCAACAGTTCAGGGCTTAATGAAAGATGGCAGACATCATTGATTCAGCATCAGAAAATAGAAGAATTACAGCGCAACACAGCAATAAAAAATGCGCCGCCTGAACCACCAGGCTATAATCTGCCACTCATTGTTGTGAGTGTGGCGATCCGATAGATGAACGAAGAACGCCTGTCGTTCAGGGTTGTCGGACTTGTGCAAG
TTGCCAGGGGAGGATCTGGAACTTATCAGTAAACAGAGAGGTTCGAAGTGTAGCGAAATTAACTCTCAGGCACTGCGTGAAGCGGCAGAGCAGGCAATGCATGACGACTGGGGATTTGACGCAGGACCTTTTCCATGATTGGTAACAACATCGATTGTGCTGGAACTGCTGGATGACGGGAAAGAACCAGCAATACAGATCAAACGCCGCGACCAGGAGAACGAGGATATTGCGCTAAACAGTAGGGAAACTGCGTGTTGAGCTTGAAACAGCAAAAAAATCAAAACTCAACGAGCAGCGGTGAGTATTACGAAGGTGTTATCTCGGATGGGAGTAAGCGTATTGCTAAACTGAAAAGGCAACGAAGTCCGTGAAGACGGAAACCAGTTTCTTGTTGTTCGCCATCCCTGGGGAAAAGACTCCTGTTATCAAGCACATGCACTGGTACAATACTATCTCTCTATGG

Weirdly, after swapping the columns, pandas changes the types of the qstart and qend columns from an integer to a float!
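
One common way pandas ends up with floats here is that an intermediate step introduces NaN, which forces an int64 column to float64. The sketch below shows that general pandas behavior and one way to order the coordinates without it. It is not necessarily what this PR's swap does, and the data frame and variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Two toy hits; the second one has qend < qstart.
hits = pd.DataFrame({"qstart": [10, 90], "qend": [50, 40]})

# Masking without a replacement value inserts NaN for the masked rows,
# and NaN cannot live in an int64 column, so pandas upcasts to float64.
is_reversed = hits["qstart"] > hits["qend"]
masked = hits["qstart"].where(~is_reversed)
print(masked.dtype)  # float64

# Elementwise min/max never creates NaN, so the integer dtype is kept.
ordered = hits.assign(
    qstart=np.minimum(hits["qstart"], hits["qend"]),
    qend=np.maximum(hits["qstart"], hits["qend"]),
)
print(ordered.dtypes)
```

If a float column has already been produced and contains no NaN, casting it back with astype(int) is another option.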

jeffreybarrick commented 1 year ago

This works now in my testing, including on the case where it caused a crash before, so I'm merging this.

I am leaving a note here: I don't understand the further processing of qstart and qend well enough to be sure that swapping them here doesn't cause a downstream problem in how things are displayed and output. My testing suggests that it doesn't change anything, so I think the further processing makes the coords safe whether or not they are reversed, because it uses sframe to determine the strand.