labgem / PPanGGOLiN

Build a partitioned pangenome graph from microbial genomes
https://ppanggolin.readthedocs.io

Correctly read coordinates spanning multiple lines in GBFF files #240

Closed jpjarnoux closed 5 months ago

jpjarnoux commented 5 months ago

Gene coordinates can span multiple lines in GBFF files. For instance:

     misc_feature    complement(order(23276..23344,23180..23233,23108..23167,
                     22982..23050,22853..22921))
                     /gene="dsbI"
                     /locus_tag="Cj0017c"
                     /inference="protein motif:TMHMM:2.0"
     misc_feature    complement(order(25348..25416,24661..24729))
                     /locus_tag="Cj0019c"
                     /inference="protein motif:TMHMM:2.0"
     misc_feature    complement(order(33433..33492,33331..33399,33205..33273,
                     33076..33144,32908..32976,32779..32847,32653..32721,
                     32548..32616,32437..32505,32332..32400,32251..32319))
                     /locus_tag="Cj0025c"
                     /inference="protein motif:TMHMM:2.0"

This PR changes the GBFF parsing so that coordinates split across several lines are read correctly. It may also fix issue #195.
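To illustrate the idea, here is a minimal sketch of how a location split over several lines could be joined before the coordinates are extracted. It is not the code from this PR: the function names `read_feature_locations` and `parse_location` are hypothetical, and the sketch assumes the standard feature table layout in which the feature key sits in the first 21 columns and qualifiers start with `/`.

```python
import re


def read_feature_locations(lines):
    """Yield (feature_key, location) pairs from GBFF feature table lines.

    Hypothetical helper: continuation lines of a location are appended
    to it until the first qualifier line (starting with "/") is reached.
    """
    key, location, in_location = None, "", False
    for line in lines:
        if not line.strip():
            continue
        # Assumption: the feature key occupies the first 21 columns.
        head, body = line[:21].strip(), line[21:].strip()
        if head:
            # A new feature starts: emit the previous one, if any.
            if key is not None:
                yield key, location
            key, location, in_location = head, body, True
        elif body.startswith("/"):
            # First qualifier: the location is complete.
            in_location = False
        elif in_location:
            # Continuation of a location written on multiple lines.
            location += body
    if key is not None:
        yield key, location


def parse_location(location):
    """Return (strand, [(start, stop), ...]) for a joined location string.

    Simplification: only "start..stop" ranges are extracted; single
    positions and remote references are ignored.
    """
    strand = "-" if location.startswith("complement(") else "+"
    spans = [(int(start), int(stop))
             for start, stop in re.findall(r"<?(\d+)\.\.>?(\d+)", location)]
    return strand, spans
```

Applied to the excerpt above, `read_feature_locations` would yield three `misc_feature` entries, each with its full `complement(order(...))` string on a single line, which `parse_location` can then turn into a strand and a list of coordinate pairs.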