labgem / PPanGGOLiN

Build a partitioned pangenome graph from microbial genomes
https://ppanggolin.readthedocs.io

Fix empty metadata tag in annotation file #287

Closed JeanMainguy closed 1 month ago

JeanMainguy commented 1 month ago

This PR addresses an issue where genome tags in GBFF files are empty. These tags were being included as genome metadata, but their zero length caused problems when writing to the pangenome file.

To fix this, I’ve implemented a filter that ignores any tag with an empty value. A debug message is now logged instead, showing what is being filtered out. I also improved the error message raised for invalid metadata.

This should resolve issue #285.
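As a rough illustration of the filtering described above, here is a minimal sketch. It is not the code from this PR: the function name `drop_empty_tags`, its parameters, and the choice to treat whitespace-only values as empty are all assumptions made for the example.

```python
import logging

logger = logging.getLogger(__name__)


def drop_empty_tags(tags, genome_name):
    """Return a copy of `tags` without entries whose value is empty.

    Hypothetical helper, not part of the PPanGGOLiN API.
    Assumption: None and whitespace-only strings count as empty.
    """
    kept = {}
    for key, value in tags.items():
        if value is None or str(value).strip() == "":
            # Log at debug level so the filtering is visible without being noisy.
            logger.debug(
                "Ignoring empty metadata tag '%s' for genome '%s'", key, genome_name
            )
            continue
        kept[key] = value
    return kept


# Qualifiers taken from a GBFF source feature, including one empty tag.
source_tags = {
    "mol_type": "genomic DNA",
    "db_xref": "taxon:6666666",
    "genome_md5": "",
    "project": "mzepedar_6666666",
}
filtered_tags = drop_empty_tags(source_tags, "example_genome")
```

With this input, `genome_md5` is dropped and the three non-empty tags are kept, so only values with actual content would reach the step that writes metadata to the pangenome file.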

Example of an Empty Tag Encountered

One example of an empty tag is genome_md5, which looks like this:

LOCUS       ctg.s2.38.arrow        14924 bp    DNA     linear   UNK 
DEFINITION  Fusobacterium sp. nucleatum KCOM1001
ACCESSION   ctg.s2.38.arrow
KEYWORDS    .
SOURCE      Fusobacterium sp. nucleatum KCOM1001.
  ORGANISM  Fusobacterium sp. nucleatum KCOM1001
            Bacteria.
FEATURES             Location/Qualifiers
     source          1..14924
                     /mol_type="genomic DNA"
                     /db_xref="taxon:6666666"
                     /genome_md5=""
                     /project="mzepedar_6666666"
                     /genome_id="6666666.722438"
                     /organism="Fusobacterium sp. nucleatum KCOM1001"