This PR addresses an issue where some genome tags in GBFF files have empty values. These tags were being collected as genome metadata, but their zero-length values caused problems when writing to the pangenome file.
To fix this, I’ve implemented a filter that ignores any tag with an empty value. A debug message is now logged for each ignored tag, so it is clear what is being filtered out. I also improved the error message raised when the metadata is invalid.
This should resolve issue #285.
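As a rough illustration of the filtering idea, here is a minimal sketch. It is not the code from this PR: the function name `drop_empty_tags`, the dictionary layout, and the log wording are assumptions made for the example. The sample input reuses the source qualifiers from the example record in this description.

```python
import logging

logger = logging.getLogger(__name__)


def drop_empty_tags(tags, genome_name):
    """Return a copy of `tags` without entries whose value is empty.

    Hypothetical helper: the name and signature are illustrative only.
    """
    kept = {}
    for name, value in tags.items():
        # Treat None, "" and whitespace-only strings as empty.
        if value is None or str(value).strip() == "":
            logger.debug("Genome %s: ignoring empty tag '%s'", genome_name, name)
            continue
        kept[name] = value
    return kept


# Qualifiers of the `source` feature, as they might look after parsing.
source_tags = {
    "mol_type": "genomic DNA",
    "db_xref": "taxon:6666666",
    "genome_md5": "",
    "project": "mzepedar_6666666",
    "genome_id": "6666666.722438",
}
filtered = drop_empty_tags(source_tags, "ctg.s2.38.arrow")
```

With this input, `genome_md5` is dropped and the other four tags are kept. The debug message is only visible when the logger is configured at the DEBUG level.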
Example of an Empty Tag Encountered
One example of an empty tag is genome_md5, which looks like this:
LOCUS ctg.s2.38.arrow 14924 bp DNA linear UNK
DEFINITION Fusobacterium sp. nucleatum KCOM1001
ACCESSION ctg.s2.38.arrow
KEYWORDS .
SOURCE Fusobacterium sp. nucleatum KCOM1001.
ORGANISM Fusobacterium sp. nucleatum KCOM1001
Bacteria.
FEATURES Location/Qualifiers
source 1..14924
/mol_type="genomic DNA"
/db_xref="taxon:6666666"
/genome_md5=""
/project="mzepedar_6666666"
/genome_id="6666666.722438"
/organism="Fusobacterium sp. nucleatum KCOM1001"