labgem / PPanGGOLiN

Build a partitioned pangenome graph from microbial genomes
https://ppanggolin.readthedocs.io

Exception: Reading the gbff file. Expected type is string, given type was '<class 'NoneType'>' #203

Closed: frdel1 closed this issue 1 week ago

frdel1 commented 3 months ago

Hi, I am getting the following error when running ppanggolin projection:

Exception: Reading the gbff file '/path/to/some/genomic.gbk' raised an error. Expected type is string, given type was '<class 'NoneType'>'

Steps to reproduce:

# get the query genome
curl -s  "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=NC_004368&rettype=gbwithparts&retmode=txt" > NC_004368.gbk
# run ppanggolin projection against the pangenome
ppanggolin projection --pangenome ~/path/to/some/pangenome.h5 --cpu 1 --anno ~/path/to/NC_004368.gbk --table --genome_name NC_004368 --output ppanggolin_projection

Workaround: The error is caused by an extra empty line at the end of the file (line 86756 in the example file NC_004368.gbk above). Delete that trailing empty line and ppanggolin projection should run fine. Alternatively, download the same genome by its NCBI RefSeq assembly accession instead of the genome accession number: the resulting gbff file has no trailing empty line, and ppanggolin projection runs fine on it.

datasets download genome accession GCF_000196055.1 --include gbff
unzip ncbi_dataset.zip
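
For the first workaround, the trailing empty line can also be removed without opening the file in an editor. The following is a minimal sketch, assuming the only problem is one or more blank (empty or whitespace-only) lines at the very end of the file. The function name strip_trailing_blank_lines and the output file name are hypothetical and not part of PPanGGOLiN.

```shell
# Hypothetical helper (not part of PPanGGOLiN): print a file without its
# trailing blank lines. Blank lines are buffered and only written out when
# a non-blank line follows, so blank lines inside the file are preserved.
strip_trailing_blank_lines() {
  awk '
    /^[[:space:]]*$/ { pending = pending $0 "\n"; next }
    { printf "%s", pending; pending = ""; print }
  ' "$1"
}

# Example usage (the output file name is an assumption):
#   strip_trailing_blank_lines NC_004368.gbk > NC_004368.clean.gbk
```

The cleaned copy can then be passed to --anno in place of the original file. Writing to a new file rather than editing in place keeps the original download available for comparison.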

JeanMainguy commented 3 months ago

Hi,

Thanks for reporting this bug and the workaround. It does not seem complicated to fix.

After some testing, I found that the bug also occurs with the annotate command (and, by extension, all workflow commands) when it is given this type of gbff file.

JeanMainguy commented 1 week ago

The fix for this issue has been released in v2.1.0.