gpertea / gffread

GFF/GTF utility providing format conversions, region filtering, FASTA sequence extraction and more
MIT License

Segmentation fault or corrupted output GFF when using `-C` coding only option on multiple NCBI genome assembly GFF files #108

Open hermidalc opened 2 years ago

hermidalc commented 2 years ago

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/843/825/GCA_000843825.1_ViralProj14424/GCA_000843825.1_ViralProj14424_genomic.gff.gz

Running gffread with the `-C` (coding only) option on this file either segfaults or produces a corrupted output GFF file. I've run into the same problem with other NCBI genome assembly GFF files.

hermidalc commented 2 years ago

This might be related to the fact that this genome has `five_prime_UTR` and `three_prime_UTR` features. Even so, gffread should be able to handle them and produce exon, CDS, and mRNA output features in which the exon and mRNA ranges include the UTR regions and the CDS ranges do not.
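
To make the expected output concrete, here is a minimal Python sketch of the relationship described above. It is not gffread's code, and the coordinates and helper names (`merge_segments`, `expected_features`) are hypothetical, chosen only for illustration. It assumes 1-based inclusive GFF coordinates and a single transcript on one strand.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive, as in GFF3


def merge_segments(segments: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals."""
    merged: List[Interval] = []
    for start, end in sorted(segments):
        # Adjacent means the next segment starts right after the previous end.
        if merged and start <= merged[-1][1] + 1:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def expected_features(utrs: List[Interval], cds: List[Interval]) -> Dict[str, object]:
    """Derive the ranges one would expect for a single transcript."""
    # Exons cover UTR and CDS segments together; CDS ranges stay unchanged.
    exons = merge_segments(utrs + cds)
    mrna = (exons[0][0], exons[-1][1])
    return {"mRNA": mrna, "exon": exons, "CDS": sorted(cds)}


# Hypothetical coordinates, not taken from the linked assembly.
five_prime_utr = [(100, 149)]
cds = [(150, 300), (400, 520)]
three_prime_utr = [(521, 600)]

result = expected_features(five_prime_utr + three_prime_utr, cds)
print(result)
```

With these made-up inputs, the exon and mRNA ranges extend over the UTRs (100 to 600 overall), while the CDS ranges stay at 150-300 and 400-520.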