Closed pgonzale60 closed 3 years ago
Thanks for this report. As you said, I think stripping spaces from the entire line will certainly break some inputs. Why not just strip the columns in question after the split command instead?
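To illustrate what I mean, here is a minimal sketch of stripping only the coordinate columns after the split. It assumes the standard nine tab-separated GFF3 columns, with start and end in columns 4 and 5; the function name `strip_coordinate_columns` is hypothetical and is not taken from the script in this pull request.

```python
def strip_coordinate_columns(line):
    """Strip stray whitespace from the start/end columns of one GFF3 line.

    Hypothetical helper for illustration; assumes 9 tab-separated columns.
    """
    # Leave comments, directives and blank lines untouched.
    if line.startswith("#") or not line.strip():
        return line

    # Preserve whatever line ending the input had.
    body = line.rstrip("\r\n")
    ending = line[len(body):]

    cols = body.split("\t")
    if len(cols) != 9:
        # Not a regular feature line; do not guess, pass it through.
        return line

    # Columns 4 and 5 (indices 3 and 4) hold the start and end coordinates.
    cols[3] = cols[3].strip()
    cols[4] = cols[4].strip()
    return "\t".join(cols) + ending
```

Because only the two coordinate fields are touched, a sequence ID such as `contig 1` in column 1 keeps its internal space.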
I agree. In addition to modifying the coordinate columns, I'm also modifying the feature column. In particular, I'm removing the exon feature as specified here. I will open a new pull request after testing it with another genome.
OK, but the exon should only be removed for non-coding RNA features as that link shows. For regular mRNAs there certainly should still be an exon.
And the link you gave was from NCBI, who only very recently started supporting GFF3. Their encoding doesn't fully match the GFF3 spec, but because it's NCBI I suspect we'll all have to switch to how they're encoding things rather than how the spec says they should be encoded. Some scripts here in biocode won't work with some aspects of NCBI's GFF3 annotations quite yet.
Bedtools v2.29.2 did not recognize this gff3. Using cat -t showed that there is some stray whitespace between a coordinate and the tab that follows it. Adding this line resulted in a gff3 that bedtools processed successfully. However, I'm not sure whether removing all whitespace could have side effects (e.g. for contig names that do contain spaces).
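For anyone who wants to check their own file without relying on cat -t, a rough Python sketch of the same check is below. It reports which tab-separated columns carry leading or trailing whitespace; the function name `fields_with_stray_whitespace` is made up for this example and is not part of biocode or bedtools.

```python
def fields_with_stray_whitespace(line):
    """Return 1-based column numbers whose value starts or ends with whitespace.

    Hypothetical helper for illustration only.
    """
    # Skip comment and directive lines.
    if line.startswith("#"):
        return []

    cols = line.rstrip("\r\n").split("\t")
    # A field is flagged only when stripping changes it, so spaces
    # inside a value (e.g. a contig name) are not reported.
    return [i for i, value in enumerate(cols, start=1) if value != value.strip()]
```

Calling it on each line of the gff3 and printing the non-empty results should show whether the problem is limited to the coordinate columns.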