Juke34 opened this issue 4 years ago
I can provide a fixed version of the test GFF file if you wish.
We also found an odd transcript at the end of the file whose CDS is only 3 nucleotides long (224563 to 224565):
I ensembl transcript 224563 224862 . - . ID=YAR070C;Parent=YAR070C;geneID=YAR070C;gene_biotype=protein_coding;gene_name=YAR070C;gene_source=ensembl;gene_version=1;p_id=P48;transcript_biotype=protein_coding;transcript_source=ensembl;transcript_version=1;tss_id=TSS435
I ensembl exon 224563 224862 . - . Parent=YAR070C;exon_id=YAR070C.1;exon_number=1;exon_version=1;gene_biotype=protein_coding;gene_name=YAR070C;gene_source=ensembl;gene_version=1;p_id=P48;transcript_biotype=protein_coding;transcript_source=ensembl;transcript_version=1;tss_id=TSS435
I ensembl CDS 224563 224565 . - 0 Parent=YAR070C;exon_number=1;gene_biotype=protein_coding;gene_name=YAR070C;gene_source=ensembl;gene_version=1;p_id=P48;transcript_biotype=protein_coding;transcript_source=ensembl;transcript_version=1;tss_id=TSS435
Is this expected? It looks wrong to us; maybe this transcript should be removed.
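To spot other transcripts like this one, a minimal sketch could sum the CDS lengths per parent transcript and report those below a cutoff. It assumes a tab-separated GFF3 file (as the format requires) with plain `key=value` attributes; the 30 nt threshold and the function names are our own illustrative choices, not part of any specification or tool.

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse a GFF3 column-9 string such as 'ID=a;Parent=b' into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def short_cds_transcripts(lines, min_length=30):
    """Return {transcript_id: total CDS length} for transcripts whose summed
    CDS length is below min_length (an arbitrary, illustrative cutoff)."""
    cds_length = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        start, end = int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        # A CDS line may list several parents, comma-separated in GFF3.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                # GFF3 coordinates are 1-based and inclusive.
                cds_length[parent] += end - start + 1
    return {tid: n for tid, n in cds_length.items() if n < min_length}
```

For example, `with open("test.gff3") as fh: print(short_cds_transcripts(fh))` (the file name is hypothetical) would list YAR070C with a CDS length of 3 nt.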
We found a problem in the GFF file you provide as test data.
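The ID and Parent attributes of the transcript features carry the same value (e.g. `ID=YAR070C;Parent=YAR070C`), so each transcript points to itself as its parent. This is not allowed by the GFF3 specification.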
We use AGAT, which handles this problem by automatically updating the IDs to make them unique. Using this file to test or build pipelines may cause problems, so it should be updated.
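The problem can be detected without AGAT with a small check such as the sketch below. It is not how AGAT works internally. It flags features whose own ID appears in their Parent attribute, and IDs shared by features of different types. That second check is a heuristic: the GFF3 specification only lets several lines share an ID when together they form a single discontinuous feature, so lines of different types sharing an ID always violate it. The function names are hypothetical, and the file is assumed to be tab-separated.

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse a GFF3 column-9 string such as 'ID=a;Parent=b' into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def check_ids(lines):
    """Return (self_parented, duplicated) for the lines of a GFF3 file.

    self_parented: IDs of features that list their own ID as a Parent.
    duplicated: IDs used by features of more than one type (heuristic;
    same-type lines may legitimately share an ID for discontinuous features).
    """
    self_parented = []
    types_by_id = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        fid = attrs.get("ID")
        if fid is None:
            continue  # e.g. exon/CDS lines without an ID
        types_by_id[fid].add(cols[2])
        if fid in attrs.get("Parent", "").split(","):
            self_parented.append(fid)
    duplicated = sorted(fid for fid, types in types_by_id.items() if len(types) > 1)
    return self_parented, duplicated
```

On the transcript shown above, `check_ids` reports `YAR070C` as self-parented; if the file also contains a gene line with `ID=YAR070C` (assumed here, since the transcript names it as Parent), that ID is reported as duplicated too.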