oushujun / EDTA

Extensive de-novo TE Annotator
GNU General Public License v3.0

making the output gff3 valid #432

Closed colindaven closed 4 months ago

colindaven commented 4 months ago


Thanks for the nice tools. Here's a note for other users who might need valid GFF3. The genometools gff3 validator was failing on the EDTA output, and these substitutions fixed it:

    sed -i 's/Classification/classification/g' tmp.gff3
    sed -i 's/Sequence_ontology/sequence_ontology/g' tmp.gff3
    sed -i 's/Identity/identity/g' tmp.gff3
    sed -i 's/Method/method/g' tmp.gff3

So a future modification might be to make those attributes lower case.
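For context, the GFF3 specification reserves attribute tags that begin with an uppercase letter, which is why a strict validator can reject custom tags such as Classification. The sed lines above rewrite those words anywhere in the file, including inside values or sequence names. A narrower variant is sketched below. It renames the four keys only when they appear as attribute keys in column 9. The function name lowercase_gff3_keys is made up for illustration, and the sketch assumes tab-separated GFF3 with semicolon-separated key=value attributes.

```shell
#!/usr/bin/env bash
# Sketch only: lowercase_gff3_keys is a hypothetical helper, not part of EDTA.
# Assumes tab-separated GFF3 with ';'-separated key=value pairs in column 9.
lowercase_gff3_keys() {
    awk 'BEGIN { FS = OFS = "\t" }
         # Pass comments, directives, and short lines through unchanged.
         /^#/ || NF < 9 { print; next }
         {
             n = split($9, attrs, ";")
             out = ""
             for (i = 1; i <= n; i++) {
                 eq = index(attrs[i], "=")
                 if (eq > 0) {
                     key = substr(attrs[i], 1, eq - 1)
                     # Rename only the four keys named in this thread.
                     if (key == "Classification" || key == "Sequence_ontology" ||
                         key == "Identity" || key == "Method")
                         attrs[i] = tolower(key) substr(attrs[i], eq)
                 }
                 out = out (i > 1 ? ";" : "") attrs[i]
             }
             $9 = out
             print
         }' "$@"
}

# Possible usage (file names are placeholders):
#   lowercase_gff3_keys tmp.gff3 > tmp.fixed.gff3
```

Unlike sed -i, this writes to standard output, so the original file stays untouched until the result has been checked with the validator.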

I was using a genometools singularity container and this command: genometools.sif gt gff3validator


GallVp commented 4 months ago

Thank you @colindaven

Have you tried: gt gff3 -retainids -tidy? Sometimes I use this command to fix issues like the one you have mentioned.

colindaven commented 4 months ago

Hi @GallVp thanks for the tip, but I'm not too keen on the output.

It seems to do the job, but it introduces "###" after every line or edit and almost doubles the file length:

   91933 AT_edta_original.gff3
  182608 AT_tidied_by_gt.gff3
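If the extra lines are the only objection, one possible workaround is to filter them out afterwards. This is a sketch, assuming the added lines consist of exactly "###", which in GFF3 is the optional directive marking that all forward references have been resolved. The function name strip_resolution_markers is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: strip_resolution_markers is a hypothetical helper.
# Drops lines that are exactly "###"; every other line, including
# "##gff-version 3" and other directives, passes through unchanged.
strip_resolution_markers() {
    grep -vx '###' "$@"
}

# Possible usage (output file name is a placeholder):
#   strip_resolution_markers AT_tidied_by_gt.gff3 > AT_tidied_no_markers.gff3
```

It would be worth re-running the validator on the filtered file to confirm it still passes.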

oushujun commented 4 months ago

@colindaven Thank you for the suggestion. It's a simple fix in the source code. Did you find anything else in the gff3 output that needed to be fixed? For example: ID, Name, Identity, Method, TIR, TSD.

colindaven commented 4 months ago

@oushujun Thanks for that.

Just the four listed above - two of which you listed again (Identity, Method).

I'll let you know if anything else crops up, since I'll be testing dozens of genomes in the next few weeks.
