oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

making the output gff3 valid #432

Closed by colindaven 4 months ago

colindaven commented 4 months ago

Hi,

Thanks for the nice tools. Here's a note for other users who might need valid gff3: the genometools gff3 validator was failing on the EDTA output, and these substitutions fixed it:

    sed -i 's/Classification/classification/g' tmp.gff3
    sed -i 's/Sequence_ontology/sequence_ontology/g' tmp.gff3
    sed -i 's/Identity/identity/g' tmp.gff3
    sed -i 's/Method/method/g' tmp.gff3

So a future modification might be to make those attributes lower case.
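
The GFF3 specification reserves attribute names that begin with an uppercase letter, which is most likely why the validator rejects these four keys. The global substitutions above also rewrite any other occurrence of those words in the file (for example inside a Name= value). Below is a minimal sketch that limits the rename to attribute keys in column 9, assuming tab-separated feature lines and ";"-separated attributes. The function name fix_gff3_attrs is hypothetical and not part of EDTA or genometools:

```shell
# Hypothetical helper: lowercase the first letter of the four attribute keys
# named in this thread, touching only column 9 of feature lines.
fix_gff3_attrs() {
    awk 'BEGIN { FS = OFS = "\t" }
         # Pass comments, directives and malformed lines through unchanged.
         /^#/ || NF < 9 { print; next }
         {
             n = split($9, attrs, ";")
             out = ""
             for (i = 1; i <= n; i++) {
                 a = attrs[i]
                 # Match only at the start of a key=value pair, never inside a value.
                 if (a ~ /^(Classification|Sequence_ontology|Identity|Method)=/)
                     a = tolower(substr(a, 1, 1)) substr(a, 2)
                 out = out (i > 1 ? ";" : "") a
             }
             $9 = out
             print
         }' "$@"
}
```

Usage would be along the lines of `fix_gff3_attrs tmp.gff3 > tmp.fixed.gff3`, which writes a new file instead of editing in place, so the original stays available for comparison.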

I was using a genometools singularity container and this command: genometools.sif gt gff3validator

Thanks

GallVp commented 4 months ago

Thank you @colindaven

Have you tried: gt gff3 -retainids -tidy? Sometimes I use this command to fix issues like the one you have mentioned.

colindaven commented 4 months ago

Hi @GallVp thanks for the tip, but I'm not too keen on the output.

It seems to do the job, but introduces "###" after every line or edit, and almost doubles file length

     91933 AT_edta_original.gff3
    182608 AT_tidied_by_gt.gff3

oushujun commented 4 months ago

@colindaven Thank you for the suggestion. It's a simple fix in the source code. Did you find anywhere in the gff3 output that needed to be fixed? For example, ID, Name, Identity, Method, TIR, TSD

colindaven commented 4 months ago

@oushujun Thanks for that.

Just the four listed above - two of which you listed again (Identity, Method).

I'll let you know if anything else crops up, since I'll be testing dozens of genomes in the next few weeks.

Colin