wurmlab / flo

Same species annotation lift over pipeline.

gt gff3 error #25

Closed bhavnah closed 3 years ago

bhavnah commented 5 years ago

Hi,

I ran flo, and got this error:

    warning: line 1 in file "-" does not begin with "##gff-version" or "##gvf-version", create "##gff-version 3" line automatically
    gt gff3: error: Parent "maker-Contig53-exonerate_est2genome-gene-0.0" on line 1 in file "-" was not defined (via "ID=")
    rake aborted!
    Command failed with status (1): [/data/apps/flo/gff_recover.rb run/Medicago...]
    /data/apps/flo/Rakefile:60:in `block (2 levels) in <top (required)>'
    /data/apps/flo/Rakefile:40:in `each'
    /data/apps/flo/Rakefile:40:in `block in <top (required)>'
    Tasks: TOP => default
    (See full trace by running task with --trace)

However, I can see the following outputs: lifted.gff3 and unlifted.gff3 (both non-empty), plus an empty lifted_cleaned.gff. Can you please tell me what's going on?

Happy to send the .gff3 files if needed.

Thanks!

yeban commented 5 years ago

lifted.gff3 is the output of the UCSC liftOver tool. Some of the annotations in this file may be incorrect; flo tries to eliminate them and writes a clean set to lifted_cleaned.gff3. This last step is the one that is failing. At this step, flo can only process two-level relationships (transcripts and their child exons and CDSs). We therefore recommend removing gene annotations before running flo (this is documented in the README). Alternatively, you can try to remove the incorrect annotations from lifted.gff3 yourself.
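
For illustration only, here is a minimal Python sketch of what "removing gene annotations" could involve for a file with a gene, transcript, exon/CDS hierarchy. It is not flo's gff_remove_feats.rb (whose exact behavior isn't shown in this thread), and the function name `drop_feature_type` is hypothetical. Besides deleting the gene rows, it also prunes `Parent=` values that pointed at the deleted genes, since an undefined Parent is the kind of problem gt gff3 reports in the error above. It assumes tab-separated, nine-column GFF3 and does not decode percent-escaped attribute values.

```python
import sys


def parse_attrs(field):
    """Split a GFF3 column-9 string into (key, value) pairs, keeping order."""
    pairs = []
    for item in field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            pairs.append((key, value))
    return pairs


def drop_feature_type(lines, drop_type="gene"):
    """Hypothetical helper: remove `drop_type` rows and Parent links to them."""
    records = [line.rstrip("\n") for line in lines]
    # Pass 1: collect the IDs of every feature that will be dropped.
    dropped = set()
    for rec in records:
        cols = rec.split("\t")
        if not rec.startswith("#") and len(cols) == 9 and cols[2] == drop_type:
            dropped.update(v for k, v in parse_attrs(cols[8]) if k == "ID")
    # Pass 2: emit everything else, pruning Parent values that are now undefined.
    out = []
    for rec in records:
        cols = rec.split("\t")
        if rec.startswith("#") or len(cols) != 9:
            out.append(rec)  # comments, directives, blank or malformed lines pass through
            continue
        if cols[2] == drop_type:
            continue
        kept = []
        for key, value in parse_attrs(cols[8]):
            if key == "Parent":
                value = ",".join(p for p in value.split(",") if p not in dropped)
                if not value:
                    continue  # every parent was dropped: remove the attribute entirely
            kept.append(f"{key}={value}")
        cols[8] = ";".join(kept)
        out.append("\t".join(cols))
    return out


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fh:
        for line in drop_feature_type(fh):
            print(line)
```

Usage would be something like `python drop_genes.py annotations.gff3 > no_genes.gff3` (script and file names hypothetical). Whether flo expects the Parent attribute to be removed or kept is an assumption here, not something the thread confirms.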

olechnwin commented 5 years ago

Hi, I removed the gene annotations using "gff_remove_feats.rb gene" but still got the same error. Is gff_remove_feats.rb supposed to remove all lines with "gene" in the third column? If so, shouldn't line 64 below have been removed?

    62  NC_000001.10    BestRefSeq      exon    139790  139847  .       -       .       ID=id30;Parent=rna9;Dbxref=GeneID:729737,Genbank:NR_039983.2;gbkey=ncRNA;gene=LOC
    63  NC_000001.10    BestRefSeq      exon    134773  139696  .       -       .       ID=id31;Parent=rna9;Dbxref=GeneID:729737,Genbank:NR_039983.2;gbkey=ncRNA;gene=LOC
    64  NC_000001.10    Curated Genomic gene    157784  157887  .       -       .       ID=gene11;Dbxref=GeneID:106480049,HGNC:HGNC:48063;Name=RNU6-1100P;description=RNA

Thanks in advance for your help.

yeban commented 5 years ago

Yes, that's the expected behavior, i.e., line 64 should be absent from the output.

olechnwin commented 5 years ago

Thank you for your reply. I'll remove all lines with "gene" in the third column. I forgot to mention: lines 62-64 above are from my GFF after running "gff_remove_feats.rb gene".
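
A quick way to confirm what is actually left in a file before rerunning flo is to count the column-3 feature types and list any Parent IDs that are never defined via `ID=`. The Python sketch below is a rough approximation of the latter check, not GenomeTools' own validation logic; the function name `audit_gff3` is hypothetical, and it assumes tab-separated, nine-column GFF3.

```python
import sys
from collections import Counter


def audit_gff3(lines):
    """Count column-3 feature types and list Parent IDs never defined via ID=."""
    types = Counter()
    defined, referenced = set(), set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed rows rather than guessing at their layout
        types[cols[2]] += 1
        for item in cols[8].split(";"):
            key, _, value = item.partition("=")
            if key == "ID":
                defined.add(value)
            elif key == "Parent":
                referenced.update(value.split(","))
    return types, sorted(referenced - defined)


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fh:
        counts, missing = audit_gff3(fh)
    for ftype, n in counts.most_common():
        print(f"{ftype}\t{n}")
    for pid in missing:
        print(f"undefined Parent: {pid}")
```

A non-zero count for `gene`, or any "undefined Parent" lines, would suggest the file still needs cleaning.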

yeban commented 5 years ago

Can you share a subset of your GFF so I can try to reproduce the issue?

olechnwin commented 5 years ago

Sorry for my late reply. I am using this gff file: link

I used awk to remove the spaces in the second column and to remove the gene annotations. This time liftOver ran without any error, but the resulting lifted.gff3 contains only comment lines, while unlifted.gff3 contains essentially everything. I am attaching the GFF file I modified from the link above and the lifted.gff3 output from flo: GRCh37_latest_genomic_tx.gff.gz, lifted.gff3.gz
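
The awk command itself isn't shown. As a hedged illustration, the space-replacement part could look like the Python sketch below (the function name `underscore_source` is hypothetical). It assumes tab-separated columns and that underscores are acceptable in the source column, e.g. `Curated Genomic` becomes `Curated_Genomic`. The thread does not say why the spaces caused trouble; one plausible reason is a step that splits on any whitespace rather than on tabs.

```python
import sys


def underscore_source(line):
    """Replace spaces in GFF column 2 (source) with underscores.

    Assumes tab-separated columns; comment/directive lines and rows
    without exactly nine columns are returned unchanged.
    """
    if line.startswith("#"):
        return line
    cols = line.split("\t")
    if len(cols) == 9:
        cols[1] = cols[1].replace(" ", "_")
    return "\t".join(cols)


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fh:
        for line in fh:
            sys.stdout.write(underscore_source(line))
```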

Thank you very much for your help.

olechnwin commented 5 years ago

Sorry, never mind. I just realized that the GFF file I was using does not have the correct seqids. I'll use a different GFF file.
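
In this thread, a seqid mismatch produced no error at all, only an unlifted.gff3 containing nearly everything. A cheap sanity check is to compare the GFF seqids (column 1) with the source-side sequence names in the chain file used for liftOver. The Python sketch below assumes a standard UCSC chain file, whose header lines read `chain score tName tSize tStrand tStart tEnd qName ...`, and assumes the `tName` side is the assembly the GFF is annotated on; the function names and file paths are hypothetical, and where flo keeps its chain file is not stated here.

```python
import sys


def chain_target_names(chain_lines):
    """Collect tName (3rd field) from UCSC chain header lines."""
    names = set()
    for line in chain_lines:
        if line.startswith("chain "):
            # header layout: chain score tName tSize tStrand tStart tEnd qName ...
            names.add(line.split()[2])
    return names


def seqids_missing_from_chain(gff_lines, chain_lines):
    """Return GFF column-1 seqids that never appear as a chain tName.

    Note: does not handle an embedded ##FASTA section in the GFF.
    """
    targets = chain_target_names(chain_lines)
    seqids = {
        line.split("\t", 1)[0]
        for line in gff_lines
        if line.strip() and not line.startswith("#")
    }
    return sorted(seqids - targets)


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as gff, open(sys.argv[2]) as chain:
        for name in seqids_missing_from_chain(gff, chain):
            print(name)
```

If every seqid is reported (for example `NC_000001.10` in the GFF against `chr1` in the chain), the names need to be harmonized before lifting.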