wurmlab / flo

Same species annotation lift over pipeline.

Trying to map over gff results error #13

Closed cmdcolin closed 7 years ago

cmdcolin commented 7 years ago

I ran flo on a whole genome and found that lifted.gff contained features like the one below, which refer to the parent PKINGS_0.1_G055355 even though no feature with that ID is in the file:

Scaffold_87 maker   mRNA    5628861 5664273 .   -   .   ID=PKINGS_0.1_T055355-R4;Parent=PKINGS_0.1_G055355;Name=PKINGS_0.1_T055355-R4;Alias=maker-Scaffold517-augustus-gene-2.2-mRNA-1;Dbxref=InterPro:IPR000157,InterPro:IPR007632,Pfam:PF01582,Pfam:PF04547;Note=Similar to ANO4: Anoctamin-4 (Homo sapiens);Ontology_term=GO:0005515,GO:0007165;_AED=0.30;_QI=451%7C0.83%7C0.83%7C1%7C0.96%7C0.93%7C31%7C1288%7C1182;_eAED=0.30

I took Scaffold_87 from target.fa, along with the scaffold that PKINGS_0.1_G055355 originally came from, and ran a smaller flo alignment between just these two sequences. Interestingly enough, the new lifted.gff did contain the parent PKINGS_0.1_G055355, so it seems like gff_recover.rb run/annotations/lifted.gff3 is what actually removed the gene line?
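For anyone wanting to confirm the same symptom in their own output, a minimal Python sketch along these lines should list every Parent that is referenced but never defined via ID=. It is not part of flo; the function name `find_orphan_parents` is made up here, the attribute column of the mRNA line is trimmed, and the exon line in the example is hypothetical, added only to show a reference that does resolve.

```python
def find_orphan_parents(gff_lines):
    """Return sorted Parent IDs that are referenced but never defined via ID=."""
    defined, referenced = set(), set()
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        for attr in cols[8].split(";"):
            key, _, value = attr.partition("=")
            key = key.strip()
            if key == "ID":
                defined.add(value.strip())
            elif key == "Parent":
                # GFF3 allows several comma-separated parents
                referenced.update(v.strip() for v in value.split(",") if v.strip())
    return sorted(referenced - defined)


# The mRNA line is from this report (attributes trimmed);
# the exon line is a hypothetical child added for illustration.
example = "\n".join([
    "Scaffold_87\tmaker\tmRNA\t5628861\t5664273\t.\t-\t.\t"
    "ID=PKINGS_0.1_T055355-R4;Parent=PKINGS_0.1_G055355",
    "Scaffold_87\tmaker\texon\t5628861\t5629000\t.\t-\t.\t"
    "ID=exon1;Parent=PKINGS_0.1_T055355-R4",
])

print(find_orphan_parents(example.splitlines()))
# prints ['PKINGS_0.1_G055355']
```

To run it on a real file, pass an open file handle instead of `example.splitlines()`. The sketch assumes tab-separated columns, as the GFF3 format requires.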

Full output

...
chainMergeSort run/*.chn.sorted | chainSplit run stdin -lump=1
mv run/000.chain run/combined.chn.sorted
chainNet run/combined.chn.sorted run/source.sizes run/target.sizes run/combined.chn.sorted.net /dev/null
Got 1 chroms in run/source.sizes, 1 in run/target.sizes
Finishing nets
writing run/combined.chn.sorted.net
writing /dev/null
netChainSubset run/combined.chn.sorted.net run/combined.chn.sorted run/liftover.chn
Processing Scaffold517
mkdir run/annotations
liftOver -gff annotations.gff run/liftover.chn run/annotations/lifted.gff3 run/annotations/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
/home/me/flo/gff_recover.rb run/annotations/lifted.gff3 2> unprocessed.gff | gt gff3 -tidy -sort -addids -retainids - > run/annotations/lifted_cleaned.gff
warning: line 1 in file "-" does not begin with "##gff-version" or "##gvf-version", create "##gff-version 3" line automatically
gt gff3: error: Parent "PKINGS_0.1_G055355" on line 1 in file "-" was not defined (via "ID=")
rake aborted!

Here is an example

flo.tar.gz

yeban commented 7 years ago

My bad - I clarified this in the readme only last week. gff_recover.rb can only reconstruct transcripts, because it's harder to work with three-level gene-transcript-exon/CDS features. You can remove gene annotations prior to liftover using the gff_remove_feats.rb script.
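As a rough illustration of what removing gene annotations before liftover involves, here is a minimal Python sketch. It is not the gff_remove_feats.rb script, and its behaviour is an assumption: it drops every `gene` row and also strips Parent attributes that pointed at a removed gene, so that transcripts become top-level features. The function name `drop_gene_features` and the coordinates in the example are hypothetical.

```python
def drop_gene_features(gff_lines, feature_type="gene"):
    """Drop rows of the given type and Parent references to them (assumed behaviour)."""
    rows = [line.rstrip("\n") for line in gff_lines]

    # First pass: collect the IDs of the features that will be removed.
    removed = set()
    for row in rows:
        cols = row.split("\t")
        if row.startswith("#") or len(cols) != 9 or cols[2] != feature_type:
            continue
        for attr in cols[8].split(";"):
            key, _, value = attr.partition("=")
            if key.strip() == "ID":
                removed.add(value.strip())

    # Second pass: skip removed rows and detach their former children.
    out = []
    for row in rows:
        cols = row.split("\t")
        if row.startswith("#") or len(cols) != 9:
            out.append(row)  # keep directives and anything unparsable as-is
            continue
        if cols[2] == feature_type:
            continue
        kept = []
        for attr in cols[8].split(";"):
            key, _, value = attr.partition("=")
            if key.strip() == "Parent":
                parents = [p for p in value.split(",") if p.strip() not in removed]
                if not parents:
                    continue  # every parent was removed: drop the attribute
                attr = "Parent=" + ",".join(parents)
            kept.append(attr)
        cols[8] = ";".join(kept)
        out.append("\t".join(cols))
    return out


# Hypothetical three-level example; coordinates are invented.
example = [
    "Scaffold517\tmaker\tgene\t100\t900\t.\t-\t.\tID=PKINGS_0.1_G055355",
    "Scaffold517\tmaker\tmRNA\t100\t900\t.\t-\t.\t"
    "ID=PKINGS_0.1_T055355-R4;Parent=PKINGS_0.1_G055355",
    "Scaffold517\tmaker\texon\t100\t200\t.\t-\t.\tParent=PKINGS_0.1_T055355-R4",
]

for line in drop_gene_features(example):
    print(line)
```

With the gene row gone, the mRNA line keeps only its ID attribute and the exon still points at the mRNA, which leaves the two-level transcript/exon structure described above.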

cmdcolin commented 7 years ago

I see, thanks. It might be nice if it could reconstruct 3 layers but I definitely understand if that's problematic. I'll close for now :)