wurmlab / flo

Same species annotation lift over pipeline.

Problem with temporary GFF file #12

Closed blackFirefly closed 7 years ago

blackFirefly commented 7 years ago

I tried flo yesterday, but it ended in an error. There seems to be a problem with a temporary GFF file. So the question is whether the problem is the program or my input GFF.

It created a file called "lifted.gff3" and one called "unlifted.gff3". Both of them are filled. But there is also a third file "Aarabicum.v2.5.gff-liftover-aethionema-arabicum_v3.0.fasta.gff3" which is empty.

Here are the last lines flo printed:

Processing Scaffold_3140
mkdir Aarabicum.v2.5.gff-liftover-aethionema-arabicum_v3.0.fasta
liftOver -gff /home/muehlich/Desktop/aethionema/data/Aarabicum.v2.5.gff run/liftover.chn Aarabicum.v2.5.gff-liftover-aethionema-arabicum_v3.0.fasta/lifted.gff3 Aarabicum.v2.5.gff-liftover-aethionema-arabicum_v3.0.fasta/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
gt gff3 -tidy -sort -addids -retainids /tmp/lifted20170614-22821-oyvvge > Aarabicum.v2.5.gff-liftover-aethionema-arabicum_v3.0.fasta/Aarabicum.v2.5.gff-liftover-aethionema-arabicum_v3.0.fasta.gff3
warning: line 1 in file "/tmp/lifted20170614-22821-oyvvge" does not begin with "##gff-version" or "##gvf-version", create "##gff-version 3" line automatically
gt gff3: error: Parent "AA1G00001" on line 2 in file "/tmp/lifted20170614-22821-oyvvge" was not defined (via "ID=")
rake aborted!
Command failed with status (1): [gt gff3 -tidy -sort -addids -retainids /tm...]
/home/muehlich/flo/Rakefile:113:in `process_gff'
/home/muehlich/flo/Rakefile:234:in `block (2 levels) in <top (required)>'
/home/muehlich/flo/Rakefile:223:in `each'
/home/muehlich/flo/Rakefile:223:in `block in <top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)

yeban commented 7 years ago

I can take a look if you can send me "lifted.gff3".

blackFirefly commented 7 years ago

That would be great! Since the file is around 30 MB, I sent a Dropbox link to the email address stated in your profile.

cmdcolin commented 7 years ago

I am seeing the same problem as well... need any more test data?

cmdcolin commented 7 years ago

I think the specific issue with these parents not being defined happens because they end up in the unlifted file.

For example, I had child features with Parent=SP_0.1_T008586-R3 in lifted.gff3, but unlifted.gff3 had the actual parent, where ID=PKINGS_0.1_T008586-R3

yeban commented 7 years ago

That is expected, as liftOver reads the GFF line by line rather than each transcript as a whole. flo's process_gff method tries to fix such inconsistencies in liftOver's output, so the final output from flo should not be an invalid GFF.

cmdcolin commented 7 years ago

Ah... I think I remember at one point writing a script to synthesize parent features for features without parents, for something like this... is that what process_gff does?

yeban commented 7 years ago

That, and eliminating transcripts that mapped partly to different scaffolds.
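
To make the second part concrete, here is a minimal sketch of dropping transcripts whose pieces landed on more than one scaffold. It is not flo's actual process_gff. The `Feature` struct and both method names are made up for illustration, and it assumes that transcripts are `mRNA` features, that every other feature with a Parent is a direct child of one, and that each feature has a single Parent.

```ruby
require 'set'

# Hypothetical minimal record: only the fields this sketch needs.
Feature = Struct.new(:seqid, :type, :id, :parent)

# Return the IDs of transcripts whose own record and child features
# do not all sit on the same scaffold.
def split_transcript_ids(features)
  scaffolds = Hash.new { |hash, key| hash[key] = Set.new }
  features.each do |f|
    # The transcript's own scaffold is filed under its ID ...
    scaffolds[f.id] << f.seqid if f.type == 'mRNA'
    # ... and each child's scaffold is filed under its Parent.
    scaffolds[f.parent] << f.seqid if f.parent && f.type != 'mRNA'
  end
  scaffolds.select { |_, seqids| seqids.size > 1 }.keys.to_set
end

# Remove split transcripts together with their child features.
def drop_split_transcripts(features)
  bad = split_transcript_ids(features)
  features.reject { |f| bad.include?(f.id) || bad.include?(f.parent) }
end
```

Grouping by Parent rather than by the transcript record itself means a transcript is still flagged when its own line went to unlifted.gff3 but its exons were lifted to different scaffolds.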

cmdcolin commented 7 years ago

Gotcha... I was considering maybe using CrossMap, but it looks like it has the same issue.

Maybe need to convert from GFF to something else, BED12 or similar
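
The appeal of BED12 here is that a whole transcript, with all its exons, lives on one line, so it cannot be split between a lifted and an unlifted file the way separate GFF lines can. A rough sketch of the coordinate conversion, assuming the exons of one transcript have already been collected (the method name `bed12_line` and its arguments are hypothetical, and this says nothing about how any particular liftover tool treats the result):

```ruby
# Build one BED12 line from a transcript's exons.
# exons: array of [start, end] pairs in GFF coordinates (1-based, inclusive).
def bed12_line(chrom, name, strand, exons)
  blocks = exons.sort_by(&:first)
  chrom_start = blocks.first[0] - 1      # BED starts are 0-based
  chrom_end   = blocks.map(&:last).max   # BED ends are exclusive, so unchanged
  sizes  = blocks.map { |s, e| e - s + 1 }
  starts = blocks.map { |s, _| s - 1 - chrom_start } # relative to chrom_start
  [chrom, chrom_start, chrom_end, name, 0, strand,
   chrom_start, chrom_start,             # thickStart == thickEnd: no CDS given
   0, blocks.length, sizes.join(','), starts.join(',')].join("\t")
end
```

For example, `bed12_line('scaffold_1', 't1', '+', [[101, 150], [201, 300]])` describes a two-exon transcript spanning 100 to 300 in BED coordinates.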

yeban commented 7 years ago

@cmdcolin:

I am seeing the same problem as well... need any more test data?

There was a bug. I have made some changes. Can you give it a spin?

@blackFirefly - please see my email

cmdcolin commented 7 years ago

@yeban I believe it is working better now. It gets to the genometools stage, but then genometools ends up crashing

Could maybe ask their team about it, error message isn't easy to interpret

$ rake
mkdir annotations.gff-liftover-target
liftOver -gff annotations.gff run/liftover.chn annotations.gff-liftover-target/lifted.gff3 annotations.gff-liftover-target/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
/home/me/flo/gff_recover.rb annotations.gff-liftover-target/lifted.gff3 | gt gff3 -tidy -sort -addids -retainids - > annotations.gff-liftover-target/annotations.gff-liftover-target.gff3
warning: line 1 in file "-" does not begin with "##gff-version" or "##gvf-version", create "##gff-version 3" line automatically
Assertion failed: (elemidx >= q->front), function gt_queue_remove, file src/core/queue.c, line 135.
This is a bug, please report it at
https://github.com/genometools/genometools/issues
Please make sure you are running the latest release which can be found at
http://genometools.org/pub/
You can check your version number with `gt -version`.
Aborted (core dumped)
/home/me/flo/gff_recover.rb:60:in `write': Broken pipe @ io_write - <STDOUT> (Errno::EPIPE)
        from /home/me/flo/gff_recover.rb:60:in `puts'
        from /home/me/flo/gff_recover.rb:60:in `puts'
        from /home/me/flo/gff_recover.rb:60:in `<main>'
rake aborted!

cmdcolin commented 7 years ago

At least one thing that looks suspicious is that there are still lines without parents. If I save the output of

gff_recover.rb annotations.gff-liftover-target/lifted.gff3 > out.gff

then the first feature in out.gff is an mRNA that references a parent gene that is not in out.gff
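
A quick way to list every such line, rather than spotting the first one by eye, is a small checker like the one below. This is only a sketch: `parse_attributes` and `orphaned_features` are made-up names, and the attribute parsing is simplified (no URL-decoding, and lines without exactly nine tab-separated columns are skipped).

```ruby
require 'set'

# Parse a GFF3 attribute column into { 'Key' => ['value', ...] }.
def parse_attributes(column)
  column.split(';').each_with_object({}) do |pair, attrs|
    key, value = pair.strip.split('=', 2)
    attrs[key] = value.split(',') if key && value
  end
end

# Return one hash per Parent reference that no ID= in the input defines.
def orphaned_features(lines)
  rows = []
  lines.each_with_index do |line, index|
    line = line.chomp
    next if line.empty? || line.start_with?('#')
    fields = line.split("\t")
    next unless fields.length == 9
    rows << [index + 1, fields[2], parse_attributes(fields[8])]
  end

  defined_ids = rows.flat_map { |_, _, attrs| attrs.fetch('ID', []) }.to_set

  rows.flat_map do |line_no, type, attrs|
    attrs.fetch('Parent', [])
         .reject { |parent| defined_ids.include?(parent) }
         .map { |parent| { line: line_no, type: type, parent: parent } }
  end
end
```

Calling `orphaned_features(File.foreach('out.gff'))` would return an entry for each feature whose parent is missing, with its line number, feature type and the undefined Parent value.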

yeban commented 7 years ago

@blackFirefly's problem was partly flo and partly the gff. The former is now fixed.

@cmdcolin I can't be sure what the problem is without looking at the input / lifted gff. Please could you open a new issue with test data?