wurmlab / flo

Same species annotation lift over pipeline.

gff_recover error #37

Open cmatKhan opened 3 years ago

cmatKhan commented 3 years ago

I'm getting an error that seems to be related to my input GFF. I have tried with both gff_remove_feats.rb and gff_longest_transcripts.rb.

parallel --joblog run/joblog.chainSort -j 15 -a run/joblst.chainSort
chainMergeSort run/*.chn.sorted | chainSplit run stdin -lump=1
mv run/000.chain run/combined.chn.sorted
chainNet run/combined.chn.sorted run/source.sizes run/target.sizes run/combined.chn.sorted.net /dev/null
Got 15 chroms in run/source.sizes, 15 in run/target.sizes
Finishing nets
writing run/combined.chn.sorted.net
writing /dev/null
netChainSubset run/combined.chn.sorted.net run/combined.chn.sorted run/liftover.chn
Processing 1
Processing 5
Processing 2
Processing 3
Processing 11
Processing 6
Processing 7
Processing 8
Processing 9
Processing 4
Processing 10
Processing 14
Processing 12
Processing 13
Processing MT
mkdir run/h99_longest_transcript
liftOver -gff /scratch/mblab/chasem/liftOver/flo_crypto/h99/h99_longest_transcript.gff run/liftover.chn run/h99_longest_transcript/lifted.gff3 run/h99_longest_transcript/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
/scratch/mblab/chasem/liftOver/flo/gff_recover.rb run/h99_longest_transcript/lifted.gff3 2> run/h99_longest_transcript/lifted_cleanup.log | gt gff3 -tidy -sort -addids -retainids - > run/h99_longest_transcript/lifted_cleaned.gff 2>> run/h99_longest_transcript/lifted_cleanup.log
rake aborted!
Command failed with status (1): [/scratch/mblab/chasem/liftOver/flo/gff_rec...]
/scratch/mblab/chasem/liftOver/flo/Rakefile:60:in `block (2 levels) in <top (required)>'
/scratch/mblab/chasem/liftOver/flo/Rakefile:40:in `each'
/scratch/mblab/chasem/liftOver/flo/Rakefile:40:in `block in <top (required)>'
Tasks: TOP => default

I'm not quite sure where to start debugging. Looking in the Rakefile and at gff_recover didn't give me any good ideas. Any suggestions?

yeban commented 3 years ago

Check the contents of run/h99_longest_transcript/lifted_cleanup.log, maybe?

yeban commented 3 years ago

(and let me know too)
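The pipeline command redirects stderr from both gff_recover.rb and gt into the same cleanup log, so the gt messages can sit among other lines. The following is a minimal Python sketch for pulling out only the diagnostic lines. It assumes that diagnostics start with "warning:" or "gt gff3: error:", which are the prefixes visible in the log excerpt quoted later in this thread. The function name `gt_diagnostics` and the sample log text are hypothetical.

```python
def gt_diagnostics(log_text):
    """Return the log lines that look like gt warnings or errors.

    Assumption: diagnostics begin with one of the two prefixes below,
    as seen in the log excerpt quoted in this thread.
    """
    prefixes = ("warning:", "gt gff3: error:")
    return [line for line in log_text.splitlines() if line.startswith(prefixes)]


# Illustrative log content (hypothetical). A real log would be read with
# open("run/h99_longest_transcript/lifted_cleanup.log").read()
sample_log = (
    "Chr1\tsrc\tthree_prime_UTR\t800\t900\t.\t+\t.\tParent=tx2\n"
    'warning: line 1 in file "-" does not begin with "##gff-version"\n'
    'gt gff3: error: Parent "tx2" on line 1 in file "-" was not defined (via "ID=")\n'
)

diagnostics = gt_diagnostics(sample_log)
for line in diagnostics:
    print(line)
```

On a short log, reading the last few lines by eye works just as well; the filter is only useful when the log is long.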

cmatKhan commented 3 years ago

I'm reasonably sure at this point that this is a problem with the gff I am using, specifically the feature IDs in the 3rd column. If I can come up with a suggestion for some error handling, I'll go ahead and post it. Otherwise, this is probably an idiosyncratic problem on my side rather than something to handle in the software itself.
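One way to test a suspicion like this is to tally, per feature type, how many features carry an ID attribute. In GFF3 the feature type is column 3, while ID= and Parent= live in the attributes of column 9. The sketch below is a minimal Python illustration under that assumption. The helper names `parse_attributes` and `id_summary` and the sample lines are hypothetical and are not part of flo.

```python
from collections import Counter


def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def id_summary(gff_lines):
    """Count, per feature type, features with and without an ID attribute."""
    with_id, without_id = Counter(), Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        feature_type = cols[2]
        if "ID" in parse_attributes(cols[8]):
            with_id[feature_type] += 1
        else:
            without_id[feature_type] += 1
    return with_id, without_id


# Hypothetical sample input, for illustration only.
sample = [
    "##gff-version 3",
    "Chr1\tsrc\tgene\t1\t900\t.\t+\t.\tID=gene1",
    "Chr1\tsrc\tmRNA\t1\t900\t.\t+\t.\tID=tx1;Parent=gene1",
    "Chr1\tsrc\texon\t1\t300\t.\t+\t.\tParent=tx1",
    "Chr1\tsrc\texon\t600\t900\t.\t+\t.\tParent=tx1",
]

with_id, without_id = id_summary(sample)
print("with ID:", dict(with_id))
print("without ID:", dict(without_id))
```

Features without an ID are not necessarily a problem, since leaf features such as exons often have only a Parent. A gene or mRNA type showing up in the "without ID" tally would be the more suspicious result.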

Ist4lri commented 1 year ago

Hi, I have the same problem that @cmatKhan described above. I would like to know how to fix it. These are the last lines of run/Longest_Transcript/lifted_cleanup.log:

Chr1    kh2012  three_prime_UTR 5073729 5073960 .   +   .   Parent=KH2012:KH.C1.8.v5.A.ND1-1;Target=KH.C1.8.v5.A.ND1-1 900 1131
Chr1    kh2012  three_prime_UTR 2551018 2551052 .   +   .   Parent=KH2012:KH.C1.5.v2.A.ND1-1;Target=KH.C1.5.v2.A.ND1-1 4516 4550
Chr1    kh2012  three_prime_UTR 2551316 2551581 .   +   .   Parent=KH2012:KH.C1.5.v2.A.ND1-1;Target=KH.C1.5.v2.A.ND1-1 4551 4807
Chr1    kh2012  three_prime_UTR 5670475 5670962 .   +   .   Parent=KH2012:KH.C1.1056.v1.A.ND1-1;Target=KH.C1.1056.v1.A.ND1-1 1631 2118
warning: line 1 in file "-" does not begin with "##gff-version" or "##gvf-version", create "##gff-version 3" line automatically
gt gff3: error: Parent "KH2012:KH.C1.976" on line 1 in file "-" was not defined (via "ID=")

Any idea or hint on how to correct this?

I'm using the gff file attached with this message.

I tried reading through the Rakefile and gff_recover.rb but didn't see any problem or any hint of a solution.

Longest_Transcript.txt
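The gt error quoted above says that a Parent value refers to an ID that is never defined in the stream gt received. The following minimal Python sketch lists such dangling references in a GFF3 file so the offending features can be inspected before running gt. It is an illustration only, not part of flo or GenomeTools. The function name `undefined_parents` and the sample lines are hypothetical. The check reports which parents are missing, not why they are missing.

```python
def undefined_parents(gff_lines):
    """Map each Parent value with no matching ID= to the first line using it.

    IDs are collected over the whole input before the comparison, so a
    parent defined after its children is not reported.
    """
    defined = set()
    referenced = {}  # parent ID -> first 1-based line number referencing it
    for lineno, line in enumerate(gff_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        for field in cols[8].split(";"):
            key, _, value = field.partition("=")
            key = key.strip()
            if key == "ID":
                defined.add(value.strip())
            elif key == "Parent":
                # GFF3 allows several comma-separated parents.
                for parent in value.split(","):
                    referenced.setdefault(parent.strip(), lineno)
    return {p: n for p, n in referenced.items() if p not in defined}


# Hypothetical sample input, for illustration only.
sample = [
    "Chr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1",
    "Chr1\tsrc\texon\t100\t300\t.\t+\t.\tParent=tx1",
    "Chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "Chr1\tsrc\tthree_prime_UTR\t800\t900\t.\t+\t.\tParent=tx2",
]

missing = undefined_parents(sample)
for parent, lineno in missing.items():
    print(f"Parent {parent!r} (first used on line {lineno}) has no ID= definition")
```

To run it on a real file, pass an open file handle, for example `undefined_parents(open("lifted.gff3"))`. Comparing the result for the input GFF with the result for the lifted GFF would show whether the dangling parents were already present before liftOver or appeared afterwards.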