Open MichaelFokinNZ opened 6 years ago
UPD. Of course :) nothing wrong with process_exonerate_gff3.pl - it seems to be all about how Geneious loads the (-) strand features...
Okay. Glad it worked. Since this is just a repository for my working scripts, I don’t have unit tests for these, but originally I did have test cases that this was validated against.
Jason Stajich, PhD jasonstajich.phd@gmail.com On Aug 7, 2018, 5:03 PM -0700, MichaelFokinNZ notifications@github.com, wrote:
sorry, still not quite sure about the gff (-) strand coordinates in the process_exonerate_gff3.pl output. could you please check, in the file attached above, whether this is a normal transformation of the exonerate output by the script? the genes there are from a well-known reference cluster, so I can confirm that they are annotated correctly in the gff taken straight from exonerate, but not after the process_exonerate_gff3.pl concatenation/filtering.... :(
I’ll try. Not much time this week to do this.
Jason Stajich, PhD jasonstajich.phd@gmail.com On Aug 7, 2018, 5:27 PM -0700, MichaelFokinNZ notifications@github.com, wrote:
thank you, no rush. just really odd to have any issues with the script used millions of times... I will attach the exonerate files used as a test input. it is all public. exonerate_chunk8_output.txt exonerate_chunk9_output.txt
sorry, any chance you had time to check it?
No time. But remind me again what you need exactly - these are just an archive of things I use, not intended to be production resources for anyone.
Jason Stajich, PhD jasonstajich.phd@gmail.com On Aug 19, 2018, 6:54 PM -0700, MichaelFokinNZ notifications@github.com, wrote:
got it :) there are two parts: (1) the problem with the script - it technically reverses the coordinates of features on the (-) strand, so they end up misplaced in the final gff. any guess why that can happen? (2) what are you using nowadays for protein2dna mapping, if not exonerate?
Sorry I really dunno at this point - I wrote this 10 years ago or longer.
There are several versions of the parsers in that folder. I don't remember what the distinctions were for.
I do use exonerate to look at p2g or t2g alignments visually and I annotate with funannotate - I rely on the ryo I think for other aspects.
I wrote a quick and dirty targeted gene predictor here which uses it - it seems like it reads the location results natively from the ryo rather than parsing to gff3, but I don't quite recall. https://github.com/hyphaltip/genome-scripts/blob/5a986246c55467440e638740dddbb6bc8ec632aa/gene_prediction/tfastx2gene.pl
There was a point in the early iterations of exonerate where rev strand features were unrolled in a weird way, so that a DNA contig was basically represented like this in coordinates:
--FWD--> <---REV--
I don't remember why. Maybe something I did to parse that was part of that schema.
I really have little recollection. At this point exonerate spits out GFF itself, so I am not sure what the main use case is.
I am sure ensembl has an exonerate parser as does funannotate so you can look at that code too or take from there.
thank you! a lot of hints :) sure will be able to get around (already wrote a basic parser...)
what helped - I've commented out the condition for (-) strand
if( $f->strand < 0 ) {
    my $s = $length - $f->end + 1;
    my $e = $length - $f->start + 1;
    $f->start($s);
    $f->end($e);
}
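To make the effect of that condition concrete, here is a minimal Python sketch of the same arithmetic. It assumes 1-based inclusive coordinates and uses a made-up contig length and feature; the function name is hypothetical and nothing here is taken from the script beyond the two formulas. GFF3 expects start <= end in forward-strand numbering whatever the strand column says, so if the (-) strand features in exonerate's GFF are already in forward-strand numbering (as the behaviour described in this thread suggests), applying this flip mirrors them to the other end of the contig.

```python
from typing import Tuple


def flip_to_reverse_coords(start: int, end: int, length: int) -> Tuple[int, int]:
    """Mirror a 1-based inclusive interval onto reverse-strand numbering.

    Same arithmetic as the Perl block: s = length - end + 1, e = length - start + 1.
    """
    return length - end + 1, length - start + 1


# Hypothetical example values, not taken from the attached files.
contig_length = 1000
start, end = 101, 250

flipped = flip_to_reverse_coords(start, end, contig_length)
print(flipped)  # (751, 900)

# The flip is its own inverse, so applying it to coordinates that were
# already forward-strand moves the feature to the mirrored position.
restored = flip_to_reverse_coords(flipped[0], flipped[1], contig_length)
print(restored)  # (101, 250)
```

The feature keeps its length (150 bp in this example) but lands at 751..900 instead of 101..250, which matches the "completely wrong positions" symptom seen in a viewer.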
PS. Other parsers, such as one from EVM (used in funannotate) don't produce "fully functional" gff.
ha, another curious feature that was not easy to catch: for some reason, for the last protein in every chunk the vulgar line written to the gff is not its own, but the one from the previous protein!
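One way to avoid this kind of off-by-one pairing in a hand-written parser is sketched below in Python. It is not how process_exonerate_gff3.pl works. It assumes that each alignment's `vulgar:` line is printed before that alignment's GFF dump and that the dump is delimited by the `# --- START OF GFF DUMP ---` and `# --- END OF GFF DUMP ---` marker lines; the function name is hypothetical. Each record is emitted as soon as its dump closes and the stored vulgar line is cleared, so a stale value cannot leak into the next (or last) record.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

GFF_START = "# --- START OF GFF DUMP ---"
GFF_END = "# --- END OF GFF DUMP ---"


def pair_vulgar_with_gff(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (vulgar_line, gff_rows) for each alignment in exonerate output.

    Assumption: the vulgar line of an alignment precedes its GFF dump.
    """
    vulgar: Optional[str] = None
    gff_rows: List[str] = []
    in_gff = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("vulgar:"):
            vulgar = line
        elif line.startswith(GFF_START):
            in_gff = True
            gff_rows = []
        elif line.startswith(GFF_END):
            in_gff = False
            if vulgar is None:
                raise ValueError("GFF dump without a preceding vulgar line")
            # Emit immediately, then reset so the next record starts clean.
            yield vulgar, gff_rows
            vulgar = None
        elif in_gff and line and not line.startswith("#"):
            # Keep feature rows only; skip comment and directive lines.
            gff_rows.append(line)
```

Usage would be something like `for vulgar, rows in pair_vulgar_with_gff(open("exonerate_chunk8.output")): ...`, processing one alignment at a time.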
Hi Jason, maybe you can advise using something better than exonerate for protein2genome mapping, but... since I am using it - I've found a very strange behavior in your perl script that I can't explain.
pretty standard/recommended exonerate parameters
exonerate --model protein2genome --percent 95 --showtargetgff yes --showalignment no --targetchunkid ${SLURM_ARRAY_TASK_ID} --targetchunktotal 16 --ryo ">%qi length=%ql alnlen=%qal\n>%ti length=%tl alnlen=%tal\n" --querytype protein --query proteins.fasta --targettype dna --target genomic.fna > exonerate_chunk${SLURM_ARRAY_TASK_ID}.output
the GFF part of output looks correct (checked that), but after using process_exonerate_gff3.pl
perl process_exonerate_gff3.pl -t Protein exonerate_chunk8.output exonerate_chunk9.output > exonerate_test89.gff3
the reverse strand genes became reversed (as in the example attached) and in Geneious they end up at completely wrong positions. Is this the default/normal behaviour of the script?