nawrockie / jiffy-infernal-hmmer-scripts

Simple perl scripts for dealing with infernal and/or hmmer output/input.
Gff output from infernal-tblout2gff.pl #1

Open sanyalab opened 4 years ago

sanyalab commented 4 years ago

Hi Eric,

I have a cmscan tblout file that I am converting with the infernal-tblout2gff.pl script. Is the conversion a direct rearrangement of the tblout columns into GFF3? Would it be possible to get GFF3 output that adheres to the Sequence Ontology (SO), with the level 1 and level 2 features described?

Thanks Abhijit

nawrockie commented 4 years ago

I'm not sure what you mean by level 1 and 2 features. If you provide an example of a cmscan tblout file and the corresponding GFF file in the format you want with the info you want, I can provide a better answer. Thanks.

sanyalab commented 4 years ago

Hi Eric,

Thank you for writing back. Here is cmscan output processed with the infernal-tblout2gff.pl script:

Chr04 GSAP LSU_rRNA_eukarya 1049382 1052765 3229.7 + . evalue=0;idx=1;seqaccn=-;mdlaccn=RF02543;clan=CL00112;mdl=cm;mdlfrom=1;mdlto=3401;trunc=no;pass=1;gc=0.57;bias=57.2;inc=!;olp=^;anyidx=-;anyfrct1=-;anyfrct2=-;winidx=-;winfrct1=-;winfrct2=-;desc=Eukaryotic_large_subunit_ribosomal_RNA;RFAM;
Chr04 GSAP LSU_rRNA_eukarya 1041221 1044604 3227.9 + . evalue=0;idx=2;seqaccn=-;mdlaccn=RF02543;clan=CL00112;mdl=cm;mdlfrom=1;mdlto=3401;trunc=no;pass=1;gc=0.57;bias=57.6;inc=!;olp=^;anyidx=-;anyfrct1=-;anyfrct2=-;winidx=-;winfrct1=-;winfrct2=-;desc=Eukaryotic_large_subunit_ribosomal_RNA;RFAM;
Chr04 GSAP LSU_rRNA_eukarya 1057543 1060926 3227.9 + . evalue=0;idx=3;seqaccn=-;mdlaccn=RF02543;clan=CL00112;mdl=cm;mdlfrom=1;mdlto=3401;trunc=no;pass=1;gc=0.57;bias=57.6;inc=!;olp=^;anyidx=-;anyfrct1=-;anyfrct2=-;winidx=-;winfrct1=-;winfrct2=-;desc=Eukaryotic_large_subunit_ribosomal_RNA;RFAM;
Chr04 GSAP LSU_rRNA_eukarya 1082027 1085410 3227.9 + . evalue=0;idx=4;seqaccn=-;mdlaccn=RF02543;clan=CL00112;mdl=cm;mdlfrom=1;mdlto=3401;trunc=no;pass=1;gc=0.57;bias=57.6;inc=!;olp=^;anyidx=-;anyfrct1=-;anyfrct2=-;winidx=-;winfrct1=-;winfrct2=-;desc=Eukaryotic_large_subunit_ribosomal_RNA;RFAM;
Chr04 GSAP LSU_rRNA_eukarya 1105876 1109259 3227.9 + . evalue=0;idx=5;seqaccn=-;mdlaccn=RF02543;clan=CL00112;mdl=cm;mdlfrom=1;mdlto=3401;trunc=no;pass=1;gc=0.57;bias=57.6;inc=!;olp=^;anyidx=-;anyfrct1=-;anyfrct2=-;winidx=-;winfrct1=-;winfrct2=-;desc=Eukaryotic_large_subunit_ribosomal_RNA;RFAM;

LSU_rRNA_eukarya is not an SO term, but it is an rRNA. The entry should therefore read "rRNA_gene" for the parent feature and "rRNA" for the child. How difficult would it be to code it that way?

Thanks Abhijit

nawrockie commented 4 years ago

That information (rRNA and rRNA_gene) is not in the cmsearch tblout output, so you'd need to write a script that adds that information to the GFF file after you run infernal-tblout2gff.pl. You'll likely need another input file to your script that maps the RNA families to the SO terms you want to add.
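
The post-processing step described above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not part of jiffy-infernal-hmmer-scripts: the function names, the three-column tab-separated mapping file (family name, parent SO term, child SO term), and the ID scheme are all hypothetical. It also assumes the GFF produced by infernal-tblout2gff.pl is tab-delimited with nine columns and carries the family name in column 3, as in the example earlier in the thread.

```python
import sys


def load_so_map(lines):
    """Parse 'family<TAB>parent_term<TAB>child_term' lines into a dict.

    The three-column layout is an assumption made for this sketch.
    """
    so_map = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        family, parent_term, child_term = line.split("\t")
        so_map[family] = (parent_term, child_term)
    return so_map


def add_so_terms(gff_lines, so_map):
    """Replace each mapped hit with a parent/child pair of SO-typed features.

    Comment lines, malformed lines, and hits whose family is not in so_map
    are passed through unchanged.
    """
    out = []
    counter = 0
    for line in gff_lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] not in so_map:
            out.append(line)
            continue
        counter += 1
        family = cols[2]
        parent_term, child_term = so_map[family]
        gene_id = f"{family}.{counter}"  # hypothetical ID scheme
        # Parent (level 1) feature: same coordinates, SO term as the type.
        parent = cols[:2] + [parent_term] + cols[3:8] + [
            f"ID={gene_id};Name={family}"
        ]
        # Child (level 2) feature: keeps the original attributes and
        # points back at the parent via the GFF3 Parent attribute.
        child = cols[:2] + [child_term] + cols[3:8] + [
            f"ID={gene_id}.rna;Parent={gene_id};Name={family};{cols[8]}"
        ]
        out.append("\t".join(parent))
        out.append("\t".join(child))
    return out


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python add_so_terms.py <so_map.tsv> <hits.gff>
    with open(sys.argv[1]) as map_fh:
        mapping = load_so_map(map_fh)
    with open(sys.argv[2]) as gff_fh:
        for row in add_so_terms(gff_fh, mapping):
            print(row)
```

With a mapping file containing the single line `LSU_rRNA_eukarya<TAB>rRNA_gene<TAB>rRNA`, each LSU_rRNA_eukarya hit would become an rRNA_gene feature followed by an rRNA feature whose Parent attribute refers to it. The original family name is kept in the Name attribute so that no information from the tblout conversion is lost.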