Closed laurabiggins closed 2 years ago
I've written a separate post-processing script that takes a gtf output file from nexons and creates a new gtf file with each exon on a separate line. This should be compatible with IGV. Information from the first and last splice sites currently gets thrown away.
I checked the extracted gtf outputs.
[ ] IGV requires the values of gene_id, transcript_id and exon_number to be wrapped in double quotes, with each attribute terminated by a semicolon (;). All three go into a single column, in this order: gene_id "SIRV1"; transcript_id "SIRV101"; exon_number "0";
[ ] The splice_pattern column can be dropped from the IGV gtf file, since the exon positions are already given in V4 and V5 and the exon number in V9.
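To make the attribute format in the checklist above concrete, here is a minimal Python sketch of how that column could be built. The function name `format_attributes` is hypothetical and not part of nexons; it only illustrates the quoting and ordering described above.

```python
def format_attributes(gene_id, transcript_id, exon_number=None):
    """Build the GTF attribute column: quoted values, each ending in ';'.

    Hypothetical helper for illustration only, not part of nexons.
    """
    parts = [f'gene_id "{gene_id}";', f'transcript_id "{transcript_id}";']
    # Transcript rows carry no exon_number, so it is optional here.
    if exon_number is not None:
        parts.append(f'exon_number "{exon_number}";')
    return " ".join(parts)


print(format_attributes("SIRV1", "SIRV101", 0))
```

Leaving `exon_number` out gives the shorter attribute string used on transcript rows.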
I was able to modify it in R and add an exon number to each row. Normally, the exon number starts from 0. Since our first exon is missing, I started from 1.
SIRV1 nexons transcript 1494 10414 7 - 0 gene_id "SIRV1"; transcript_id "Variant24";
SIRV1 nexons exon 6328 6481 7 - 0 gene_id "SIRV1"; transcript_id "Variant24"; exon_number "1";
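For anyone who wants to reproduce the exon rows outside R, a rough Python sketch of the numbering step might look like the following. The function `exon_rows` and its parameters are hypothetical, and it assumes exons are numbered in ascending coordinate order starting from 1, which may not match what the actual script does for minus-strand transcripts.

```python
def exon_rows(seqname, source, exons, score, strand, frame,
              gene_id, transcript_id, first_number=1):
    """Return one tab-separated GTF exon line per (start, end) pair.

    Hypothetical helper: numbers exons in ascending coordinate order,
    starting at first_number (1 here, because the first exon is missing).
    """
    rows = []
    for number, (start, end) in enumerate(sorted(exons), start=first_number):
        attributes = (f'gene_id "{gene_id}"; transcript_id "{transcript_id}"; '
                      f'exon_number "{number}";')
        fields = (seqname, source, "exon", start, end,
                  score, strand, frame, attributes)
        rows.append("\t".join(str(field) for field in fields))
    return rows


# The single exon shown above for Variant24.
rows = exon_rows("SIRV1", "nexons", [(6328, 6481)], 7, "-", 0,
                 "SIRV1", "Variant24")
print(rows[0])
```

The fields are joined with tabs because GTF is a tab-delimited format with the attributes in the ninth column.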
Since the first and last splice sites are thrown away, the transcripts do not collate correctly in IGV. For the same reason, it is difficult to use this output to calculate the precision and sensitivity of exon detection with gffcompare.
Nexons has an option to create gtf output files, but these do not separate the exons onto individual rows, so we can't see the individual exons when visualising in IGV.