This PR fixes an issue in generating transcript files for Kallisto etc. from .gtf files and genomic.fasta files when spike-ins are included.
Currently, Make logic causes:
1. GTF file to be made for ERCC controls in .fasta, using spikein_fasta2gtf.pl
2. ERCC GTF lines to be appended to main GTF
3. ERCC fasta lines to be appended to main fasta
4. irap_gtf_to_fasta to be called to produce a cDNA file.
5. ERCC lines to be concatenated to the result
Step 5 should not be necessary if the ERCC lines were processed correctly at step 4. However, irap_gtf_to_fasta ignores lines without 'transcript_type' or 'transcript_biotype', which includes all exons created at step 1, and therefore outputs only ERCC transcripts at an intermediate stage (eliminating the exons). The tophat2_gtf_to_fasta called internally then ignores all of these transcripts, since it works exclusively with exons.
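To illustrate the failure mode, here is a minimal Python sketch of the kind of attribute check described above. It is not the actual irap_gtf_to_fasta code: `keep_line`, `parse_attributes`, the example identifier, the coordinates and the "spikein" biotype value are all hypothetical, and it assumes a line is kept when either attribute is present.

```python
import re

# Hypothetical stand-in for the attribute check described above; the real
# irap_gtf_to_fasta may be implemented differently.
BIOTYPE_KEYS = ("transcript_type", "transcript_biotype")


def parse_attributes(field):
    """Parse the GTF attribute column into a dict of key -> value."""
    return dict(re.findall(r'(\S+) "([^"]*)"', field))


def keep_line(gtf_line):
    """Return True if the line carries at least one biotype attribute."""
    columns = gtf_line.rstrip("\n").split("\t")
    if len(columns) < 9:
        return False
    attributes = parse_attributes(columns[8])
    return any(key in attributes for key in BIOTYPE_KEYS)


# Illustrative lines only; identifier, coordinates and biotype are made up.
exon_without_biotype = "\t".join([
    "ERCC-EXAMPLE", "ERCC", "exon", "1", "100", ".", "+", ".",
    'gene_id "ERCC-EXAMPLE"; transcript_id "ERCC-EXAMPLE";',
])
exon_with_biotype = "\t".join([
    "ERCC-EXAMPLE", "ERCC", "exon", "1", "100", ".", "+", ".",
    'gene_id "ERCC-EXAMPLE"; transcript_id "ERCC-EXAMPLE"; '
    'transcript_biotype "spikein";',
])

print(keep_line(exon_without_biotype))  # False: the exon would be dropped
print(keep_line(exon_with_biotype))     # True: the exon would survive
```

Under this assumption, an exon line with only gene_id and transcript_id is discarded, which matches the behaviour reported for the spike-in exons.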
This PR simplifies the logic in the following ways:
Fix spikein_fasta2gtf.pl so that exons are populated with 'transcript_type' and 'transcript_biotype', so that the resulting GTF lines are compatible with irap_gtf_to_fasta.
Re-wire the logic of irap_core.mk to account for the fact that step 5 is no longer necessary (the ERCC lines of the composite GTF having been processed correctly).
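The first change can be pictured with a short Python sketch that turns spike-in FASTA records into exon GTF lines carrying both attributes. This is only an illustration of the idea, not a port of spikein_fasta2gtf.pl: the function names, the "spikein" source and biotype values, and the choice of one exon spanning each whole sequence are assumptions.

```python
def fasta_lengths(lines):
    """Yield (sequence_id, length) for each record in FASTA-formatted lines.

    Assumes every header line has an identifier after the '>'.
    """
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Use the first whitespace-delimited token as the sequence id.
            name, length = line[1:].split()[0], 0
        elif line:
            length += len(line)
    if name is not None:
        yield name, length


def spikein_gtf_lines(fasta_lines, biotype="spikein"):
    """Build one exon GTF line per spike-in, with both biotype attributes."""
    out = []
    for seq_id, length in fasta_lengths(fasta_lines):
        attrs = (
            f'gene_id "{seq_id}"; transcript_id "{seq_id}"; '
            f'transcript_type "{biotype}"; transcript_biotype "{biotype}";'
        )
        # Hypothetical layout: a single forward-strand exon over the sequence.
        out.append("\t".join(
            [seq_id, "spikein", "exon", "1", str(length), ".", "+", ".", attrs]
        ))
    return out


# Made-up record used only to show the shape of the output.
example = [">ERCC-EXAMPLE some description", "ACGTACGT", "ACGT"]
for gtf_line in spikein_gtf_lines(example):
    print(gtf_line)
```

With both attributes present on the exon lines, a filter keyed on 'transcript_type' or 'transcript_biotype' no longer discards them.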
I've tested the fix and it works.