hammerlab / biokepi

Bioinformatics Ketrew Pipelines
Apache License 2.0

Add SplitNCigarReads #151

Open ihodes opened 8 years ago

ihodes commented 8 years ago

Part of GATK

SplitNCigarReads was developed specifically for RNAseq. It splits reads into exon segments (getting rid of Ns but maintaining grouping information) and hard-clips any sequences overhanging into the intronic regions.

Part of GATK's best practices for RNAseq https://www.broadinstitute.org/gatk/guide/article?id=3891
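
To make the splitting step concrete, here is a rough Python sketch of what "splitting on Ns" does to a single alignment. It is a simplified model, not GATK's implementation: the function name `split_on_n` is made up for illustration, positions are 1-based as in SAM, and the overhang clipping into introns is not modeled at all. Each output segment keeps its own exon block and hard-clips the read bases that belong to the other blocks.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MD=X")      # operations that advance the reference (N handled separately)
READ_BASES = set("MIS=XH")       # operations that account for bases of the original read


def split_on_n(pos, cigar):
    """Split one alignment at its N (skipped region) operations.

    Returns a list of (reference_start, cigar) tuples, one per exon block.
    Hypothetical helper for illustration only.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

    # Group the operations into blocks separated by N, tracking where
    # each block starts on the reference.
    blocks = []
    current, ref, block_start = [], pos, pos
    for length, op in ops:
        if op == "N":
            blocks.append((block_start, current))
            current = []
            ref += length          # skip over the intron
            block_start = ref
        else:
            current.append((length, op))
            if op in REF_CONSUMING:
                ref += length
    blocks.append((block_start, current))

    # Number of read bases covered by each block.
    sizes = [sum(n for n, op in b if op in READ_BASES) for _, b in blocks]

    segments = []
    for i, (start, block_ops) in enumerate(blocks):
        left, right = sum(sizes[:i]), sum(sizes[i + 1:])
        body = "".join(f"{n}{op}" for n, op in block_ops)
        # Hard-clip the bases that belong to the other exon blocks.
        new_cigar = (f"{left}H" if left else "") + body + (f"{right}H" if right else "")
        segments.append((start, new_cigar))
    return segments


print(split_on_n(10, "3M100N4M"))
```

Note that none of the resulting segments contains an N any more, so the junction is no longer encoded in any single read's CIGAR.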

ihodes commented 8 years ago

Should support ReassignOneMappingQuality as a first-class option. It does the following:

reassign all good alignments to the default value of 60

Why, you ask?

This is not ideal, and we hope that in the future RNAseq mappers will emit meaningful quality scores, but in the meantime this is the best we can do. In practice we do this by adding the ReassignOneMappingQuality read filter to the splitter command.
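
As a sketch of what "first-class option" could look like, here is a small Python helper that builds the splitter command line with the read filter switched on or off. The helper name, its parameters, and the default jar path are hypothetical and not part of biokepi. The flags are the GATK 3.x ones as I remember them from the RNAseq best-practices page, so check them against your GATK version. The source value of 255 is an assumption: it is what STAR assigns to uniquely mapped reads, as far as I know.

```python
def split_n_cigar_args(reference, input_bam, output_bam,
                       reassign_mapq=None,
                       gatk_jar="GenomeAnalysisTK.jar"):
    """Build an argument list for GATK 3.x SplitNCigarReads.

    reassign_mapq: optional (from_quality, to_quality) pair; when given,
    the ReassignOneMappingQuality read filter is added to the command.
    Hypothetical helper for illustration only.
    """
    args = [
        "java", "-jar", gatk_jar,
        "-T", "SplitNCigarReads",
        "-R", reference,
        "-I", input_bam,
        "-o", output_bam,
        "-U", "ALLOW_N_CIGAR_READS",
    ]
    if reassign_mapq is not None:
        from_q, to_q = reassign_mapq
        args += [
            "-rf", "ReassignOneMappingQuality",
            "-RMQF", str(from_q),   # mapping quality to look for
            "-RMQT", str(to_q),     # mapping quality to assign instead
        ]
    return args


# Assumed values: 255 (STAR's "unique" MAPQ) is rewritten to 60.
cmd = split_n_cigar_args("ref.fasta", "dedupped.bam", "split.bam",
                         reassign_mapq=(255, 60))
print(" ".join(cmd))
```

Keeping the filter as an optional argument means pipelines whose aligner already emits meaningful mapping qualities can simply leave it off.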

iskandr commented 8 years ago

Just to clarify: the Broad only advocates it in their "best practices" for variant calling from RNAseq, not for working with RNAseq data in general. Splitting reads on N CIGAR string elements will eliminate all exon-exon junctions, potentially screwing up expression quantification.