Closed by agolicz 5 months ago.
Hi,
TPMCalculator creates a gene model by overlapping the exons of all isoforms of a gene. The -c option sets the minimum size for an intron created when overlapping multiple exons. This value does not affect the quantification of exon-level RNA-seq abundance, but it can change the quantification of transcripts and genes if intron retention is present.
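To make the -c behaviour more concrete, here is a minimal Python sketch of how exons from several isoforms could be collapsed into a single gene model. It assumes that a gap shorter than the -c value is filled in rather than kept as an intron. The function name merge_exons and the 1-based inclusive coordinates are illustrative assumptions; this is not TPMCalculator's code, which may handle edge cases differently.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive, as in GTF


def merge_exons(exons: List[Interval], min_intron: int = 16) -> List[Interval]:
    """Collapse exons from all isoforms into one gene model (hypothetical).

    Overlapping exons are merged. A gap between two merged blocks is kept
    as an intron only if it is at least `min_intron` bases long; shorter
    gaps are filled in, joining the two blocks.
    """
    if not exons:
        return []
    ordered = sorted(exons)
    merged: List[Interval] = [ordered[0]]
    for start, end in ordered[1:]:
        last_start, last_end = merged[-1]
        gap = start - last_end - 1  # bases strictly between the two blocks
        if gap < min_intron:
            # overlapping, adjacent, or separated by a too-short gap: join
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged


isoform_exons = [(100, 200), (150, 250), (260, 300), (500, 600)]
print(merge_exons(isoform_exons))                # [(100, 300), (500, 600)]
print(merge_exons(isoform_exons, min_intron=5))  # [(100, 250), (260, 300), (500, 600)]
```

With the default of 16, the 9 bp gap between (100, 250) and (260, 300) is absorbed into one block; with a threshold of 5 it survives as an intron, which is the kind of difference that could shift transcript- and gene-level counts when intron retention is present.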
Multi-mapping reads are filtered using the MAPQ value, as you said. However, each aligner has its own implementation of MAPQ values, so you need to check how your aligner assigns them.
This blog post may be of more help regarding MAPQ values across aligners: https://sequencing.qcfail.com/articles/mapq-values-are-really-useful-but-their-implementation-is-a-mess/
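As a rough illustration of why the -q threshold depends on the aligner, here is a small Python sketch that reads MAPQ from the fifth column of a SAM record and applies a cutoff. It assumes reads with MAPQ greater than or equal to the threshold are kept; whether TPMCalculator uses >= or > is not stated in this thread, and the helper names are hypothetical. The toy records use STAR's convention, where 255 marks a unique mapper and 3 a read mapping to two loci (1 means three or four loci, 0 means five or more). Other aligners such as BWA or Bowtie2 use different scales.

```python
def mapq(sam_line: str) -> int:
    """Return the MAPQ of a SAM alignment line (5th tab-separated column)."""
    return int(sam_line.split("\t")[4])


def keep_read(sam_line: str, min_mapq: int = 0) -> bool:
    """Assumed filter rule: keep reads whose MAPQ is >= min_mapq."""
    return mapq(sam_line) >= min_mapq


# Two toy STAR-style records: MAPQ 255 (unique) and MAPQ 3 (two loci)
seq = "A" * 50
unique = f"r1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t{seq}\t*"
two_loci = f"r2\t0\tchr1\t300\t3\t50M\t*\t0\t0\t{seq}\t*"

print(keep_read(unique, 255), keep_read(two_loci, 255))  # True False
```

Under these assumptions, a threshold of 255 would keep only unique STAR alignments, while the same value would discard everything from an aligner whose MAPQ tops out at 60.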
Sorry, one more question: does TPMCalculator support stranded libraries, and if not, is there a plan to add that feature? I thought it might have been included in v0.0.4. I just did a fresh installation with Miniconda3, and according to the installation details the following packages were installed:

    bamtools-2.5.1       | he513fc3_6   1.1 MB  bioconda
    tpmcalculator-0.0.4  | h7376a40_0   1.4 MB  bioconda

But when I run TPMCalculator -version, it still lists 0.0.3, and the options don't mention stranded reads:

Usage: TPMCalculator
TPMCalculator options:
  -v         Print info
  -version   Print version
  -h         Display this usage information.
  -g         GTF file
  -d         Directory with the BAM files
  -b         BAM file
  -k         Gene key to use from GTF file. Default: gene_id
  -t         Transcript key to use from GTF file. Default: transcript_id
  -c         Smaller size allowed for an intron created for genes. Default: 16. We recommend to use the reads length
  -p         Use only properly paired reads. Default: No. Recommended for paired-end reads.
  -q         Minimum MAPQ value to filter out reads. Default: 0. This value depends on the aligner MAPQ value.
  -o         Minimum overlap between a reads and a feature. Default: 8.
  -e         Extended output. This will include transcript level TPM values. Default: No.
  -a         Print out all features with read counts equal to zero. Default: No.
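The -p and -o options in the help text above can be illustrated with a short Python sketch. It assumes -p corresponds to the SAM "properly paired" FLAG bit 0x2 and that -o counts aligned reference bases (CIGAR M, = and X operations) falling inside a feature. Both are assumptions about TPMCalculator's behaviour, and the helper names are hypothetical.

```python
import re
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive reference coordinates

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos: int, cigar: str) -> List[Interval]:
    """Reference intervals covered by aligned bases of one read."""
    blocks: List[Interval] = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            blocks.append((ref, ref + n - 1))
            ref += n
        elif op in "DN":
            ref += n  # deletions and introns consume the reference only
        # I, S, H and P do not move along the reference
    return blocks


def overlap(blocks: List[Interval], feature: Interval) -> int:
    """Number of aligned bases falling inside the feature."""
    fs, fe = feature
    return sum(max(0, min(e, fe) - max(s, fs) + 1) for s, e in blocks)


def is_proper_pair(flag: int) -> bool:
    """SAM FLAG bit 0x2: the aligner marked the pair as properly aligned."""
    return bool(flag & 0x2)


def passes_filters(flag: int, pos: int, cigar: str, feature: Interval,
                   min_overlap: int = 8, proper_only: bool = False) -> bool:
    """Hypothetical combination of -p (proper_only) and -o (min_overlap)."""
    if proper_only and not is_proper_pair(flag):
        return False
    return overlap(aligned_blocks(pos, cigar), feature) >= min_overlap


# Spliced read at position 100: 10 soft-clipped, 20 aligned, 500 bp intron, 30 aligned
print(aligned_blocks(100, "10S20M500N30M"))  # [(100, 119), (620, 649)]
print(passes_filters(99, 100, "10S20M500N30M", (110, 630)))  # True (21 bases overlap)
```

Here FLAG 99 has bit 0x2 set, so the read survives -p; a FLAG of 65 (paired but not proper) would be dropped when proper_only is true. A 50M read starting at 100 overlaps a feature beginning at 145 by only 5 bases, below the default minimum of 8.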
That feature will be included in the next release.
Hello, I am not sure I understand the meaning of the -c flag: "-c Smaller size allowed for an intron created for genes. Default: 16. We recommend to use the reads length". Why do you recommend using the read length?
Also, how does the software treat multi-mapping reads, i.e. reads matching multiple locations across the genome (a value greater than 1 in the NH:i: field)? Is that normally handled by MAPQ filtering?
All the best, Agnieszka
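To compare MAPQ filtering against the NH:i field mentioned in the question, a quick Python sketch like the one below can flag multi-mappers directly from SAM text. It only reads the NH tag, which aligners such as STAR and HISAT2 write but not every aligner does. Nothing here implies TPMCalculator reads NH; per the reply above, it relies on MAPQ. The helper names are hypothetical.

```python
from typing import Optional


def nh_tag(sam_line: str) -> Optional[int]:
    """Return the NH:i value of a SAM line, or None if the tag is absent."""
    # Optional fields follow the 11 mandatory columns as TAG:TYPE:VALUE
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        tag, typ, value = field.split(":", 2)
        if tag == "NH" and typ == "i":
            return int(value)
    return None


def is_multimapper(sam_line: str) -> bool:
    """True when the aligner reports more than one hit for the read."""
    nh = nh_tag(sam_line)
    return nh is not None and nh > 1


mandatory = "r1\t0\tchr1\t100\t3\t50M\t*\t0\t0\t" + "A" * 50 + "\t" + "I" * 50
print(is_multimapper(mandatory + "\tNH:i:2\tHI:i:1"))  # True
```

Cross-tabulating NH against MAPQ on a sample of your own alignments is one way to see which -q value separates unique reads from multi-mappers for your aligner.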