Closed B-Li closed 6 years ago
Sequencing platform may, of course, affect analysis performance. At least Ion Torrent data is single-end, and has different length distribution compared to Illumina HiSeq/MiSeq, not to mention different error profile (skewed toward indels), which should not be a problem, still RNASeq analysis is very sensitive to any data parameters, as it balances on the edge of specificity / selectivity tradeoffs.
Could you please post your alignment report (from mixcr align -p rna-seq ...
) and read length and quality score distributions to get some feeling on how the data looks like. Are there any adapter sequences in the data?
Thank you for your reply!
My code of mixcr is mixcr align -p rna-seq -s hsa -OallowPartialAlignments=true
. The output was
Alignment: 0%
Alignment: 10.1% ETA: 00:18:01
Alignment: 20.1% ETA: 00:15:00
Alignment: 30.2% ETA: 00:12:29
Alignment: 40.2% ETA: 00:10:43
Alignment: 50.3% ETA: 00:09:38
Alignment: 60% ETA: 00:08:12
Alignment: 69.7% ETA: 00:06:16
Alignment: 79.7% ETA: 00:03:57
Alignment: 89.7% ETA: 00:01:53
Alignment: 99.8% ETA: 00:00:02
============= Report ==============
Analysis time: 19.11m
Total sequencing reads: 14340091
Successfully aligned reads: 39 (0%)
Alignment failed, no hits (not TCR/IG?): 14242355 (99.32%)
Alignment failed because of absence of CDR3 parts: 5254 (0.04%)
Alignment failed because of low total score: 92443 (0.64%)
Overlapped: 0 (0%)
Overlapped and aligned: 0 (0%)
Alignment-aided overlaps: 0 (�%)
Overlapped and not aligned: 0 (0%)
sequence quality looks like:
read length looks like:
I downloaded this toy data from SRA, and fastq-dump it. I don't know how to check whether adapter sequence was included in the fastaq files.
Thank you very much for your help! :-)
Well, there can simply be too few TCR/IG mRNA (cells) in the sample. Where does the sample come from? How much T/B-cell-related gene expression do you see (CD3/19, TR/IG loci) when running, say, Kallisto software? You can actually check our latest Nat Biotech paper to see the recall when running on RNA-seq vs RepSeq data for B-cells.
Well it can be the reason. However, the gene-based counts in log2 scale for CD3G and CD3D are 8.73 and 6.49, respectively. Similarly, expression levels for CD8A and B are 8.06 and 6.31. It seems to me that there should be certain amount of TCR present in the sequencing data. And the sample is derived from early stage lung adenocarcinoma which is thought as immunogenic.
Can you please also post assemblePartial
and assemble
reports. align
report shows 39 successful alignments, so either neither of them contain CDR3 or can be assembled into CDR3-containing contigs or other filters (like quality) dropped some results.
The assemblePartial report looks like this: Building index: 0% Searching for overlaps: 0% ============= Report ============== Analysis time: 407.00ms Total alignments analysed: 39 Number of output alignments: 39 (100%) Alignments already with CDR3 (no overlapping is performed): 21 (53.85%) Successfully overlapped alignments: 0 (0%) Left parts with too small N-region (failed to extract k-mer): 0 (0%) Extracted k-mer diversity: 0 Dropped due to wildcard in k-mer: 0 (0%) Dropped due to too short NRegion parts in overlap: 0 (0%) Dropped overlaps with empty N region due to no complete NDN coverage: 0 (0%) Number of left-side alignments: 0 (0%) Number of right-side alignments: 18 (46.15%) Complex overlaps: 0 (0%) Over-overlaps: 0 (0%) Partial alignments written to output: 18 (46.15%)
Another round of partial assemble gives the same report. The assemble report is listed below:
Initialization: progress unknown Writing clones: 0% ============= Report ============== Analysis time: 529.00ms Final clonotype count: 3 Average number of reads per clonotype: 3.33 Reads used in clonotypes, percent of total: 10 (0%) Reads used in clonotypes before clustering, percent of total: 10 (0%) Number of reads used as a core, percent of used: 10 (100%) Mapped low quality reads, percent of used: 0 (0%) Reads clustered in PCR error correction, percent of used: 0 (0%) Reads pre-clustered due to the similar VJC-lists, percent of used: 0 (0%) Reads dropped due to the lack of a clone sequence: 18 (0%) Reads dropped due to low quality: 0 (0%) Reads dropped due to failed mapping: 11 (0%) Reads dropped with low quality clones: 0 (0%) Clonotypes eliminated by PCR error correction: 0 Clonotypes dropped as low quality: 0 Clonotypes pre-clustered due to the similar VJC-lists: 0 Thank you again for your help! :-)
Hi, I am working on a RNAseq data generated from Ion Torrent Proton platform. The length of reads varies from 20ish to 200ish. For this data I got 0% TCR/IG alignment. What is the reasonable alignment quality parameters should I use? Thank you for your help!