Open weshorton opened 8 years ago
mixcr align --loci TRB --species mmu --report align_report.txt input.fastq output.vdjca
mixcr assemble --report assemble_report.txt input.vdjca output.clns
Overall To Do: summarize notable QC outputs, possibly change parameters and compare change in QC outputs.
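One way to compare QC outputs between parameter settings is to pull the numeric metrics out of two report files and diff them. The sketch below is only an illustration: it assumes the reports consist of "Label: value" lines, takes the first number on each line, and uses made-up labels and numbers as sample input. Lines that are not metrics (dates, command lines) may produce spurious entries and would need filtering.

```python
import re


def parse_report(text):
    """Parse 'Label: value' lines into a dict of label -> first number on the line."""
    metrics = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Label: value' line
        match = re.search(r"-?\d+(?:\.\d+)?", value)
        if match:
            metrics[label.strip()] = float(match.group())
    return metrics


def compare_reports(baseline, changed):
    """Return {label: (baseline, changed, delta)} for metrics present in both reports."""
    return {
        label: (baseline[label], changed[label], changed[label] - baseline[label])
        for label in baseline
        if label in changed
    }


# Made-up report contents, for illustration only (not real results).
default_run = """Total sequencing reads: 1000
Successfully aligned reads: 500 (50%)
"""
tuned_run = """Total sequencing reads: 1000
Successfully aligned reads: 450 (45%)
"""

diff = compare_reports(parse_report(default_run), parse_report(tuned_run))
for label, (before, after, delta) in diff.items():
    print(f"{label}: {before:g} -> {after:g} ({delta:+g})")
```

In practice the two strings would be replaced by the contents of the align or assemble report files from each run.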
Example of Adaptive's primer specificity analysis. Potentially useful experiment for determining why most reads that fail to align do so for lack of a J region. See the Align section of the markdown.
5/31/2016 email response from MiXCR development team:
We have checked your data files and found that there are two main reasons why about 50% of reads are dropped:
- it looks like there is about 50% contamination by sequences of Cyprinus carpio (we have BLASTed a few reads that were not aligned and found that almost all of them align to Cyprinus carpio)
- the mmu library does not look perfectly enriched by CDR3 containing regions; there is some contamination by other genomic sequences.
In general, it seems that there are some problems with library preparation protocol that should be addressed on the wet lab side.
Assembled clones look very odd but seem to be aligned correctly (too long VJ insertions and too many out-of-frame clones): I only saw something similar in the analysis of thymus-derived samples.
Additionally, we recommend adding the following option to the align step in order to increase the selectivity of alignments for such a contaminated case:
mixcr align -OvParameters.parameters.floatingLeftBound=false ...
Note: All reads in this data set failed the default alignment run
Summary: All of these examples appear to be off-target amplification. The V and J alignments are only 18-25 base pairs long, suggesting that only the primers are aligning.
Summary: More of the same. Only the length of the primer is matching, and nothing else.
Summary: More off-target amplification. Only the primers are aligning and nothing else.
Looks like quite a bit of off-target amplification. A few J primers may have forward priming ability as well. Hopefully the new PCR protocol will take care of a lot of this. These results also raise more questions about what MiXCR is doing. Why is it saying that there are D hits, when that sequence actually aligns to a completely unrelated gene?
Another thing to note is that this is a relatively small sample of our data. I'm going to look into ways to use the tab-separated outputs to quantify how many V and J alignments are actually just alignments to the primer.
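A rough sketch of that quantification, assuming the tab-separated export has been reduced to one row per read with the V and J alignment lengths in columns. The column names `v_align_len` and `j_align_len` are hypothetical, not MiXCR field names, and the 25 bp cutoff is an assumption taken from the upper end of the 18-25 bp alignments seen in the examples above. The sample rows are made up for illustration.

```python
import csv
import io

PRIMER_MAX_LEN = 25  # assumption: longest alignment still explainable by the primer alone


def is_primer_length(value, max_len):
    """True if an alignment length is present and no longer than max_len."""
    if value is None or value.strip() == "":
        return False  # no alignment reported for this gene segment
    return int(value) <= max_len


def count_primer_only(tsv_file, v_col="v_align_len", j_col="j_align_len",
                      max_len=PRIMER_MAX_LEN):
    """Count rows whose V and/or J alignment is no longer than a primer."""
    counts = {"total": 0, "v_primer_only": 0, "j_primer_only": 0, "both_primer_only": 0}
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        v_short = is_primer_length(row.get(v_col), max_len)
        j_short = is_primer_length(row.get(j_col), max_len)
        counts["total"] += 1
        counts["v_primer_only"] += v_short
        counts["j_primer_only"] += j_short
        counts["both_primer_only"] += v_short and j_short
    return counts


# Made-up rows, for illustration only (not real data).
sample = io.StringIO(
    "read_id\tv_align_len\tj_align_len\n"
    "r1\t22\t20\n"
    "r2\t95\t48\n"
    "r3\t24\t\n"
    "r4\t110\t19\n"
)
primer_counts = count_primer_only(sample)
print(primer_counts)
```

For real data, `sample` would be replaced by an open file handle on the exported table, and the column names adjusted to whatever the export actually contains.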
Link to brief summary, and links to papers, of a few alternative TCR analysis programs
See alignment length report for analysis of V, J, and total alignment lengths in equivolume 151124 batch.
Based on the report, we should not implement a size selection during library preparation. The report also suggests that MiXCR is doing its job, in the sense that it is successfully assembling all of the alignments that are true CDR3 sequences.
Summary
MiXCR performs an alignment as well as an assembly step during its process of identifying clonotypes. During assembly, a clustering method is utilized to attempt to overcome PCR and sequencing errors and build accurate counts of clonotypes. Is this method appropriate and how does it relate to depth of coverage for unique sequences? Refer to markdown for more detailed explanation of alignment and assembly steps.
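To make the question concrete, here is a minimal sketch of one generic error-correcting clustering scheme. It is not MiXCR's actual algorithm, only an illustration of the idea: low-count sequences that differ from a much more abundant sequence by at most one mismatch are treated as PCR or sequencing errors and merged into it. The function names, the `max_mismatches` and `max_ratio` parameters, and the sample counts are all assumptions made for this example.

```python
def hamming(a, b):
    """Number of mismatches between equal-length sequences, or None if lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def cluster_clonotypes(counts, max_mismatches=1, max_ratio=0.1):
    """Greedily merge rare sequences into abundant near-identical ones.

    counts: dict of sequence -> read count.
    A sequence is merged into an existing parent if it is within
    max_mismatches of the parent and its count is at most max_ratio
    times the parent's original count. Otherwise it becomes a new parent.
    """
    parents = {}
    # Visit sequences from most to least abundant (ties broken alphabetically).
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for parent in parents:
            d = hamming(seq, parent)
            if d is not None and d <= max_mismatches and n <= max_ratio * counts[parent]:
                parents[parent] += n  # absorb the probable error into the parent
                break
        else:
            parents[seq] = n  # no suitable parent, keep as its own clonotype
    return parents


# Made-up sequences and counts, for illustration only.
raw_counts = {
    "TGTGCCAGC": 100,
    "TGTGCCAGT": 3,   # one mismatch from the 100-count sequence, rare
    "TGTGCAAGC": 40,  # one mismatch as well, but too abundant to be an error
    "TGTACCAGC": 2,
}
clusters = cluster_clonotypes(raw_counts)
print(clusters)
```

In this toy input the two rare variants are absorbed and the 40-count variant survives as its own clonotype. This shows why the choice of thresholds matters for depth of coverage: the number of unique sequences reported after clustering depends on how aggressively rare variants are merged.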
Significance
We need an accurate proxy for depth of coverage in order to determine how T-cell concentration influences our results (#10)
To Do
Approach