Status: Open. Opened by weshorton 8 years ago.
Summary of V-regions in 151124_extra_segs.csv, found here.
Counts are aggregated from the "Best V Hit" column of the .csv file.
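A minimal sketch of how such counts could be tallied, assuming the file has a header row with a "Best V Hit" column and that each row counts once (rows, not reads). The function name `count_best_v_hits` and the placeholder gene names in the demo are illustrative only, not the script actually used.

```python
import csv
import io
from collections import Counter


def count_best_v_hits(handle, column="Best V Hit", delimiter=","):
    """Tally V genes from the "Best V Hit" column of a delimited table.

    Each row counts once (rows, not reads). Allele suffixes such as "*00"
    are stripped so that alleles of the same gene are pooled.
    """
    counts = Counter()
    for row in csv.DictReader(handle, delimiter=delimiter):
        hit = (row.get(column) or "").strip()
        if not hit:
            continue  # skip rows with no V assignment
        gene = hit.split("*", 1)[0]  # e.g. "TRBV22*00" -> "TRBV22"
        counts[gene] += 1
    return counts


if __name__ == "__main__":
    # Placeholder rows; in practice open 151124_extra_segs.csv instead.
    demo = io.StringIO(
        "Best V Hit,Count\n"
        "TRBV22*00,5\n"
        "TRBV22*00,3\n"
        "TRBV1*00,1\n"
        ",2\n"
    )
    for gene, n in count_best_v_hits(demo).most_common():
        print(gene, n)
```

For the real file, pass `open("151124_extra_segs.csv", newline="")` as the handle; if the export turns out to be tab-separated despite the .csv extension, pass `delimiter="\t"`.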
Adding some documents from Dhaarini: IMGT_nomenclature.pdf, MalekFahamPatent.pdf, TCRB_pseudogenes.docx.
Extracted from the Faham patent:
Based on a suggestion from DM, we took an in-depth look at the V22 hits. The following results are from the equivolume DNA151124LC batch. I used this script to extract the read IDs of all sequences that assembled to V22, and then their FASTQ read information (i.e. ID, sequence, quality score).
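The linked script is not reproduced here; the two steps it performs (collect read IDs assigned to one V gene, then pull those reads out of the FASTQ) can be sketched as below. This assumes a tab-separated per-read table with hypothetical "Read ID" and "Best V Hit" columns and standard 4-line FASTQ records; the function names are illustrative, not from the original script.

```python
import csv
import io
from itertools import islice


def read_ids_for_gene(handle, gene, id_column="Read ID",
                      v_column="Best V Hit", delimiter="\t"):
    """Return the set of read IDs whose best V gene equals `gene`.

    `id_column` and `v_column` are hypothetical header names. The allele
    suffix ("*00") is stripped before an exact match, so "TRBV2" does not
    accidentally match "TRBV22".
    """
    ids = set()
    for row in csv.DictReader(handle, delimiter=delimiter):
        hit_gene = (row.get(v_column) or "").split("*", 1)[0].strip()
        if hit_gene == gene:
            ids.add(row[id_column].strip())
    return ids


def filter_fastq(handle, wanted_ids):
    """Yield (read_id, sequence, quality) for FASTQ records in `wanted_ids`.

    Assumes 4-line records; the ID is the first whitespace-delimited token
    of the header line, without the leading '@'.
    """
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            break  # end of file (or a truncated final record)
        header, seq, _plus, qual = (line.rstrip("\r\n") for line in record)
        read_id = header[1:].split()[0]
        if read_id in wanted_ids:
            yield read_id, seq, qual


if __name__ == "__main__":
    # Placeholder inputs standing in for the real alignment export and FASTQ.
    table = io.StringIO("Read ID\tBest V Hit\nr1\tTRBV22*00\nr2\tTRBV1*00\n")
    fastq = io.StringIO("@r1 extra\nACGT\n+\nIIII\n@r2\nTTTT\n+\nJJJJ\n")
    ids = read_ids_for_gene(table, "TRBV22")
    print(list(filter_fastq(fastq, ids)))
```

Streaming the FASTQ four lines at a time keeps memory flat, which matters if the full run is large and only a small subset of reads is wanted.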
The FASTQ reads were then re-aligned using MiXCR align, and the resulting alignments were exported in the pretty format.
These observations suggest to me that this is a sequence-similarity problem and that MiXCR is incorrectly identifying the reads. See the bottom of this issue for more information.
Sequences were taken from GenBank. For each gene I selected the V region and compared the main body of the sequence (i.e. excluding the first fragment). I ran pairwise sequence-similarity alignments using BLAST; the results are here. The V22-V24 and V22-V26 alignments do not look very different from the V22-V1 alignment.
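BLAST was the tool used above. As a quick local cross-check of pairwise similarity, a simpler swapped-in technique is a Needleman-Wunsch global alignment reporting percent identity. This is a minimal sketch with illustrative scoring values (match +1, mismatch -1, linear gap -2), not BLAST's scoring, so its numbers will not match the BLAST output; the demo sequences are placeholders, not the GenBank V regions.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (alignment score, percent identity), where percent identity is
    identical aligned columns / alignment length * 100.
    """
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting identical columns and length.
    i, j, identical, length = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        length += 1
    pct = 100.0 * identical / length if length else 0.0
    return score[n][m], pct


if __name__ == "__main__":
    print(global_identity("ACGTACGT", "ACGAACGT"))
```

Running this over V22-V1, V22-V24 and V22-V26 would give one identity number per pair on a common scale, which may make the "not very different" comparison easier to state.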
To Do
Summary
There are a total of 31 V regions in the TCR beta locus: 11 are pseudogenes and 20 are transcribed genes (still need to confirm this). We only use primers for the 20 transcribed genes during amplification. In our MiXCR clonotype output tables, there is a small subset of clonotypes that map to these pseudogenes.
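To put a number on "a small subset", one could count the clonotypes whose best V gene is a pseudogene and the share of reads they carry. A minimal sketch, assuming clonotype rows are dicts with "Best V Hit" and "Clone count" columns (column names are assumptions) and that the pseudogene set is supplied by hand, since the 11/20 split is still unconfirmed. The gene name in the demo is a placeholder, not a claim about which TRBV genes are pseudogenes.

```python
def pseudogene_summary(rows, pseudogenes, v_column="Best V Hit",
                       count_column="Clone count"):
    """Count clonotypes assigned to pseudogenes and their share of reads.

    `pseudogenes` is a hypothetical, user-supplied set of gene names.
    Allele suffixes such as "*00" are ignored.
    Returns (number of pseudogene clonotypes, fraction of total reads).
    """
    n_clones = 0
    pseudo_reads = 0.0
    total_reads = 0.0
    for row in rows:
        gene = (row.get(v_column) or "").split("*", 1)[0].strip()
        reads = float(row.get(count_column) or 0)
        total_reads += reads  # clonotypes without a V call still count toward the total
        if gene in pseudogenes:
            n_clones += 1
            pseudo_reads += reads
    fraction = pseudo_reads / total_reads if total_reads else 0.0
    return n_clones, fraction


if __name__ == "__main__":
    demo = [
        {"Best V Hit": "TRBV22*00", "Clone count": "4"},
        {"Best V Hit": "TRBV1*00", "Clone count": "96"},
    ]
    print(pseudogene_summary(demo, {"TRBV22"}))
```

Reporting the read fraction alongside the clonotype count helps separate "many tiny clonotypes" from "a few abundant ones", which could point toward different explanations.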
Significance
We need to find the source of these pseudogene calls in order to decide whether we can confidently ignore them and, if not, what to do about them. There are two main theories for how these pseudogenes end up in our output files:
To Do
Approach