Closed jakob-arnold closed 7 months ago
Hey Jaakobb,
Thanks for reaching out - would you mind providing a little more information? What step are you at?
combineTCR() takes the different TCR sequences and associates them with a single barcode/cell. The output of combineTCR() will put together the TRG and TRD chains into a clone (separated by an "_") and should not have these columns: "v_gene", "d_gene", "j_gene" and "chain".
Thanks, Nick
Hi Nick,
For context: I have a single clones.tsv file, which I obtained from the MiXCR pipeline. I ran the following two commands:
contig <- loadContigs(input = "./", format = "MiXCR")
combined <- combineTCR(contig, filterMulti = T, removeNA = T)
And then colnames(combined$S1) gives me:
 [1] "barcode"  "chain"    "reads"    "v_gene"   "d_gene"   "j_gene"   "c_gene"   "cdr3_nt"  "cdr3"
[10] "TCR1"     "cdr3_aa1" "cdr3_nt1" "TCR2"     "cdr3_aa2" "cdr3_nt2" "CTgene"   "CTnt"     "CTaa"
[19] "CTstrict"
I think it makes sense to have "v_gene" etc. in separate columns after combining TCR sequences, as that may be needed for some downstream applications.
Hey jaakoobb,
You are completely correct - this is unintentional and should have been dropped.
Here is the code I used to make a reproducible example:
MIXCR <- read.csv("https://www.borch.dev/uploads/contigs/MIXCR_contigs.csv")
contigs <- loadContigs(MIXCR, format = "MiXCR")
combined <- combineTCR(contigs)
Downstream of combineTCR(), scRepertoire only uses "barcode", "CTgene", "CTnt", "CTaa" and "CTstrict", so the appearance of "chain", "reads", "v_gene", "d_gene", "j_gene", "c_gene", "cdr3_nt" and "cdr3" will not affect the analysis. I will work on pushing an update as soon as I can.
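Until an update lands, the leftover columns could be dropped by hand. This is only a minimal base R sketch, assuming the combineTCR() output is a named list of data frames as in the colnames(combined$S1) call above; combined_toy and its values are made up for illustration and are not real scRepertoire output.

```r
# Toy stand-in for the combineTCR() output: a named list with one data frame
# per sample. All values are invented for illustration.
combined_toy <- list(
  S1 = data.frame(
    barcode  = c("S1_AAAC", "S1_AAAG"),
    chain    = c("TRD", "TRD"),
    v_gene   = c("TRDV1", "TRDV2"),
    CTgene   = c("TRGV9.TRGJP.TRGC1_TRDV1.TRDD3.TRDJ1.TRDC",
                 "TRGV4.TRGJ1.TRGC1_TRDV2.TRDD3.TRDJ1.TRDC"),
    CTnt     = c("AAA_CCC", "GGG_TTT"),
    CTaa     = c("K_P", "G_F"),
    CTstrict = c("clone_a", "clone_b"),
    stringsAsFactors = FALSE
  )
)

# Columns named in this thread as leftovers from the original MiXCR file.
leftover <- c("chain", "reads", "v_gene", "d_gene", "j_gene",
              "c_gene", "cdr3_nt", "cdr3")

# Keep every column that is not a leftover, preserving the original order.
cleaned <- lapply(combined_toy, function(df) {
  df[, setdiff(colnames(df), leftover), drop = FALSE]
})

colnames(cleaned$S1)
```

Using setdiff() rather than a fixed keep-list means columns such as "TCR1" or "cdr3_aa1" stay in place when they are present.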
Nick
Hi Nick,
Thank you so much for the quick response.
Even though this information isn't needed in the "normal" scRepertoire workflow, I feel like some users may still need it for custom applications. Just as an example: if I'm combining the TCR data with the corresponding SeuratObject, I may want to highlight in a DimPlot how certain chains are distributed across clusters. In my specific case, the distribution of delta and gamma chains (TRDV1, TRGV1, ...) is of considerable biological significance. But I'm just suggesting things here, maybe there is a better way to achieve that :)
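One possible way to get at this without the extra columns is to pull the V gene names out of the combined clone string. This is a rough base R sketch, not scRepertoire functionality: meta, its cluster column and all values are hypothetical, and it assumes that gene names inside "CTgene" are delimited by ".", "_" or ";".

```r
# Hypothetical per-cell metadata: a cluster label plus the combined clone
# string. All values are invented for illustration.
meta <- data.frame(
  cluster = c("0", "0", "1", "1"),
  CTgene  = c("TRGV9.TRGJP.TRGC1_TRDV1.TRDD3.TRDJ1.TRDC",
              "TRGV9.TRGJP.TRGC1_TRDV1.TRDD2.TRDJ1.TRDC",
              "TRGV4.TRGJ1.TRGC1_TRDV2.TRDD3.TRDJ1.TRDC",
              NA),
  stringsAsFactors = FALSE
)

# Return the first match of `pattern` in each string, or NA if there is none.
extract_gene <- function(x, pattern) {
  out <- rep(NA_character_, length(x))
  hit <- regexpr(pattern, x)
  found <- !is.na(hit) & hit > 0
  out[found] <- regmatches(x, hit)
  out
}

# Match on the gene prefix, so the result does not depend on chain order.
meta$trgv <- extract_gene(meta$CTgene, "TRGV[^._;]+")
meta$trdv <- extract_gene(meta$CTgene, "TRDV[^._;]+")

# Cells per cluster for each delta V gene (NA values are dropped by table()).
tab <- table(meta$cluster, meta$trdv)
tab
```

The resulting trgv and trdv columns could then be added to the object's metadata and used as a grouping variable for plotting.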
Thanks Jakob
Hey Jakob,
Apologies for the confusion - the information is stored in "TCR1", "cdr3_aa1", "cdr3_nt1", "TCR2", "cdr3_aa2" and "cdr3_nt2", whereas "chain", "reads", "v_gene", "d_gene", "j_gene", "c_gene", "cdr3_nt" and "cdr3" are an accidental remnant from the original file.
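As an illustration of how per-chain V genes could be recovered from those columns, here is a small base R sketch. It assumes that "TCR1" and "TCR2" hold "."-separated gene names with the V gene first; that layout, the toy values and the v_gene_1/v_gene_2 column names are assumptions for the example, so check them against your own output.

```r
# Toy data frame with the per-chain columns; values are invented.
toy <- data.frame(
  barcode = c("S1_AAAC", "S1_AAAG", "S1_AAAT"),
  TCR1    = c("TRGV9.TRGJP.TRGC1", "TRGV4.TRGJ1.TRGC1", NA),
  TCR2    = c("TRDV1.TRDD3.TRDJ1.TRDC", "TRDV2.TRDD3.TRDJ1.TRDC",
              "TRDV1.TRDD2.TRDJ1.TRDC"),
  stringsAsFactors = FALSE
)

# Keep everything before the first "."; NA entries stay NA.
first_gene <- function(x) {
  sub("\\..*$", "", x)
}

toy$v_gene_1 <- first_gene(toy$TCR1)
toy$v_gene_2 <- first_gene(toy$TCR2)

toy[, c("barcode", "v_gene_1", "v_gene_2")]
```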
Nick
Got a tentative fix pushed to the dev branch - will still need to test it before it goes live. Thanks again for finding this issue!
Hi,
I'm using scRepertoire v2.0.0 and the combineTCR() function. With this package version, the output has the gene columns "v_gene", "d_gene" and "j_gene" and the "chain" column. However, I think there should be 2 columns for each of those, right? "v_gene_1", "v_gene_2", etc. In my specific case, I have sorted gd T cells and after running combineTCR() all "v_gene" columns have "TRDV..." values. For downstream analyses it may also be interesting to have the "TRGV..." information for each cell in a separate column.