Closed cstrlln closed 1 year ago
Hey Carlos,
Thanks for reaching out. You're right that the single-cell RNA/BCR space is a little sparse, but you should also check out Benisse.
In terms of questions:
getBCR() is an internal function. Overall, the function relies on two columns: 1) CTaa, which is the CDR3 amino acid sequence of the BCR, formatted as "Heavy_Light", and 2) CTgene, which is the gene segments used for both chains, formatted like "HV.HD.HJ.HC_LV.LJ.LC". This should be enough to get the pipeline working.
Great suggestion - I will add that to the list of changes to implement for the new release!
Hope that answers your questions and please let me know if you have any other questions/suggestions as you start using the package.
Nick
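For anyone reading along, here is a minimal base-R sketch of building the two columns described in the reply above from heavy-chain-only metadata. The input column names (cdr3_aa, v_gene, d_gene, j_gene, c_gene) and the toy values are hypothetical, and writing the missing light chain as "NA" is an assumption about the placeholder, not something confirmed in this thread.

```r
# Toy per-cell metadata; column names and values are hypothetical.
meta <- data.frame(
  barcode = c("AAAC-1", "AAAG-1"),
  cdr3_aa = c("CARDYW", "CAKGGFDYW"),
  v_gene  = c("IGHV1-2", "IGHV3-23"),
  d_gene  = c("IGHD3-10", NA),
  j_gene  = c("IGHJ4", "IGHJ6"),
  c_gene  = c("IGHM", "IGHG1"),
  stringsAsFactors = FALSE
)

# CTaa is formatted "Heavy_Light". With heavy chain only, the light
# side is written as "NA" here (assumption, see lead-in).
meta$CTaa <- paste0(meta$cdr3_aa, "_NA")

# CTgene is formatted "HV.HD.HJ.HC_LV.LJ.LC". paste() converts a
# missing segment (the D gene of the second cell) to the string "NA".
heavy_genes <- paste(meta$v_gene, meta$d_gene, meta$j_gene, meta$c_gene,
                     sep = ".")
meta$CTgene <- paste0(heavy_genes, "_NA")

print(meta[, c("barcode", "CTaa", "CTgene")])
```

With an SCE object, the same two vectors would presumably be stored as colData columns named CTaa and CTgene. Whether that alone is enough for runIbex is not confirmed here.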
Thanks Nick, I'll try getting my data to work with Ibex.
Another related question: what is the role of the V genes here, how are they used? Are they just kept for referencing? I gather the calculations are based on the aa properties.
Carlos
Hello, this is a great idea. I've been looking to try something like this, and it looked like the T cell people were more advanced.
Have a couple of questions:
I already have an SCE object with integrated VDJ data: I have the aa sequences, the V gene identity and, of course, the barcode, and I have subsetted so that they are all heavy chain only. Is there a way to use this without starting again with scRepertoire? From getBCR, it looks like I could rename the columns in colData for the aa sequences and V genes to match the ones used by scRepertoire. Would this work? What else would I need to change or check to go directly to runIbex with my SCE?
Which dataset was used for the training? Sorry, I might have missed it, and I'm not very familiar with machine learning yet.
Finally, a suggestion: pulling V genes from GEX data with grep alone is not ideal, as there are a lot of pseudogenes. I would suggest using chromosomal location or the biotype assigned by cellranger, something like: ig_list <- c("IG_C_gene", "IG_C_pseudogene", "IG_D_gene", "IG_D_pseudogene", "IG_J_gene", "IG_LV_gene", "IG_pseudogene", "IG_V_gene", "IG_V_pseudogene")
You can then query biomart for genes that have one of those biotypes and are also in your dataset.
Carlos
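The biotype suggestion above could look roughly like the base-R sketch below. The annotation table and the dataset gene vector are toy stand-ins (hypothetical names annot and dataset_genes). In practice the gene-to-biotype table would come from a biomart query or the cellranger reference annotation, which is not shown here.

```r
# Biotypes from the suggestion above.
ig_list <- c("IG_C_gene", "IG_C_pseudogene", "IG_D_gene", "IG_D_pseudogene",
             "IG_J_gene", "IG_LV_gene", "IG_pseudogene", "IG_V_gene",
             "IG_V_pseudogene")

# Toy annotation table (hypothetical stand-in for a biomart or GTF lookup).
annot <- data.frame(
  gene_name    = c("IGHV1-2", "IGHV3-23", "IGHVII-1-1", "IGKC", "CD19", "IGHM"),
  gene_biotype = c("IG_V_gene", "IG_V_gene", "IG_V_pseudogene", "IG_C_gene",
                   "protein_coding", "IG_C_gene"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the genes present in the expression matrix.
dataset_genes <- c("IGHV1-2", "IGHVII-1-1", "IGKC", "CD19", "ACTB")

# Genes with an immunoglobulin biotype that are also in the dataset.
is_ig      <- annot$gene_biotype %in% ig_list
in_data    <- annot$gene_name %in% dataset_genes
ig_in_data <- annot$gene_name[is_ig & in_data]

# Same selection with pseudogene biotypes dropped.
is_pseudo     <- grepl("pseudogene", annot$gene_biotype)
ig_functional <- annot$gene_name[is_ig & in_data & !is_pseudo]

print(ig_in_data)
print(ig_functional)
```

Unlike a grep on gene names, the biotype column separates IGHVII-1-1 (annotated as a pseudogene in this toy table) from the functional V genes.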