ncborcherding / Ibex

Using BCR and expression for sequence embedding
https://www.borch.dev/uploads/screpertoire/articles/ibex
MIT License

using Ibex when dataset was not processed with scRepertoire #2

Closed: cstrlln closed 1 year ago

cstrlln commented 1 year ago

Hello, this is a great idea. I have been looking to try something like this, and it looked like the T cell people were further ahead.

Have a couple of questions:

Finally, a suggestion: pulling V genes from the GEX data with grep alone is not ideal, as there are a lot of pseudogenes. I would suggest using chromosomal location or the biotype assigned by cellranger, something like: ig_list <- c("IG_C_gene", "IG_C_pseudogene", "IG_D_gene", "IG_D_pseudogene", "IG_J_gene", "IG_LV_gene", "IG_pseudogene", "IG_V_gene", "IG_V_pseudogene")

You can then query BioMart for genes that have one of those biotypes and are also in your dataset.
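A minimal sketch of that biotype-based selection, assuming the annotation has already been pulled into a data frame (from BioMart or a GTF) with `gene_name` and `gene_biotype` columns. The helper `select_by_biotype()`, the column names, and the toy gene names are all hypothetical and only illustrate the idea:

```r
# Biotype labels quoted in the suggestion above.
ig_list <- c("IG_C_gene", "IG_C_pseudogene", "IG_D_gene", "IG_D_pseudogene",
             "IG_J_gene", "IG_LV_gene", "IG_pseudogene", "IG_V_gene",
             "IG_V_pseudogene")

# Hypothetical helper: keep the dataset genes whose annotated biotype
# is one of `biotypes`. `annotation` is assumed to have the columns
# `gene_name` and `gene_biotype`.
select_by_biotype <- function(dataset_genes, annotation, biotypes) {
  hits <- annotation$gene_name[annotation$gene_biotype %in% biotypes]
  intersect(dataset_genes, hits)
}

# Toy annotation standing in for a real BioMart or GTF lookup.
# Gene names and biotype assignments here are made up.
annotation <- data.frame(
  gene_name    = c("GENE_A", "IGV_TOY1", "IGV_TOY2", "IGC_TOY1"),
  gene_biotype = c("protein_coding", "IG_V_gene", "IG_V_pseudogene", "IG_C_gene"),
  stringsAsFactors = FALSE
)
dataset_genes <- c("GENE_A", "IGV_TOY1", "IGV_TOY2", "IGC_TOY1", "GENE_B")

# Every immunoglobulin-related gene present in the dataset.
all_ig <- select_by_biotype(dataset_genes, annotation, ig_list)

# Only V genes that are not annotated as pseudogenes.
functional_v <- select_by_biotype(dataset_genes, annotation, "IG_V_gene")
```

Unlike a name-based grep, restricting `biotypes` to `"IG_V_gene"` leaves out anything annotated as a pseudogene, regardless of how the gene is named.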

Carlos

ncborcherding commented 1 year ago

Hey Carlos,

Thanks for reaching out. You're right that the single-cell RNA/BCR space is a little sparse, but you should check out Benisse too.

In terms of questions:

  1. Yes, you should be able to modify the metadata by matching the column names to those used by the internal getBCR() function. Overall, the function relies on two columns: 1) CTaa, which is the CDR3 amino acid sequence of the BCR, formatted as "Heavy_Light", and 2) CTgene, which is the gene segments used for both chains, formatted like "HV.HD.HJ.HC_LV.LJ.LC". This should be enough to get the pipeline working.
  2. Great questions. This is being clarified in the resubmission of the paper, which includes a comprehensive list of cohorts. The models were trained on all public single-cell BCR sequences deposited in the Gene Expression Omnibus that were available before November 2022. I am also updating the models with additional sources for future versions.
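As a rough illustration of the two metadata columns described in point 1, the sketch below assembles CTaa and CTgene from a per-cell table holding one heavy and one light chain each. Only the CTaa and CTgene names and their formats come from the reply above; the input column names (`h_cdr3`, `h_v`, and so on) and the toy sequences are assumptions, and how cells with a missing chain should be encoded is not covered here:

```r
# Hypothetical per-cell table: one heavy and one light chain per barcode.
# Column names and CDR3 sequences are made up for illustration.
cells <- data.frame(
  barcode = c("cell1", "cell2"),
  h_cdr3  = c("CARDYW", "CAKGGW"),
  h_v     = c("IGHV3-23", "IGHV1-2"),
  h_d     = c("IGHD3-10", "IGHD2-2"),
  h_j     = c("IGHJ4", "IGHJ6"),
  h_c     = c("IGHM", "IGHG1"),
  l_cdr3  = c("CQQYNSF", "CQSYDSW"),
  l_v     = c("IGKV1-5", "IGLV1-40"),
  l_j     = c("IGKJ1", "IGLJ2"),
  l_c     = c("IGKC", "IGLC2"),
  stringsAsFactors = FALSE
)

meta <- data.frame(
  # CDR3 amino acid sequences as "Heavy_Light".
  CTaa = paste(cells$h_cdr3, cells$l_cdr3, sep = "_"),
  # Gene segments as "HV.HD.HJ.HC_LV.LJ.LC".
  CTgene = paste(
    paste(cells$h_v, cells$h_d, cells$h_j, cells$h_c, sep = "."),
    paste(cells$l_v, cells$l_j, cells$l_c, sep = "."),
    sep = "_"
  ),
  row.names = cells$barcode,
  stringsAsFactors = FALSE
)
```

The resulting `meta` data frame is keyed by barcode, so it could then be attached to the single-cell object's metadata before running the pipeline.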

Great suggestion - I will add that to the list of changes to implement for the new release!!!

Hope that answers your questions and please let me know if you have any other questions/suggestions as you start using the package.

Nick

cstrlln commented 1 year ago

Thanks Nick, I'll try getting my data to work with Ibex.

Another related question: what is the role of the V genes here, and how are they used? Are they just kept for reference? I gather the calculations are based on the amino acid properties.

Carlos