immunogenomics / symphony

Efficient and precise single-cell reference atlas mapping with Symphony
GNU General Public License v3.0

How to build the Symphony reference if the reference has no clear batch effect? #9

Closed: Echo226 closed this issue 3 years ago

Echo226 commented 3 years ago

Hi Symphony team,

Thanks for developing this great tool. I have a question about building the Symphony reference. Suppose I have a reference dataset with no clear batch effect (the cells do not cluster by donor or technology). How can I use Symphony to build a reference from it? The workflow includes a RunHarmony step before the Symphony compression, but I am not sure whether running Harmony on a dataset without a batch effect is good practice.

For example, I have a study with 3 batches. I ran unsupervised clustering on batch1 and annotated the cells using marker genes, and now I want to transfer those labels to the remaining 2 batches. How would you suggest using Symphony in this case?

Thanks in advance for your reply.
Xinting

joycekang commented 3 years ago

Hi Xinting,

Good question. There are 2 quick fixes:

(1) You can run buildReference with vars=NULL to skip batch correction (the Harmony step). In that case, Symphony defines the soft clusters for the reference mixture model with soft k-means (using cosine distance). This option is already implemented.

(2) If you don't want to run buildReference from scratch for some reason (e.g. because you already have a PCA embedding for batch1, called Z_pca_ref), you can build a Symphony reference piece by piece with the code below. As long as you name the reference components correctly, Symphony mapping should work.

    # Assumes metadata_ref, s (the PCA/SVD result), vargenes_means_sds,
    # Z_pca_ref (PCs x cells) and K (number of soft clusters) already exist
    reference <- list(meta_data = metadata_ref)  # initialize reference as a list with a metadata slot
    reference$loadings <- s$u                    # gene loadings from the PCA
    reference$vargenes <- vargenes_means_sds     # variable gene info
    clust_res <- symphony:::soft_kmeans(Z_pca_ref, K)
    reference$centroids <- clust_res$Y
    reference$R <- clust_res$R
    reference$Z_orig <- Z_pca_ref
    reference$Z_corr <- Z_pca_ref                # no batch correction, so corrected = original
    reference$cache <- symphony::compute_ref_cache(reference$R, reference$Z_corr)
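To make the two less obvious steps in that snippet more concrete, here is a minimal base-R sketch of what soft k-means with cosine distance and a reference cache might compute. It is not `symphony:::soft_kmeans` or `symphony::compute_ref_cache`; the helper names `soft_kmeans_cosine` and `ref_cache_sketch`, the `sigma` softness parameter, and the assumption that the cache holds per-cluster soft cell counts plus membership-weighted sums of the embedding are all illustrative assumptions. Embeddings are taken to be PCs x cells.

```r
# Hypothetical base-R sketches, NOT Symphony's own code.
# Assumed convention: embeddings are d x N (PCs x cells).

col_l2 <- function(M) sweep(M, 2, sqrt(colSums(M^2)), "/")  # unit-length columns

soft_kmeans_cosine <- function(Z, K, sigma = 0.1, max_iter = 20, seed = 1) {
  Z_cos <- col_l2(Z)  # after this, dot products are cosine similarities
  set.seed(seed)
  # Initialize centroids with hard k-means on the normalized cells (d x K)
  Y <- t(stats::kmeans(t(Z_cos), centers = K, nstart = 10)$centers)
  for (iter in seq_len(max_iter)) {
    Y <- col_l2(Y)
    dist_mat <- 2 * (1 - crossprod(Y, Z_cos))  # K x N cosine distances
    # Soft memberships exp(-dist / sigma), shifted per cell for numerical stability
    R <- exp(-sweep(dist_mat, 2, apply(dist_mat, 2, min), "-") / sigma)
    R <- sweep(R, 2, colSums(R), "/")           # each cell's memberships sum to 1
    Y <- Z_cos %*% t(R)                         # membership-weighted centroid update
  }
  list(Y = col_l2(Y), R = R)
}

ref_cache_sketch <- function(R, Z) {
  stopifnot(ncol(R) == ncol(Z))  # R and Z must describe the same reference cells
  list(N = rowSums(R),           # soft cell count per cluster (length K)
       C = R %*% t(Z))           # weighted sum of embeddings per cluster (K x d)
}

# Toy usage: two well-separated groups of 50 cells in 20 PCs
set.seed(42)
Z_toy <- cbind(matrix(rnorm(20 * 50, mean = 3), 20),
               matrix(rnorm(20 * 50, mean = -3), 20))
clust <- soft_kmeans_cosine(Z_toy, K = 2)
cache <- ref_cache_sketch(clust$R, Z_toy)
hard <- apply(clust$R, 2, which.max)  # most likely cluster per cell
# Each group of 50 cells should fall into a single cluster
print(table(group = rep(c("A", "B"), each = 50), cluster = hard))
```

The `stopifnot` check reflects the one hard requirement in this sketch: the memberships and the embedding must describe the same reference cells in the same order.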

That should get you through until we implement a better solution.
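For the label-transfer part of the question (batch1 labels onto batches 2 and 3), Symphony's tutorials map the query with `mapQuery` and then call `knnPredict`. As a rough illustration of the voting idea only, the base-R sketch below assigns each query cell the most common label among its `k` nearest reference cells. `knn_transfer_sketch` and its arguments are hypothetical, and it assumes the query cells have already been projected into the reference embedding (PCs x cells).

```r
# Hypothetical helper, NOT symphony::knnPredict: majority-vote kNN label transfer.
# ref_Z: d x N_ref reference embedding; query_Z: d x N_query mapped query embedding
# ref_labels: length-N_ref labels; k: number of neighbours to vote
knn_transfer_sketch <- function(ref_Z, query_Z, ref_labels, k = 5) {
  ref_labels <- as.character(ref_labels)
  k <- min(k, ncol(ref_Z))  # cannot vote with more neighbours than reference cells
  apply(query_Z, 2, function(q) {
    d <- sqrt(colSums((ref_Z - q)^2))  # Euclidean distance to every reference cell
    nn <- order(d)[seq_len(k)]         # indices of the k nearest reference cells
    votes <- table(ref_labels[nn])
    names(votes)[which.max(votes)]     # most frequent label among the neighbours
  })
}

# Toy usage: three "T" cells near (0, 0) and three "B" cells near (10, 10)
ref_Z <- cbind(matrix(0, 2, 3), matrix(10, 2, 3))
ref_labels <- rep(c("T", "B"), each = 3)
query_Z <- cbind(c(1, 1), c(9, 9))
pred <- knn_transfer_sketch(ref_Z, query_Z, ref_labels, k = 3)  # c("T", "B")
```

Voting in the shared embedding is why the reference only needs labels for batch1: once batches 2 and 3 are mapped into the same space, their nearest neighbours carry the annotation over.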

Echo226 commented 3 years ago

Hi Joy,

Thanks for your reply. This answers my question!

Best regards, Xinting