immunogenomics / harmony

Fast, sensitive and accurate integration of single-cell data with Harmony
https://portals.broadinstitute.org/harmony/

Parameters for integrating different species #220

Closed DiracZhu1998 closed 9 months ago

DiracZhu1998 commented 10 months ago

Dear authors,

Thank you for bringing us such a wonderful toolkit! I am currently trying to integrate data from multiple evolutionarily distant species (human, mouse, and lizard), and I plan to add more species. However, the integration did not work well; please see the attachment below. I have tried different features, dimensions, and nclust values. Do you have any suggestions (other parameters, prior labelling, a linear or non-linear approach) for integrating tissue from different species (say, whole brain tissue)?

Best wishes, Yuanzhen

[Attachment: Screenshot 2023-11-18 at 11 05 47]

DiracZhu1998 commented 10 months ago

The group.by.vars is species.

pati-ni commented 9 months ago

Hi @DiracZhu1998

I think choosing the set of common ortholog genes as features will be the most important step. Everything else occurs in the latent space, so it should be no different from any other analysis. To enforce stronger integration between datasets, you could experiment with higher values of the theta parameter. Also, if you have individual batches within each species, I would include them as well (maybe with a different theta).
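
As a rough illustration of the ortholog step, here is a minimal base-R sketch that keeps only ortholog groups present and detected in every species. The ortholog table, the gene symbols, and the helper name `shared_orthologs` are all made up for illustration; a real table would come from an orthology resource of your choice.

```r
# Hypothetical one-to-one ortholog table: one row per ortholog group,
# one column per species (gene symbols are placeholders for illustration).
orthologs <- data.frame(
  human  = c("GAD1", "SLC17A7", "GFAP", "SNAP25"),
  mouse  = c("Gad1", "Slc17a7", "Gfap", "Snap25"),
  lizard = c("gad1", "slc17a7", NA, "snap25"),
  stringsAsFactors = FALSE
)

# Genes actually detected in each dataset (also placeholders).
detected <- list(
  human  = c("GAD1", "SLC17A7", "GFAP", "SNAP25", "MBP"),
  mouse  = c("Gad1", "Slc17a7", "Gfap", "Snap25"),
  lizard = c("gad1", "snap25", "mbp")
)

# Keep ortholog groups that have a gene in every species AND whose gene
# is detected in that species' dataset.
shared_orthologs <- function(orthologs, detected) {
  keep <- rep(TRUE, nrow(orthologs))
  for (sp in names(detected)) {
    keep <- keep & !is.na(orthologs[[sp]]) & orthologs[[sp]] %in% detected[[sp]]
  }
  orthologs[keep, , drop = FALSE]
}

shared <- shared_orthologs(orthologs, detected)
print(shared)
```

Each species' matrix would then be subset to its column of `shared` and renamed to a common symbol before merging. For the theta suggestion, `RunHarmony` accepts one theta per variable in `group.by.vars`, so a call along the lines of `group.by.vars = c("species", "batch"), theta = c(4, 2)` is one thing to try; the column names and values here are assumptions, not recommendations.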

If you have not tried it yet, I would recommend using the latest version of harmony because it is more robust and faster. It is available on the master branch on GitHub.

DiracZhu1998 commented 9 months ago

Dear @pati-ni ,

Thank you for your reply! I have tried setting theta to 1 or 1.5 as shown below, with nothing else changed from the previous version. However, some errors occurred.

theta.usage is set to 1 or 1.5:

    Combined <- Combined %>% RunHarmony(
      group.by.vars = var.usage,
      assay.use = n.usage,
      reduction.use = "pca",
      dims.use = 1:dim.usage,
      nclust = nclust.usage,
      plot_convergence = TRUE,
      max_iter = 10,
      theta = theta.usage,
      lambda = lambda.usage,
      .options = harmony_options(
        max.iter.cluster = 30,
        epsilon.harmony = -Inf,
        epsilon.cluster = -Inf
      )
    )

Error:

    17:03:13 UMAP embedding parameters a = 0.9922 b = 1.112
    Error in checkna(X) : Missing values found in 'X'
    Calls: RunUMAP ... RunUMAP -> RunUMAP.default -> umap -> uwot -> checkna

Have you encountered this situation before? I will definitely try your other suggestions!

Best wishes, Yuanzhen

pati-ni commented 9 months ago

What is the UMAP command you are using? Can you make sure that harmony returns values without NAs?

What assay are you using?
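
One way to run that check is sketched below in base R. The helper `check_embedding` and the toy matrix are hypothetical; with a Seurat object, the matrix would presumably come from something like `Embeddings(Combined, reduction = "harmony")`, assuming the reduction is stored under the name "harmony".

```r
# Report whether an embedding matrix (cells x dimensions) contains
# non-finite values, and which cells are affected.
check_embedding <- function(emb) {
  stopifnot(is.matrix(emb), is.numeric(emb))
  bad <- !is.finite(emb)  # TRUE for NA, NaN, Inf and -Inf
  list(
    ok        = !any(bad),
    n_bad     = sum(bad),
    bad_cells = rownames(emb)[rowSums(bad) > 0]
  )
}

# Toy embedding standing in for the Harmony output (values are made up).
emb <- matrix(
  c(0.1, NA, 0.3, 0.4, 0.5, 0.6),
  nrow = 3,
  dimnames = list(c("cell1", "cell2", "cell3"), c("harmony_1", "harmony_2"))
)

res <- check_embedding(emb)
print(res)
```

If `ok` is `FALSE`, the missing values were introduced before UMAP, so the Harmony inputs (the PCA embedding and the parameters passed to `RunHarmony`) are the place to look.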

DiracZhu1998 commented 9 months ago

Thank you for your suggestions! I have already solved this problem. It turned out that I had mistakenly copied args$theta as args$var; after correcting this, there are no errors.

    var.usage <- as.character(if (!is.null(args$var)) args$var else "orig.ident")
    theta.usage <- as.double(if (!is.null(args$theta)) args$theta else 2)
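
For anyone hitting a similar mix-up, a small guard around argument parsing can surface it before Harmony runs. This is only a sketch: the helper `get_arg` is hypothetical, and `args` is assumed to be a named list of parsed command-line options.

```r
# Read one optional argument, fall back to a default, and fail early if the
# converted value is unusable (for example a column name passed as theta).
get_arg <- function(args, name, default, convert) {
  raw <- if (!is.null(args[[name]])) args[[name]] else default
  value <- suppressWarnings(convert(raw))
  if (length(value) != 1 || is.na(value)) {
    stop(sprintf("Argument '%s' has an invalid value: %s", name, format(raw)))
  }
  value
}

# Hypothetical parsed arguments.
args <- list(var = "species", theta = "1.5")

var.usage   <- get_arg(args, "var", "orig.ident", as.character)
theta.usage <- get_arg(args, "theta", 2, as.double)
```

With this guard, passing a non-numeric string where theta is expected stops with an explicit message instead of silently producing `NA`.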

Thank you for your help and suggestions!