theislab / scib

Benchmarking analysis of data integration tools
MIT License

Question about evaluating graph outputs #286

Closed: zhen-he closed this 2 years ago

zhen-he commented 2 years ago

Thanks for the great work on scIB!

Currently I'm using scIB to compute metrics for the graph output produced by WNN [1] in Seurat v4. I implemented this via the following steps:

  1. Save the similarity matrix using writeMM(seurat_obj$wsnn, "wsnn.mtx") in R;
  2. Load wsnn.mtx in Python using adata_int.obsp["connectivities"] = scipy.io.mmread("wsnn.mtx").tocsr();
  3. Evaluate adata_int using scIB.
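
For readers following along, step 2 can be sketched roughly as below. This is a minimal illustration, not scIB's own code: the helper name load_graph_mtx and the 3-cell toy matrix are assumptions made up for the example, and the toy matrix is written to a temporary wsnn.mtx only so the snippet runs without a real Seurat export.

```python
import os
import tempfile

import numpy as np
import scipy.io
import scipy.sparse as sp


def load_graph_mtx(path):
    """Read a Matrix Market file and return it as a CSR matrix (hypothetical helper)."""
    return sp.csr_matrix(scipy.io.mmread(path))


# Toy 3-cell stand-in for the exported graph (assumption: ones on the
# diagonal, mirroring what is reported for wsnn.mtx in this thread).
toy = sp.csr_matrix(np.array([[1.0, 0.5, 0.0],
                              [0.5, 1.0, 0.2],
                              [0.0, 0.2, 1.0]]))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "wsnn.mtx")
    scipy.io.mmwrite(path, toy)    # stands in for writeMM(...) on the R side
    graph = load_graph_mtx(path)   # stands in for step 2

# With a real AnnData object the assignment would then be:
# adata_int.obsp["connectivities"] = graph
print(graph.shape, graph.format)
```

The row order of the matrix has to match the cell order in adata_int.obs_names; nothing in the .mtx file itself enforces that.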

Though the results look reasonable, I find that the obsp["connectivities"] produced by sc.pp.neighbors() or sc.external.pp.bbknn() is very different from wsnn.mtx. For instance, the diagonal values of obsp["connectivities"] are all 0 when it is generated by sc.pp.neighbors(), but all 1 when it is assigned from wsnn.mtx.
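
A quick way to see this kind of difference is to summarise each matrix before assigning it. The sketch below is an illustration only: describe_graph and drop_self_loops are hypothetical helpers, the toy matrix is made up, and zeroing the diagonal is just one possible normalization. Whether it is the right one for scIB is exactly the open question here.

```python
import numpy as np
import scipy.sparse as sp


def describe_graph(mat):
    """Summarise properties that can differ between graph conventions."""
    mat = sp.csr_matrix(mat)
    diag = mat.diagonal()
    return {
        "diag_min": float(diag.min()),
        "diag_max": float(diag.max()),
        "symmetric": (mat != mat.T).nnz == 0,
        "value_max": float(mat.max()),
    }


def drop_self_loops(mat):
    """Return a copy with the diagonal removed (assumed normalization, not confirmed)."""
    out = sp.csr_matrix(mat, copy=True)
    out.setdiag(0)
    out.eliminate_zeros()
    return out


# Toy stand-in for wsnn.mtx with ones on the diagonal.
toy_wsnn = sp.csr_matrix(np.array([[1.0, 0.5, 0.0],
                                   [0.5, 1.0, 0.2],
                                   [0.0, 0.2, 1.0]]))

print(describe_graph(toy_wsnn))
stripped = drop_self_loops(toy_wsnn)
print(describe_graph(stripped))
```

Running describe_graph on both the imported matrix and a sc.pp.neighbors() result makes it easier to report concrete differences (diagonal, symmetry, value range) rather than only noting that they look different.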

I don't know how to correctly use scIB to evaluate methods (like WNN) that output only similarity matrices rather than connectivity matrices. Could you give me some tips? Many thanks!

[1] Integrated analysis of multimodal single-cell data. Cell, 2021

LuckyMD commented 2 years ago

Hi @zhen-he,

We spent quite a lot of time trying to make the graph outputs comparable, which is definitely non-trivial. Do you know what matrix would be used in Seurat to generate a UMAP embedding? This is typically what the connectivities matrix is used for in a scanpy setting. If Seurat uses the similarity matrix you suggest for this directly, then I would interpret both matrices as the same. Maybe get in touch with the Seurat maintainers to get a further opinion on what matrix is regarded as the canonical output.

zhen-he commented 2 years ago

@LuckyMD Thanks for your suggestions!

LuckyMD commented 2 years ago

Just want to rope in @xlancelottx here, as I believe he was about to try the same thing with Seurat v4 WNN integration.