Closed lazappi closed 4 years ago
Maybe it has to do with scipy versions then?
And does your adata.obsp['connectivities'].data[:10] look like the faulty .mtx file or a correct version of it?
Not scanpy. lisi_graph_py works, but lisi_graph does not. Will post a notebook shortly.
Example notebook below (I just realised it does not display properly; I will send you another format on chat). I do not understand why lisi_graph fails but lisi_graph_py does not. To the best of my knowledge they should be the same. My lisi_graph code (same as scIB, but with some additional debug args): https://github.com/Hrovatin/scib/blob/6bf4498b336a7547b04333d9dd6e694eef159d40/scIB/metrics.py#L1481 troubleshoot_scib_lisi.pdf
Below are the last three generated lisi directories (from the troubleshooting script) and their input.mtx files. There seems to be an error in the file generated by lisi_graph, as it has fewer entries and a malformed index.
drwxr-xr-x. 2 karin.hrovatin OG-ICB-User 60 Oct 30 13:59 lisi_tmp1604062755
drwxr-xr-x. 2 karin.hrovatin OG-ICB-User 220 Oct 30 13:59 lisi_tmp1604062778
drwxr-xr-x. 2 karin.hrovatin OG-ICB-User 220 Oct 30 14:00 lisi_tmp1604062803
(rpy2_3) [karin.hrovatin@icb-mona scib]$ ls /tmp/lisi_tmp1604062755
input.mtx
(rpy2_3) [karin.hrovatin@icb-mona scib]$ ls /tmp/lisi_tmp1604062778/
_distances_0.txt _distances_1.txt _distances_2.txt _distances_3.txt _indices_0.txt _indices_1.txt _indices_2.txt _indices_3.txt input.mtx
(rpy2_3) [karin.hrovatin@icb-mona scib]$ ls /tmp/lisi_tmp1604062803
_distances_0.txt _distances_1.txt _distances_2.txt _distances_3.txt _indices_0.txt _indices_1.txt _indices_2.txt _indices_3.txt input.mtx
(rpy2_3) [karin.hrovatin@icb-mona scib]$ head /tmp/lisi_tmp1604062755/input.mtx
%%MatrixMarket matrix coordinate real general
%
43356 43356 835990
1 11440 2.4849173e-01
1 15390 2.1628287e-01
1 16562 2.2891411e-01
1 17211 1.0053829e-01
1 17863 4.0131930e-01
1 90 2.7065918e-01
1 217 6.6888779e-01
(rpy2_3) [karin.hrovatin@icb-mona scib]$ head /tmp/lisi_tmp1604062778/input.mtx
%%MatrixMarket matrix coordinate real general
%
43356 43356 908334
1 90 2.7065906e-01
1 217 6.6888750e-01
1 511 3.3287230e-01
1 928 8.5889757e-02
1 1084 1.5068726e-01
1 1416 1.0888591e-01
1 1833 1.8670695e-01
(rpy2_3) [karin.hrovatin@icb-mona scib]$ head /tmp/lisi_tmp1604062803/input.mtx
%%MatrixMarket matrix coordinate real general
%
43356 43356 908334
1 90 2.7065906e-01
1 217 6.6888750e-01
1 511 3.3287230e-01
1 928 8.5889757e-02
1 1084 1.5068726e-01
1 1416 1.0888591e-01
1 1833 1.8670695e-01
(rpy2_3) [karin.hrovatin@icb-mona scib]$
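The comparison done by eye with head above can also be scripted. The following is a minimal sketch using only the standard library; `summarize_mtx` is a hypothetical helper, not part of scIB. It assumes that "malformed index" refers to entries not being ordered by row and then column. The MatrixMarket format itself does not require sorted entries, so an unsorted file is only a hint that it was written from a different graph or by a different code path.

```python
import os
import tempfile


def summarize_mtx(path):
    """Summarize a MatrixMarket coordinate file (hypothetical helper)."""
    declared = None
    coords = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("%"):
                continue  # skip the banner, comment lines and blank lines
            fields = line.split()
            if declared is None:
                # First data line holds: n_rows n_cols n_entries
                declared = tuple(int(x) for x in fields[:3])
                continue
            coords.append((int(fields[0]), int(fields[1])))
    if declared is None:
        raise ValueError("no size line found in " + path)
    n_rows, n_cols, n_entries = declared
    return {
        "shape": (n_rows, n_cols),
        "declared_entries": n_entries,
        "found_entries": len(coords),
        "sorted_by_row_then_col": coords == sorted(coords),
        "indices_in_range": all(
            1 <= r <= n_rows and 1 <= c <= n_cols for r, c in coords
        ),
    }


# Demo on three entries copied from the suspect file quoted above.
# The entry count in the size line is shortened to 3 to match the excerpt.
demo_text = """%%MatrixMarket matrix coordinate real general
%
43356 43356 3
1 11440 2.4849173e-01
1 17863 4.0131930e-01
1 90 2.7065918e-01
"""
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "input.mtx")
    with open(demo_path, "w") as handle:
        handle.write(demo_text)
    demo = summarize_mtx(demo_path)
print(demo)
```

Running the helper on each /tmp/lisi_tmp*/input.mtx and comparing the resulting dictionaries would show the differing entry counts (835990 versus 908334) and the ordering difference without opening the files manually.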
Found it, I think! (See adata and adata_tmp in the code below.) https://github.com/theislab/scib/blob/20b18f1f6627f16a72d25e1f08c092d901af1ccf/scIB/metrics.py#L1510
    if (type_ == 'embed'):
        adata_tmp = sc.pp.neighbors(adata,n_neighbors=15, use_rep = 'X_emb', copy=True)
    if (type_ == 'full'):
        if 'X_pca' not in adata.obsm.keys():
            sc.pp.pca(adata, svd_solver = 'arpack')
        adata_tmp = sc.pp.neighbors(adata, n_neighbors=15, copy=True)
    else:
        adata_tmp = adata.copy()
    #if knn - do not compute a new neighbourhood graph (it exists already)
    #compute LISI score
    ilisi_score = lisi_graph_py(adata = adata, batch_key = batch_key,
                                n_neighbors = k0, perplexity=None, subsample = subsample,
                                multiprocessing = multiprocessing, nodes = nodes, verbose=verbose)
    clisi_score = lisi_graph_py(adata = adata, batch_key = label_key,
                                n_neighbors = k0, perplexity=None, subsample = subsample,
                                multiprocessing = multiprocessing, nodes = nodes, verbose=verbose)
Thanks! Good you found it. I'll provide a fix.
lisi_graph_py is the Python version of lisi, right @mbuttner? We're not running that in our pipeline atm.
What is the issue here exactly? All I see is a comma missing in the ilisi_score = lisi_graph_py(adata = adata line.
We create an adata_tmp object, where we recompute neighbors, but we use adata in the lisi_graph_py call instead.
If you use type_ == 'embed', it should generate a new adata_tmp with the neighbours recomputed. But lisi_graph_py is then computed on the input adata.
(The missing comma is because I tried to put adata in bold in markdown, but that does not work with code. I will correct it.)
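The mix-up can be illustrated without scanpy. The sketch below uses plain dictionaries and two hypothetical stand-ins: `recompute_neighbors` plays the role of a call such as sc.pp.neighbors(..., copy=True), and `score_on_graph` plays the role of the metric. Neither is a real scIB or scanpy function; the point is only which object the metric ends up reading.

```python
import copy


def recompute_neighbors(data, n_neighbors):
    """Hypothetical stand-in: return a copy that carries a freshly computed graph."""
    new = copy.deepcopy(data)
    new["graph"] = f"knn_k{n_neighbors}"
    return new


def score_on_graph(data):
    """Hypothetical stand-in for a metric: report which graph it would read."""
    return data["graph"]


adata = {"graph": "stale_graph"}
adata_tmp = recompute_neighbors(adata, n_neighbors=15)

# Pattern from the quoted snippet: the copy is created but the original is scored.
buggy = score_on_graph(adata)

# Intended pattern: score the object whose neighbours were recomputed.
fixed = score_on_graph(adata_tmp)

print(buggy, fixed)
```

Because the copy leaves the input untouched, passing the original silently scores whatever graph it already had, which matches the behaviour described above.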
This is now fixed in #201. The fix is only important if you add subsetting in the metrics script, as otherwise the neighborhood graph is already computed in that script and so technically doesn't require recomputation. Thus, we don't have to rerun anything.
I had the following error occur when calculating the LISI metric:
It only happens for one method (scanorama full), so it's not a general problem, but I'm guessing it has something to do with the integration output.