Closed axelalmet closed 3 years ago
Hi @axelalmet,
I'm glad that the package and paper have been useful to you so far.
If you have any advice on how to resolve this error, I would greatly appreciate it!
The lisi error looks like it has to do with the C++ file we have added to speed up this metric. To make this work, you will need to compile the `knn_graph.cpp` file manually by going to the relevant directory (in your case: `/usr/local/lib/python3.8/site-packages/scIB/knn_graph/`) and compiling it with the Makefile that is in the same directory, or just typing in:
g++ -std=c++11 -O3 knn_graph.cpp -o knn_graph.o
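If you would rather trigger the same compile step from Python (for example from a notebook), here is a minimal sketch. The helper names `build_compile_command` and `compile_knn_graph` are hypothetical and not part of scIB; the sketch assumes `g++` is on your `PATH` and that you pass in the directory that holds `knn_graph.cpp`.

```python
import subprocess
from pathlib import Path


def build_compile_command(knn_dir):
    """Build the g++ call quoted in this thread for a given directory.

    Hypothetical helper, not part of scIB.
    """
    knn_dir = Path(knn_dir)
    return [
        "g++", "-std=c++11", "-O3",
        str(knn_dir / "knn_graph.cpp"),
        "-o", str(knn_dir / "knn_graph.o"),
    ]


def compile_knn_graph(knn_dir):
    """Run the compile command and return the path of the output file.

    Raises subprocess.CalledProcessError if g++ exits with an error.
    """
    cmd = build_compile_command(knn_dir)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])


# Example call (the directory is the one from this thread; adjust to your install):
# compile_knn_graph("/usr/local/lib/python3.8/site-packages/scIB/knn_graph/")
```

You may need write permission on the site-packages directory for this to succeed.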
This should really be in the installation instructions on the main GitHub README. Is this something you could briefly add, @mumichae?
- I understand NMI compares clustering via the Louvain algorithm. Does it make a difference if I originally clustered my data using the Leiden algorithm?
This shouldn't matter. The idea here is that your annotation is regarded as ground truth. Whatever method you used to cluster is only seen as a proxy for how to get to this ground-truth annotation.
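To illustrate why the origin of the annotation does not matter: NMI only looks at how two label assignments overlap, not at how either one was produced or what the labels are called. The following is a small pure-Python sketch of NMI with arithmetic-mean normalisation. That normalisation choice is an assumption for illustration and may differ from what scIB uses internally.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalised mutual information between two label assignments.

    Uses the arithmetic mean of the two entropies as the normaliser
    (an assumption for this sketch).
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # Mutual information from the joint and marginal label counts.
    mi = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    denom = (entropy(count_a) + entropy(count_b)) / 2
    # Two single-cluster assignments agree trivially.
    return mi / denom if denom > 0 else 1.0


# Annotation (from any method) versus a fresh clustering with other label names.
annotation = ["T", "T", "B", "B", "NK", "NK"]
clusters = [0, 0, 1, 1, 2, 2]
score = nmi(annotation, clusters)
```

Here the two assignments describe the same partition, so the score is at its maximum even though the labels themselves differ.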
- Why is `n_neighbors=15`? Does it affect results if I calculated a kNN graph for my own data using `n_neighbors=30`? Also, should `use_rep` not accept the argument `embed`, as accepted in the `metrics` function?
It shouldn't matter much if `n_neighbors` is 15 or 30. This function preps the `knn_graph.o` call from above, which always gets 45 (I think) neighbours out via a graph-based shortest-paths algorithm. If you seed with 30 that might be better, but 15 will give you similar results. In the end the difference will probably be marginal.
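As a rough illustration of the idea (not the actual `knn_graph.cpp` code), here is a sketch of how a fixed number of neighbours can be pulled out of a sparser kNN graph by walking shortest paths with Dijkstra's algorithm. The function name, the graph format, and the value of `k` are assumptions made for the example.

```python
import heapq


def nearest_by_graph_distance(graph, source, k):
    """Return up to k nodes closest to `source` by shortest-path distance.

    `graph` maps each node to a list of (neighbour, edge_weight) pairs,
    e.g. the edges of a kNN graph built with 15 or 30 neighbours.
    """
    best = {source: 0.0}
    visited = set()
    result = []
    heap = [(0.0, source)]
    while heap and len(result) < k:
        dist, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node != source:
            result.append((node, dist))
        for nbr, weight in graph.get(node, []):
            new_dist = dist + weight
            if nbr not in visited and new_dist < best.get(nbr, float("inf")):
                best[nbr] = new_dist
                heapq.heappush(heap, (new_dist, nbr))
    return result


# Tiny toy graph: node 3 is not a direct neighbour of node 0,
# but is still reachable through the graph.
toy_graph = {
    0: [(1, 1.0), (2, 4.0)],
    1: [(0, 1.0), (2, 1.0)],
    2: [(1, 1.0), (0, 4.0), (3, 1.0)],
    3: [(2, 1.0)],
}
neighbours = nearest_by_graph_distance(toy_graph, source=0, k=3)
```

Because neighbours beyond the direct edges are reached through the graph, the number of seed neighbours mostly changes which paths are available rather than how many neighbours come out.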
And you're right, the function would be more flexibly integrated if `use_rep` took the `embed` argument from `metrics`. That would probably require it to be passed to `lisi_graph` and the lisi functions as well... We will take a look at this and address it when we have scope. If you're interested in contributing a PR, please don't hesitate though :).
Hi @LuckyMD,
Thank you so much for your advice, this really made things a lot clearer!
Your suggestion to compile `knn_graph.cpp` worked and now the code seems to run. I now run into a different issue, which is that `lisi_graph` causes my laptop to crash. I also tried running the code on my work desktop, which is a much more powerful machine, but it also crashed on that. Mind you, the data consist of just over 100K cells, but is there a way to allow `lisi_graph` to be run on a local machine?
Thanks again, Axel.
`lisi_graph` and kBET are by far the two most expensive metrics to run. We ran this on 100k cells on a cluster with 384GB memory. I would maybe just avoid the LISI metrics (cLISI and iLISI). We decided against subsampling in the metric implementation as we didn't know how it would behave...
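If you still want to try the LISI metrics locally, one option is to subsample on your side before calling the metric. Below is a minimal sketch of batch-stratified subsampling of cell indices. The function name is hypothetical and nothing here is part of scIB. As noted above, how subsampling affects the LISI scores is untested, so treat any result as exploratory.

```python
import random
from collections import defaultdict


def stratified_subsample(batch_labels, fraction, seed=0):
    """Pick a random fraction of cell indices from every batch.

    Keeps at least one cell per batch so that no batch disappears.
    Returns the kept indices in sorted order.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)

    by_batch = defaultdict(list)
    for idx, batch in enumerate(batch_labels):
        by_batch[batch].append(idx)

    keep = []
    for indices in by_batch.values():
        n_keep = max(1, round(len(indices) * fraction))
        keep.extend(rng.sample(indices, n_keep))
    return sorted(keep)


# Toy example: two batches of different size, keep 10% of each.
batches = ["a"] * 100 + ["b"] * 50
keep = stratified_subsample(batches, fraction=0.1)
```

With an AnnData object, the kept indices could then presumably be used as `adata[keep].copy()` before computing the metric.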
Given that, I'm pretty glad I managed to get kBET to work. I have actually tried running the code on a lab server, but it remains to be seen how that goes.
Thank you very much for all your help!
Good luck! I will close this for now. If anything comes up, let's discuss on a new thread or just re-open it if it's regarding the lisi_graph installation.
Hello,
Firstly, thank you for creating this package and the accompanying benchmarking paper. The paper has been an invaluable resource for myself and colleagues over the past year!
I have been trying to make use of `scIB.me.metrics` for my own work, which involves multiple RNA-seq datasets of murine tissue. The command I have been running is:
Everything seems to work fine until the final metric, which invokes `lisi_graph`, at which point I get the following error:
If you have any advice on how to resolve this error, I would greatly appreciate it!
In addition, I have some additional questions about other functions called during `scIB.me.metrics`:
- `lisi_graph`, for the following command: why is `n_neighbors=15`? Does it affect results if I calculated a kNN graph for my own data using `n_neighbors=30`? Also, should `use_rep` not accept the argument `embed`, as accepted in the `metrics` function?
Thank you very much for your time and help.
Best wishes, Axel.