stemangiola opened this issue 2 years ago
Hi @stemangiola, due to pip updates since we ran the benchmark, it is no longer straightforward to install DESC and other tools in the same environment (the same is true for scGen in the version we ran), so you would now need to install these tools separately and run them in a separate environment. You can still use the pipeline for all other tools.
For the use case example, maybe @mumichae can help you out with that.
Hi @stemangiola,
Thanks for the suggestions, we are currently working on documentation and have notebook tutorials planned. I'll get back to you once this is in development.
It seems that adata_int is the adata after integration. Is adata_int.X the embedding after processing by batch correction methods such as trVAE?
Hi @CocoGzh,
No, the embedding should be stored in adata.obsm['X_emb'] for embedding methods after integration like trVAE.
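For example, something like this (a minimal sketch; the `latent` variable and the 'batch'/'cell_type' columns are just placeholders):

```python
import scib

# store the latent space of the embedding method (e.g. trVAE) in .obsm
adata_int.obsm['X_emb'] = latent  # shape: n_cells x n_latent_dims

# then point the metrics at that slot
scib.metrics.metrics(adata, adata_int, batch_key='batch',
                     label_key='cell_type', embed='X_emb')
```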
Thx! @LuckyMD Take harmony as an example: following your suggestion, I put the embedding into adata.obsm['X_pca_harmony'] and set the 'embed' parameter: scib.metrics.metrics(adata_a, adata_int, embed="X_pca_harmony").
However, there are some problems here: some metrics are calculated incorrectly. For example, when computing NMI, cluster labels are first obtained with Louvain, but the sc.tl.louvain algorithm is not based on X_pca_harmony; the neighbors, distances, and connectivities it uses are computed from the original X. The NMI obtained this way therefore reflects the data before batch correction.
What I'm doing now is to instantiate a new object, adata_int = sc.AnnData(X=adata.obsm['X_pca_harmony'], obs=adata.obs), then run sc.pp.pca(adata_int), and then do scib.metrics.metrics(adata_a, adata_int, batch_key='', label_key='', embed='').
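In code, my current workaround looks roughly like this (filling in placeholder column names):

```python
import scanpy as sc
import scib

# wrap the harmony embedding in a fresh AnnData so that neighbors and
# clustering downstream are computed on the corrected space
adata_int = sc.AnnData(X=adata.obsm['X_pca_harmony'], obs=adata.obs)
sc.pp.pca(adata_int)

scib.metrics.metrics(adata, adata_int, batch_key='batch',
                     label_key='cell_type', embed='X_pca')
```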
Do you have any suggestions?
Hi @CocoGzh,
We typically use the metrics() function in the context of our metrics.py script from the scib-pipeline repo. There, we precompute the knn graph before running the metrics function, as shown here: https://github.com/theislab/scib-pipeline/blob/0c7be53b1000864fcd31a7b7594f9a5071204233/scripts/metrics/metrics.py#L161-L168
You could just use this reduceData() call to do the same for your data, and then use the metrics. I would also recommend storing your adata.obsm['X_pca_harmony'] as adata.obsm['X_emb'], just in case there is any hard-coding of this that we might have missed (I don't think there is, but best to be safe).
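Outside the pipeline, the equivalent would look roughly like this (a sketch, not the exact pipeline code; I'm calling scanpy's neighbors directly instead of reduceData(), and 'batch'/'cell_type' are placeholder columns):

```python
import scanpy as sc
import scib

# rename the harmony embedding to the slot the metrics expect
adata_int.obsm['X_emb'] = adata_int.obsm['X_pca_harmony']

# precompute the knn graph on the integrated embedding, as the
# scib-pipeline metrics script does before calling metrics()
sc.pp.neighbors(adata_int, use_rep='X_emb')

scib.metrics.metrics(adata, adata_int, batch_key='batch',
                     label_key='cell_type', embed='X_emb')
```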
@mumichae do we have instructions for what is expected as input to run the metrics() function?
Hi @LuckyMD @mumichae, thx! I found the scripts you provided that meet my needs: https://github.com/theislab/scib-pipeline/blob/0c7be53b1000864fcd31a7b7594f9a5071204233/scripts/integration/runIntegration.py and https://github.com/theislab/scib-pipeline/blob/0c7be53b1000864fcd31a7b7594f9a5071204233/scripts/metrics/metrics.py.
I have two questions. For the first script, is "--input_file" an adata object preprocessed by the standard scanpy pipeline (filtering cells and genes, selecting HVGs, etc.)? I don't find any such operations in it. For the second script, do "--uncorrected" and "--integrated" correspond to the input and output files of the first script, respectively?
Btw, one more question. In the second script, i.e. the evaluation script, the hyperparameter '--type' can be 'knn' or 'embed'. For algorithms like MNN and BBKNN that modify adata.uns['neighbors']['distances'] and adata.uns['neighbors']['connectivities'], we should choose 'knn'; for harmony or other deep learning algorithms that modify adata.obsm['X_emb'], we should choose 'embed'. Is that right?
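To summarize my understanding in code (slot names as described above):

```python
# --type 'knn': the integration output is a corrected graph, e.g. BBKNN
adata_int.uns['neighbors']['distances']
adata_int.uns['neighbors']['connectivities']

# --type 'embed': the integration output is a corrected embedding,
# e.g. harmony or other embedding-producing methods
adata_int.obsm['X_emb']
```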
@LuckyMD Emmm, another question. There may be some problems with scib for SAUCE and desc 2.03. I obtained the SAUCE and desc packages from their GitHub URLs and can run them through my own scripts, but I can't correctly use them with scib via "pip install scib[sauce]" / "pip install scib[desc]".
Hello, it would be really great if you at least documented the input needed for the metrics routine (scib.metrics.metrics()), i.e. you need two AnnData objects, before and after integration. What slots are needed in each object?
I just get NaNs when I try to use the function :(
Thanks
Thanks for the package (I am an R user with little recent Python experience).
I read https://github.com/theislab/scib as well as https://github.com/theislab/scib-pipeline.
I have installed the scib repository; however, I could not find any complete tutorial (with a test file) on how to go from input files to benchmark results/table, say for just two methods (including Seurat).
For example, in the README there is no info on how to create the variable adata_int:
scib.metrics.metrics(adata, adata_int, ari=True, nmi=True)
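A minimal end-to-end snippet in the README would help a lot, something along these lines (I am only guessing at the intended usage from the snippets in this thread; the file name and the 'batch'/'cell_type' columns are placeholders):

```python
import scanpy as sc
import scib

adata = sc.read('mydata.h5ad')  # unintegrated, preprocessed data

# obtain adata_int from some integration method, e.g. harmony via scanpy
adata_int = adata.copy()
sc.pp.pca(adata_int)
sc.external.pp.harmony_integrate(adata_int, 'batch')  # -> .obsm['X_pca_harmony']
adata_int.obsm['X_emb'] = adata_int.obsm['X_pca_harmony']

scib.metrics.metrics(adata, adata_int, batch_key='batch',
                     label_key='cell_type', embed='X_emb',
                     ari=True, nmi=True)
```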
Also, complete line-by-line installation instructions for all methods (e.g. runDESC is not installed by default) would be great.
Thanks a lot!