stemangiola opened this issue 2 years ago
Hi @stemangiola, due to pip updates since we ran the benchmark, it is no longer straightforward to install DESC and other tools in the same environment (the same is true for scGen in the version we ran), so you would now need to install these tools separately and run them in a separate environment. You can still use the pipeline for all other tools.
For the use case example, maybe @mumichae can help you out with that.
Hi @stemangiola,
Thanks for the suggestions, we are currently working on documentation and have notebook tutorials planned. I'll get back to you once this is in development.
It seems that adata_int is the adata after integration. Is adata_int.X the embedding after processing by batch correction methods such as trVAE?
Hi @CocoGzh,
No, the embedding should be stored in adata.obsm['X_emb'] for embedding methods after integration like trVAE.
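For example, something like this (a minimal sketch; the `latent` variable and the 'batch'/'cell_type' columns are just placeholders):

```python
import scib

# store the latent space of the embedding method (e.g. trVAE) in .obsm
adata_int.obsm['X_emb'] = latent  # shape: n_cells x n_latent_dims

# then point the metrics at that slot
scib.metrics.metrics(adata, adata_int, batch_key='batch',
                     label_key='cell_type', embed='X_emb')
```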
Thx! @LuckyMD Take harmony as an example: following your suggestion, I put the embedding into adata.obsm['X_pca_harmony'] and set the 'embed' parameter: scib.metrics.metrics(adata_a, adata_int, embed="X_pca_harmony").
However, there are some problems here: some metrics are calculated incorrectly. For example, when computing NMI, cluster labels are first obtained with Louvain, but the sc.tl.louvain algorithm is not based on X_pca_harmony; the neighbors, distances, and connectivities it uses are computed from the original X. The NMI obtained this way therefore reflects the data before batch correction.
What I'm doing now is to instantiate a new object, adata_int = sc.AnnData(X=adata.obsm['X_pca_harmony'], obs=adata.obs), then run sc.pp.pca(adata_int), and then do scib.metrics.metrics(adata_a, adata_int, batch_key='', label_key='', embed='').
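In code, my current workaround looks roughly like this (filling in placeholder column names):

```python
import scanpy as sc
import scib

# wrap the harmony embedding in a fresh AnnData so that neighbors and
# clustering downstream are computed on the corrected space
adata_int = sc.AnnData(X=adata.obsm['X_pca_harmony'], obs=adata.obs)
sc.pp.pca(adata_int)

scib.metrics.metrics(adata, adata_int, batch_key='batch',
                     label_key='cell_type', embed='X_pca')
```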
Do you have any suggestions?
Hi @CocoGzh,
We typically use the metrics() function in the context of our metrics.py script from the scib-pipeline repo. There, we precompute the knn graph before running the metrics function, as shown here: https://github.com/theislab/scib-pipeline/blob/0c7be53b1000864fcd31a7b7594f9a5071204233/scripts/metrics/metrics.py#L161-L168
You could just use this reduceData() call to do the same for your data, and then use the metrics. I would also recommend storing your adata.obsm['X_pca_harmony'] as adata.obsm['X_emb'], just in case there is any hard-coding of this that we might have missed (I don't think there is, but best to be safe).
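Outside the pipeline, the equivalent would look roughly like this (a sketch, not the exact pipeline code; I'm calling scanpy's neighbors directly instead of reduceData(), and 'batch'/'cell_type' are placeholder columns):

```python
import scanpy as sc
import scib

# rename the harmony embedding to the slot the metrics expect
adata_int.obsm['X_emb'] = adata_int.obsm['X_pca_harmony']

# precompute the knn graph on the integrated embedding, as the
# scib-pipeline metrics script does before calling metrics()
sc.pp.neighbors(adata_int, use_rep='X_emb')

scib.metrics.metrics(adata, adata_int, batch_key='batch',
                     label_key='cell_type', embed='X_emb')
```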
@mumichae do we have instructions for what is expected as input to run the metrics() function?
Hi @LuckyMD @mumichae, thx! I found the scripts you provided that meet my needs: https://github.com/theislab/scib-pipeline/blob/0c7be53b1000864fcd31a7b7594f9a5071204233/scripts/integration/runIntegration.py and https://github.com/theislab/scib-pipeline/blob/0c7be53b1000864fcd31a7b7594f9a5071204233/scripts/metrics/metrics.py.
I have two questions. For the first script, is "--input_file" an adata object preprocessed by the standard scanpy pipeline (filtering cells and genes, selecting HVGs, etc.)? I don't find any such operations in it. For the second script, do "--uncorrected" and "--integrated" correspond to the input and output files of the first script, respectively?
Btw, one more question. In the second script, i.e. the evaluation script, the hyperparameter '--type' can be 'knn' or 'embed'. For algorithms like MNN and BBKNN that modify adata.uns['neighbors']['distances'] and adata.uns['neighbors']['connectivities'], we should choose 'knn'; for harmony or other deep learning algorithms that modify adata.obsm['X_emb'], we should choose 'embed'. Is that right?
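To summarize my understanding in code (slot names as described above):

```python
# --type 'knn': the integration output is a corrected graph, e.g. BBKNN
adata_int.uns['neighbors']['distances']
adata_int.uns['neighbors']['connectivities']

# --type 'embed': the integration output is a corrected embedding,
# e.g. harmony or other embedding-producing methods
adata_int.obsm['X_emb']
```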
@LuckyMD Emmm, another question. There may be some problems with scib for SAUCE and desc 2.03. I obtained the SAUCE and desc packages from their GitHub URLs and can run them through my own scripts, but I can't correctly use them with scib via "pip install scib[sauce]" / "pip install scib[desc]".
Hello, it would be really great if you at least documented the input needed for the metrics routine (scib.metrics.metrics()), i.e. you need two AnnData objects, before and after integration. What slots are needed in each object?
I just get NaNs when I try to use the function :(
Thanks
Thanks for the package (I am an R user with little recent Python experience).
I read https://github.com/theislab/scib as well as https://github.com/theislab/scib-pipeline.
I have installed the scib repository; however, I could not find any complete tutorial (with a test file) on how to go from input files to benchmark results/table, say for just two methods (including Seurat).
For example, in the README there is no info on how to create the variable adata_int:
scib.metrics.metrics(adata, adata_int, ari=True, nmi=True)
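A minimal end-to-end snippet in the README would help a lot, something along these lines (I am only guessing at the intended usage from the snippets in this thread; the file name and the 'batch'/'cell_type' columns are placeholders):

```python
import scanpy as sc
import scib

adata = sc.read('mydata.h5ad')  # unintegrated, preprocessed data

# obtain adata_int from some integration method, e.g. harmony via scanpy
adata_int = adata.copy()
sc.pp.pca(adata_int)
sc.external.pp.harmony_integrate(adata_int, 'batch')  # -> .obsm['X_pca_harmony']
adata_int.obsm['X_emb'] = adata_int.obsm['X_pca_harmony']

scib.metrics.metrics(adata, adata_int, batch_key='batch',
                     label_key='cell_type', embed='X_emb',
                     ari=True, nmi=True)
```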
Also, complete line-by-line installation instructions for all methods (e.g. runDESC is not installed by default) would be great.
Thanks a lot!