scverse / pertpy

Perturbation Analysis in the scverse ecosystem.
https://pertpy.readthedocs.io/en/latest/
MIT License
141 stars 21 forks source link

Different results Mixscape from PertPy vs Seurat #679

Open albacorp opened 1 week ago

albacorp commented 1 week ago

Report

Hello, We (@jameshyojaelee) are running PertPy Mixscape vs Mixscape with the same parameters and we are getting very different results -- PertPy is calling more perturbed cells than Mixscape -- which is ideal, but we want to be sure if this is true. Could you help us out, please?

Or at least to understand which are the differences between the Mixscape code vs PertPy Mixscape if there are any.

Please find the code attached and examples of the different guide barplots that we get running PertPy vs Seurat

241112_Scanpy_vs._Seurat.txt guide_barplot_PertPy.pdf guide_barplot_Seurat.pdf

Version information

No response

Zethson commented 1 week ago

Thanks for reporting! We will have a look.

Lilly-May commented 1 week ago

Hi @albacorp! Could you provide more information on the data you're using, please? Are you loading it via the pertpy dataloader? If not, it would be great if you could share the data with me. If sharing it publicly isn't possible, feel free to reach out to me via lilly.may@tum.de

jameshyojaelee commented 1 week ago

Hello @Zethson and @Lilly-May, thank you so much for such a quick response. We've been enjoying your PertPy package a lot! @albacorp has generated the dataset from her experiment on the 10x platform. We used Cellranger multi to generate the h5 matrix which contains both the RNA and CRISPR guide data. Below is the code I wrote to import the data.

If this is not enough info, I can also subsample one of our datasets and email it to you. Thank you!

`samples = { "Sample1": "/count/filtered_feature_bc_matrix.h5", "Sample2": "/count/filtered_feature_bc_matrix.h5" } rna_dict = {} crispr_dict = {}

for sample_id, filepath in samples.items(): sample_data = mu.read_10x_h5(filepath) rna_data = sample_data.mod['rna'] crispr_data = sample_data.mod['CRISPR Guide Capture'] rna_data.obsnames = [f"{barcode}{sample_id}" for barcode in rna_data.obs_names] crispr_data.obsnames = [f"{barcode}{sample_id}" for barcode in crispr_data.obs_names] rna_data.var_names_make_unique() crispr_data.var_names_make_unique() rna_data.obs['sample'] = sample_id crispr_data.obs['sample'] = sample_id rna_dict[sample_id] = rna_data crispr_dict[sample_id] = crispr_data

rna_combined = ad.concat(rna_dict, label='sample', join='outer') crispr_combined = ad.concat(crispr_dict, label='sample', join='outer') combined_mdata = mu.MuData({"rna": rna_combined, "gdo": crispr_combined}) combined_mdata.write("combined_mdata.h5mu")`

Zethson commented 1 week ago

Thank you!

I do fear that we need a downsampled version of the dataset where you can reproduce the issue that you described above. We have done extensive tests with other datasets where we verified that the R and Python implementations are very similar. Therefore, your dataset is essential to debug this.

jameshyojaelee commented 1 week ago

Got it! We just sent a sample matrix file to lilly.may@tum.de. Thank you!