Reproducibility of mixscape

jesswhitts commented 1 month ago

Report

Hello,

When running the lda visualisation plot as part of the mixscape workflow, I am not getting consistent results between runs, the plot looks slightly different (although very similar). Is a random seed used at any point, and is there a way to set this so I can get reproducible results?

Code used:

ms = pt.tl.Mixscape() ms.perturbation_signature(mdata["rna"], pert_key="perturbation", control="NT")

adata_pert = mdata["rna"].copy() adata_pert.X = adata_pert.layers["X_pert"]

sc.pp.pca(adata_pert) sc.pp.neighbors(adata_pert, metric="cosine") sc.tl.umap(adata_pert)

ms.mixscape(adata=mdata["rna"], control="NT", labels="guide_allocation", layer="X_pert", logfc_threshold = 0.15, iter_num=100, min_de_genes=5) ms.lda(adata=mdata["rna"], control="NT", labels="guide_allocation", layer="X_pert") ms.plot_lda(adata=mdata["rna"], control="NT")

Version information

adjustText 1.2.0 anndata 0.10.8 matplotlib 3.9.2 mudata 0.3.1 muon 0.1.6 numpy 1.26.4 pandas 2.2.2 pertpy 0.9.4 scanpy 1.10.3 session_info 1.0.0

PIL 10.4.0 absl NA appnope 0.1.4 asttokens NA attr 24.2.0 blitzgsea NA certifi 2024.08.30 charset_normalizer 3.3.2 chex 0.1.86 comm 0.2.2 contextlib2 NA cycler 0.12.1 cython_runtime NA dateutil 2.9.0 debugpy 1.8.5 decorator 5.1.1 decoupler 1.8.0 docrep 0.3.2 equinox 0.11.7 etils 1.9.4 executing 2.1.0 filelock 3.16.1 flax 0.9.0 fsspec 2024.9.0 h5py 3.11.0 idna 3.10 igraph 0.11.6 ipykernel 6.29.5 jax 0.4.33 jaxlib 0.4.33 jaxopt NA jaxtyping 0.2.34 jedi 0.19.1 joblib 1.4.2 kiwisolver 1.4.7 lamin_utils 0.13.4 legacy_api_wrap NA leidenalg 0.10.2 lightning 2.4.0 lightning_fabric 2.4.0 lightning_utilities 0.11.7 lineax 0.0.5 llvmlite 0.43.0 matplotlib_inline 0.1.7 ml_collections NA ml_dtypes 0.5.0 mpl_toolkits NA mpmath 1.3.0 msgpack 1.1.0 multipledispatch 0.6.0 natsort 8.4.0 numba 0.60.0 numpyro 0.15.3 opt_einsum v3.3.0 optax 0.2.3 ott 0.4.8 packaging 24.1 parso 0.8.4 patsy 0.5.6 pickleshare 0.7.5 platformdirs 4.3.6 ply 3.11 prompt_toolkit 3.0.47 psutil 6.0.0 pubchempy 1.0.4 pure_eval 0.2.3 pyarrow 17.0.0 pydev_ipython NA pydevconsole NA pydevd 2.9.5 pydevd_file_utils NA pydevd_plugins NA pydevd_tracing NA pygments 2.18.0 pynndescent 0.5.13 pyomo 6.8.0 pyparsing 3.1.4 pyro 1.9.1 pytorch_lightning 2.4.0 pytz 2024.2 requests 2.32.3 rich NA scipy 1.14.1 scvi 1.1.6.post2 seaborn 0.13.2 six 1.16.0 sklearn 1.5.2 skmisc 0.5.1 sparsecca 0.3.1 stack_data 0.6.2 statsmodels 0.14.3 sympy 1.13.3 texttable 1.7.0 threadpoolctl 3.5.0 toolz 0.12.1 torch 2.4.1 torchgen NA torchmetrics 1.4.2 tornado 6.4.1 tqdm 4.66.5 traitlets 5.14.3 typeguard NA typing_extensions NA umap 0.5.6 urllib3 2.2.3 wcwidth 0.2.13 yaml 6.0.2 zmq 26.2.0

IPython 8.27.0 jupyter_client 8.6.3 jupyter_core 5.7.2

Python 3.12.6 | packaged by conda-forge | (main, Sep 11 2024, 04:55:15) [Clang 17.0.6 ] macOS-13.0.1-arm64-arm-64bit

Zethson commented 1 month ago

Dear @jesswhitts

I suspect it might be the LDA. I'll have a look soon.

jesswhitts commented 3 weeks ago

Hi @Zethson - any update on this?

Zethson commented 3 weeks ago

Sorry, I might have time for this on Friday. If it's urgent for you, you can check all function calls individually whether you always get the same results or not. Then you can tell me where the differences come from and I can try to get rid of them.

I'm very busy at the moment.

jesswhitts commented 1 week ago

Hi @Zethson

I think the issue is with the mixscape function itself, not the LDA. Across 4 runs there are a few cells which are classified differently. Does the random_state parameter need setting in GaussianMixture maybe? I tried setting a global numpy random state, but this didn't work...

Zethson commented 1 week ago

Thanks! @Lilly-May will have a look by the end of the week. Apologies again for taking so long!

jesswhitts commented 5 days ago

Hi @Lilly-May,

Have done some of my own testing, and think that setting the GaussianMixture random_state does indeed fix this issue, I amended the following parts of the mixscape function accordingly:

def mixscape(
    self,
    adata: AnnData,
    labels: str,
    control: str,
    new_class_name: str | None = "mixscape_class",
    min_de_genes: int | None = 5,
    layer: str | None = None,
    logfc_threshold: float | None = 0.25,
    iter_num: int | None = 10,
    random_init: int | None = 1,
    split_by: str | None = None,
    pval_cutoff: float | None = 5e-2,
    perturbation_type: str | None = "KO",
    copy: bool | None = False,
):

mm = GaussianMixture(
                        n_components=2,
                        random_state=random_init,
                        covariance_type="spherical",
                        means_init=means_init,
                        precisions_init=precisions_init,
                    ).fit(np.asarray(pvec).reshape(-1, 1))

I hope these changes are appropriate :)

Best, Jess

scverse / pertpy