scCODA covariate has non-unique values

schroeme commented 5 months ago

Report

Good morning,

I'm running the pertpy implementation of scCODA. I'm getting the following message:

Covariate sex has non-unique values! Skipping...

when I run:

sccoda_model = pt.tl.Sccoda()
sccoda_data = sccoda_model.load(
    adata_small,
    type="cell_level",
    generate_sample_level=True,
    cell_type_identifier="cell_type",
    sample_identifier="10x_batch",
    covariate_obs=["age", "region", "sex", "replicate"],
)

where sex has two values, male and female. I did not get this error when sex had 3 values, male, female, and mixed. Observation names are unique and the data type for sex is categorical. What might be causing this issue? Can a covariate not have only two possible values? Please let me know if you need more information.

Version information

altair 4.2.2
anndata 0.8.0
matplotlib 3.7.1
mudata 0.2.1
numpy 1.23.5
pandas 1.5.3
pertpy 0.6.0
scanpy 1.9.3
seaborn 0.11.2
session_info 1.0.0
tensorflow 2.12.0

PIL 9.4.0 PyQt5 NA aa8f2297d25b4dc6fd3d98411eb3ba53823c4f42 NA absl NA adjustText NA aiohttp 3.9.3 aiosignal 1.3.1 annotated_types 0.6.0 anyio NA arrow 1.2.3 arviz 0.14.0 asttokens NA astunparse 1.6.3 async_timeout 4.0.3 attr 22.2.0 backcall 0.2.0 backoff 2.2.1 boto3 1.34.31 botocore 1.34.31 bs4 4.12.0 certifi 2022.12.07 cffi 1.15.1 cftime 1.6.2 charset_normalizer 3.1.0 chex 0.1.7 click 8.1.3 comm 0.1.3 contextlib2 NA croniter NA custom_inherit 2.4.1 cycler 0.10.0 cython_runtime NA dateutil 2.8.2 debugpy 1.6.6 decorator 5.1.1 decoupler 1.5.0 deepdiff 6.7.1 defusedxml 0.7.1 docrep 0.3.2 entrypoints 0.4 ete3 3.1.2 etils 1.5.2 executing 1.2.0 fastapi 0.109.0 flatbuffers 23.3.3 flax 0.8.0 fontTools 4.39.2 fqdn NA frozenlist 1.4.1 fsspec 2023.12.2 gast NA google NA h5py 3.8.0 idna 3.4 igraph 0.10.8 importlib_metadata NA importlib_resources NA ipykernel 6.22.0 ipython_genutils 0.2.0 ipywidgets 8.0.5 isoduration NA jax 0.4.23 jaxlib 0.4.23 jaxopt NA jedi 0.18.2 jinja2 3.1.2 jmespath 1.0.1 joblib 1.2.0 jsonpointer 2.3 jsonschema 4.17.3 keras 2.12.0 kiwisolver 1.4.4 leidenalg 0.9.1 lightning 2.0.9.post0 lightning_cloud 0.5.64 lightning_fabric 2.1.3 lightning_utilities 0.10.1 llvmlite 0.39.1 markupsafe 2.1.2 matplotlib_inline 0.1.6 mizani 0.8.1 ml_collections NA ml_dtypes 0.3.2 mpl_toolkits NA mpmath 1.3.0 msgpack 1.0.7 multidict 6.0.4 multipart 0.0.6 multipledispatch 0.6.0 natsort 8.3.1 netCDF4 1.6.3 numba 0.56.4 numpyro 0.13.2 opt_einsum v3.3.0 optax 0.1.8 ordered_set 4.1.0 ott 0.4.5 packaging 23.0 palettable 3.3.0 parso 0.8.3 patsy 0.5.3 pexpect 4.8.0 pickleshare 0.7.5 pkg_resources NA platformdirs 3.2.0 plotnine 0.10.1 ply 3.11 png 0.20220715.0 prompt_toolkit 3.0.38 psutil 5.9.4 ptyprocess 0.7.0 pure_eval 0.2.2 pycparser 2.21 pydantic 2.1.1 pydantic_core 2.4.0 pydev_ipython NA pydevconsole NA pydevd 2.9.5 pydevd_file_utils NA pydevd_plugins NA pydevd_tracing NA pygments 2.14.0 pynndescent 0.5.8 pyomo 6.7.0 pyparsing 3.0.9 pyro 1.8.6 pyrsistent NA pytorch_lightning 2.1.3 
pytz 2023.2 reportlab 3.6.12 requests 2.28.2 rfc3339_validator 0.1.4 rfc3986_validator 0.1.1 rich NA scipy 1.10.1 scvi 1.0.4 setuptools 65.6.3 sip NA six 1.16.0 sklearn 1.2.2 skmisc 0.1.4 sniffio 1.3.0 soupsieve 2.4 sparse 0.15.1 sparsecca 0.3.1 stack_data 0.6.2 starlette 0.35.1 statsmodels 0.13.5 sympy 1.12 tensorboard 2.12.0 termcolor NA texttable 1.6.7 threadpoolctl 3.1.0 toolz 0.12.0 torch 2.2.0+cu121 torchgen NA torchmetrics 1.3.0.post0 tornado 6.2 toyplot 1.0.3 toytree 2.0.1 tqdm 4.65.0 traitlets 5.9.0 tree 0.1.8 typing_extensions NA umap 0.5.3 uri_template NA urllib3 1.26.15 uvicorn 0.27.0.post1 wcwidth 0.2.6 webcolors 1.13 websocket 1.5.1 websockets 12.0 wrapt 1.14.1 xarray 2023.3.0 xarray_einstats 0.5.1 yaml 6.0 yarl 1.9.4 zipp NA zmq 25.0.2 zoneinfo NA

IPython 8.11.0 jupyter_client 8.1.0 jupyter_core 5.3.0 notebook 6.5.3

Python 3.9.16 (main, Mar 8 2023, 14:00:05) [GCC 11.2.0]
Linux-5.11.0-25-generic-x86_64-with-glibc2.31

Session information updated at 2024-02-03 06:46

johannesostner commented 5 months ago

Hi @schroeme!

This looks similar to #224. To rephrase the discussion from there: the sample_identifier column(s) partition the cells into (statistical) samples, and within each sample all cells are aggregated by cell type. As in a regression setting, each covariate listed in covariate_obs can take only one value per sample. If cells in the same sample have different values, that covariate is skipped and the warning above is displayed. My guess is that one of your 10x batches contains some cells labeled male and some labeled female. If you want to split those cells into different samples, you can move sex to sample_identifier.
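One way to confirm this guess before calling load is to count the distinct covariate values per sample directly on the cell-level obs table. The following is a minimal sketch using pandas only; the small DataFrame stands in for adata_small.obs, and the helper name mixed_samples is made up for illustration, not part of pertpy.

```python
import pandas as pd

# Hypothetical stand-in for adata_small.obs; column names follow the issue,
# the values are invented for illustration.
obs = pd.DataFrame(
    {
        "10x_batch": ["b1", "b1", "b2", "b2", "b3", "b3"],
        "sex": ["male", "male", "male", "female", "female", "female"],
    }
)


def mixed_samples(obs, sample_cols, covariate):
    """Return the samples in which `covariate` takes more than one value."""
    n_values = obs.groupby(sample_cols, observed=True)[covariate].nunique()
    return n_values[n_values > 1]


# Any sample listed here would make the covariate non-unique within a sample.
mixed = mixed_samples(obs, ["10x_batch"], "sex")
print(mixed)
```

If the result is empty for a covariate, every sample has a single value for it; any batch that shows up is one where cells with different values would be aggregated together.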

schroeme commented 5 months ago

Thank you! That makes sense. Re-reading that thread now, I see it's basically the same question; I seem to have forgotten! Yes, a few of our 10x batches have cells from both sexes. Assuming I can have more than one sample_identifier, if I move sex into sample_identifier, will that change anything for the 10x reactions that have only one value for sex?

johannesostner commented 4 months ago

If there's only one value for sex for a 10x reaction, then all cells from that reaction will be aggregated together, so you should be fine. You can always check your data["coda"].obs dataframe after running the load function to see whether the parameter combinations correspond to your experimental setup.
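To illustrate the effect of using two sample identifiers, here is a rough pandas emulation of the aggregation described above. It is not pertpy's actual implementation; the toy table and its values are invented, and only the column names come from the issue. Batches with a single sex stay one sample, while a mixed batch splits into two.

```python
import pandas as pd

# Hypothetical stand-in for adata_small.obs (values invented for illustration).
obs = pd.DataFrame(
    {
        "10x_batch": ["b1", "b1", "b2", "b2", "b3", "b3"],
        "sex": ["male", "male", "male", "female", "female", "female"],
        "cell_type": ["T", "B", "T", "B", "T", "T"],
    }
)

# Each unique (10x_batch, sex) combination becomes one sample;
# cells within a sample are counted per cell type.
sample_cols = ["10x_batch", "sex"]
counts = (
    obs.groupby(sample_cols + ["cell_type"], observed=True)
    .size()
    .unstack("cell_type", fill_value=0)
)
print(counts)

# The corresponding pertpy call would presumably pass both columns, e.g.
# sample_identifier=["10x_batch", "sex"], and drop "sex" from covariate_obs
# (assumed from the discussion above, not verified here).
```

In this toy table, b1 and b3 each remain a single sample, and b2 becomes two samples, one per sex. Comparing such a table with the obs dataframe of the loaded data is a quick sanity check.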

schroeme commented 4 months ago

Thanks, that worked!

Best, Margaret
