gustaveroussy / sopa

Technology-invariant pipeline for spatial omics analysis (Xenium / Visium HD / MERSCOPE / CosMx / PhenoCycler / MACSima / ...) that scales to millions of cells
https://gustaveroussy.github.io/sopa/
BSD 3-Clause "New" or "Revised" License
129 stars, 15 forks

[Bug] Error when writing out tangram annotation mapping #143

Open wlason opened 1 day ago

wlason commented 1 day ago

Description

Tangram runs successfully with GPU acceleration, but writing the result back to the sdata fails with an error about a NaN key:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ [path]/envs/sopa/lib/python3.10/site │
│ -packages/sopa/cli/annotate.py:75 in tangram                                 │
│                                                                              │
│   72 │   )                                                                   │
│   73 │   if sdata.is_backed():                                               │
│   74 │   │   sdata.delete_element_from_disk(SopaKeys.TABLE)                  │
│ ❱ 75 │   │   sdata.write_element(SopaKeys.TABLE, overwrite=True)             │
│   76                                                                         │
│                                                                              │

...

│ [path]/envs/sopa/lib/python3.10/site │
│ -packages/anndata/_io/specs/methods.py:684 in <listcomp>                     │
│                                                                              │
│   681 │   │   raise ValueError(                                              │
│   682 │   │   │   f"Found repeated column names: {duplicates}. Column names  │
│   683 │   │   )                                                              │
│ ❱ 684 │   col_names = [check_key(c) for c in df.columns]                     │
│   685 │   group.attrs["column-order"] = col_names                            │
│   686 │                                                                      │
│   687 │   if df.index.name is not None:                                      │
│                                                                              │
│ ╭────────────── locals ───────────────╮                                      │
│ │ .0 = <map object at 0x7fd5e70b41f0> │                                      │
│ │  c = nan                            │                                      │
│ ╰─────────────────────────────────────╯                                      │
│                                                                              │
│ [path]/envs/sopa/lib/python3.10/site │
│ -packages/anndata/_io/utils.py:117 in check_key                              │
│                                                                              │
│   114 │   # elif issubclass(typ, bytes):                                     │
│   115 │   # return key                                                       │
│   116 │   else:                                                              │
│ ❱ 117 │   │   raise TypeError(f"{key} of type {typ} is an invalid key. Shoul │
│   118                                                                        │
│   119                                                                        │
│   120 # -------------------------------------------------------------------- │
│                                                                              │
│ ╭─────── locals ────────╮                                                    │
│ │ key = nan             │                                                    │
│ │ typ = <class 'float'> │                                                    │
│ ╰───────────────────────╯                                                    │
╰──────────────────────────────────────────────────────────────────────────────╯
TypeError: nan of type <class 'float'> is an invalid key. Should be str.

Reproducing the issue

sopa annotate tangram path/to/sample.zarr --sc-reference-path "path/to/reference.h5ad" --cell-type-key "annotationl" --reference-preprocessing "log1p"

I am running this as part of a Snakemake pipeline.

I believe the failure happens here: https://github.com/gustaveroussy/sopa/blob/f1f5a99ee7f5a9489e511241a3a62bb520ec9860/sopa/cli/annotate.py#L73-L75

I wonder whether this is because I (intentionally) have some NAs in the cell annotation column of my original h5ad. Could this cause a problem, e.g. if the DataFrame is transposed and my NAs become column names? This may be wrong, but it was my first idea!
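A minimal pandas sketch of that hypothesis follows. It assumes (this is not confirmed from sopa's source) that the cell types used as output columns are taken from the unique values of the reference annotation column; `non_string_columns`, `labels` and `probs` are hypothetical names. Under that assumption, a missing label survives `unique()` and becomes a `nan` column name, which is exactly the kind of key anndata's `check_key` rejects in the traceback.

```python
import numpy as np
import pandas as pd


def non_string_columns(df: pd.DataFrame) -> list:
    """Return the column names of ``df`` that are not ``str``.

    anndata's writer rejects such keys (see ``check_key`` in the traceback).
    """
    return [c for c in df.columns if not isinstance(c, str)]


# Hypothetical reference annotations with an intentional missing label
labels = pd.Series(["T cell", "B cell", np.nan, "T cell"], dtype=object, name="annotation1")

# ASSUMPTION: cell types are derived from the unique labels, so NaN is kept
cell_types = labels.unique()

# A per-cell score table (5 hypothetical cells) with one column per cell type
probs = pd.DataFrame(np.zeros((5, len(cell_types))), columns=cell_types)

print(non_string_columns(probs))  # [nan]
```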

If you could tell me how to troubleshoot this, and how sopa saves the Tangram result, that would be very helpful!
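The linked code (annotate.py lines 73-75) deletes the table from disk before writing it, so when `write_element` fails, the on-disk table can be left deleted or only partially written. Below is a minimal sketch of a validate-first variant. It assumes an AnnData-like table with `.obs` and `.obsm`, a `sdata.tables` mapping, and the `is_backed`, `delete_element_from_disk` and `write_element` calls seen in the traceback; `invalid_table_keys`, `rewrite_table_safely`, and the idea that Tangram scores are stored as an `obsm` DataFrame are hypothetical, not sopa's actual implementation.

```python
import pandas as pd


def invalid_table_keys(table) -> list:
    """List (location, name) pairs for non-str column names in an AnnData-like table.

    Checks ``table.obs`` and every DataFrame stored in ``table.obsm``
    (ASSUMPTION: annotation scores may live in an obsm DataFrame).
    """
    bad = [("obs", c) for c in table.obs.columns if not isinstance(c, str)]
    for key, value in table.obsm.items():
        if isinstance(value, pd.DataFrame):
            bad += [(f"obsm[{key!r}]", c) for c in value.columns if not isinstance(c, str)]
    return bad


def rewrite_table_safely(sdata, table_key: str) -> None:
    """Hypothetical variant of the save step in sopa/cli/annotate.py (lines 73-75).

    The table is validated *before* anything is deleted from disk, so an
    invalid table raises early and the on-disk copy is left untouched.
    """
    bad = invalid_table_keys(sdata.tables[table_key])
    if bad:
        raise ValueError(f"Not writing {table_key!r}: non-str column names {bad}")
    if sdata.is_backed():
        # Same two calls as the original code, now only reached for a valid table
        sdata.delete_element_from_disk(table_key)
        sdata.write_element(table_key, overwrite=True)
```

Checking before deleting costs one pass over the column names, which is negligible compared with re-running the upstream segmentation and aggregation steps.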

wlason commented 19 hours ago

I can confirm this was caused by NAs in the cell annotation column. I had to re-run the whole pipeline, because the table in the sdata .zarr had been overwritten with the 'annotation1' column and read_zarr_standardized threw an error. I could not re-run only the previous rule (aggregate) to replace the obs in the .zarr file, because the individual Baysor patches are deleted here: https://github.com/gustaveroussy/sopa/blob/f1f5a99ee7f5a9489e511241a3a62bb520ec9860/workflow/Snakefile#L187

TL;DR: make sure there are no NAs in the cell annotation column before running Tangram.
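One way to follow that advice is to drop unannotated cells from the reference before passing it to `--sc-reference-path`. The sketch below only builds the mask with pandas; `annotated_cells_mask` is a hypothetical helper, and dropping cells is a design choice (relabelling them with an explicit placeholder category would be an alternative if those cells must stay in the reference).

```python
import pandas as pd


def annotated_cells_mask(obs: pd.DataFrame, cell_type_key: str) -> pd.Series:
    """Boolean mask of cells whose ``cell_type_key`` label is present (not NA)."""
    if cell_type_key not in obs.columns:
        raise KeyError(f"{cell_type_key!r} not found in obs columns")
    mask = obs[cell_type_key].notna()
    n_missing = int((~mask).sum())
    if n_missing:
        # Report how many reference cells will be excluded
        print(f"Dropping {n_missing} of {len(mask)} cells with a missing {cell_type_key!r} label")
    return mask
```

With anndata, the mask can be applied after `ref = anndata.read_h5ad("path/to/reference.h5ad")` as `ref = ref[annotated_cells_mask(ref.obs, "annotationl").to_numpy()].copy()`, and the result saved with `ref.write_h5ad(...)` to a new (hypothetically named) file such as `path/to/reference_annotated.h5ad`, which is then passed to `sopa annotate tangram`.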