Closed tsvvas closed 1 month ago
Reading the issues, seems like I found the setter. However, it also throws an error during checking target region column symmetry, probably because my new element has only a subset of indices from the table:
>>> sdata.set_table_annotates_spatialelement(table_name="table", region="nuclei_subset_shapes")
File ...spatialdata/models/models.py:1078, in check_target_region_column_symmetry(table, region_key, target)
...
ValueError: Mismatch(es) found between regions in region column in obs and target element: cell_circles
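For readers following along, one way to read this error is that the values in the region column no longer match the target element name. The sketch below is plain Python and is not the spatialdata implementation; the function name check_region_symmetry and the toy region column are hypothetical, and it only assumes that the check compares the set of region values with the target name(s).

```python
def check_region_symmetry(region_values, target):
    """Raise ValueError unless the region values and the target name(s) match exactly."""
    # Accept either a single element name or a collection of names.
    targets = {target} if isinstance(target, str) else set(target)
    # Names that appear on only one side (symmetric difference).
    mismatches = set(region_values) ^ targets
    if mismatches:
        raise ValueError(
            "Mismatch(es) found between regions in region column and target element: "
            + ", ".join(sorted(mismatches))
        )


# Hypothetical toy region column: every row still names the old element.
region_column = ["cell_circles"] * 4

try:
    check_region_symmetry(region_column, "nuclei_subset_shapes")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Under this reading, the region column itself has to be updated before the table can be pointed at a new element.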
Hi Vasily,
In this tutorial we show how to manipulate the table. I would suggest having a look at it, as it illustrates some functions that you may find useful for addressing your issue: https://spatialdata.scverse.org/en/latest/tutorials/notebooks/notebooks/examples/tables.html.
Also, the function get_element_instances() (recently added, not in the tutorial yet) could be useful.
Anyway, a quick fix could be to call join_spatialelement_table() with how='right' and then use the dataloader on the returned circles object.
If you had a labels object this would not be possible, because (as the docs for join_spatialelement_table() say) the right join is not available for labels. In such a case you could add rows in the matrix by keeping the same value for the region_key column (in your case it's called 'region') and adding extra rows for the unannotated labels instances using the instance_key column (in your case it's called 'cell_id').
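The row-padding idea for labels can be sketched in plain Python. This is only an illustration under stated assumptions: each dict stands in for one row of table.obs, the helper pad_unannotated_instances and the element name nuclei_labels are hypothetical, and in a real AnnData table the matrix would need matching placeholder rows as well.

```python
def pad_unannotated_instances(rows, all_instance_ids, region,
                              region_key="region", instance_key="cell_id"):
    """Return rows plus one placeholder row per instance id that no row annotates yet."""
    annotated = {row[instance_key] for row in rows if row[region_key] == region}
    extra = [
        {region_key: region, instance_key: instance_id}
        for instance_id in all_instance_ids
        if instance_id not in annotated
    ]
    return rows + extra


# Hypothetical toy data: the table annotates only instances 2 and 4 of a labels element.
obs_rows = [
    {"region": "nuclei_labels", "cell_id": 2},
    {"region": "nuclei_labels", "cell_id": 4},
]
label_ids = [1, 2, 3, 4, 5]

padded = pad_unannotated_instances(obs_rows, label_ids, region="nuclei_labels")
print([row["cell_id"] for row in padded])
```

The region value stays the same for every added row; only the instance ids differ.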
Please let me know if this leads to a solution for your problem.
Hi Luca,
I managed to patch the region related attributes with the following function:
import pandas as pd
import spatialdata
from pandas.api.types import is_numeric_dtype


def patch_table_region_attrs(
    sdata: spatialdata.SpatialData,
    element: str,
    table: str = "table",
    region_key: str = "region",
    instance_key: str = "cell_id",
):
    other = "other"
    ids = spatialdata.get_element_instances(sdata[element])
    tab = sdata.tables[table]
    if not is_numeric_dtype(ids):
        # Non-numeric instance ids: fall back to the obs index of the matching rows
        ids = tab.obs[instance_key][tab.obs[instance_key].isin(ids)].index
    # Mark every row as "other" first, then mark the rows that belong to the element
    region_values = [other] * tab.shape[0]
    region_col = pd.Categorical(
        region_values, categories=[element, other], ordered=False
    )
    # Note: ids are used as positional indices into region_col here
    region_col[ids] = element
    attrs = {
        "region": element,
        "region_key": region_key,
        "instance_key": instance_key,
    }
    sdata.tables[table].obs[region_key] = region_col
    sdata.tables[table].uns["spatialdata_attrs"] = attrs
    return sdata
Now I get another error during ImageTilesDataset instantiation for AnnData:
File ...site-packages/spatialdata/dataloader/datasets.py:262, in ImageTilesDataset._preprocess(self, tile_scale, tile_dim_in_units, rasterize, table_name)
261 if table_name is not None:
--> 262 table_subset = filtered_table[filtered_table.obs[region_key] == region_name]
File ...site-packages/anndata/_core/anndata.py:1066, in AnnData._normalize_indices(self, index)
1065 def _normalize_indices(self, index: Index | None) -> tuple[slice, slice]:
--> 1066 return _normalize_indices(index, self.obs_names, self.var_names)
File ...site-packages/anndata/_core/index.py:53, in _normalize_index(indexer, index)
53 if not isinstance(index, pd.RangeIndex):
54 msg = "Don’t call _normalize_index with non-categorical/string names"
55 assert index.dtype != float, msg
56 assert index.dtype != int, msg
AssertionError: Don’t call _normalize_index with non-categorical/string names
Seems like the message is a bit misleading, and the problem is in the way obs_names are stored in the table:
>>> sdata.tables["table"].obs_names.dtype
dtype('int64')
>>> sdata.tables["table"].obs_names.dtype != int
False
Nice that you managed to fix the problem. I believe that the bug you reported is due to a limitation of anndata, which doesn't currently allow integers as obs names (see here: https://github.com/scverse/anndata/issues/777). Converting the obs_names or obs.index to strings should fix the issue.
In spatialdata we don't rely on obs_names or obs.index because we want to allow both integers and strings to be used as names for the instances; this is the reason why we introduced the instance_key column. In other words, the obs_names could be anything, and spatialdata will not look at them; the link between the elements and the table is made exclusively via the region, region_key and instance_key information.
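To make the linking idea concrete, here is a minimal plain-Python sketch. It is not spatialdata code: the helper rows_for_element and the toy rows are hypothetical, and each dict stands in for one row of table.obs. The point is only that the lookup uses the region_key and instance_key columns and never the obs names.

```python
def rows_for_element(obs_rows, attrs, element_name, instance_ids):
    """Select the rows that annotate the given instances of one element."""
    region_key = attrs["region_key"]
    instance_key = attrs["instance_key"]
    wanted = set(instance_ids)
    return [
        row
        for row in obs_rows
        if row[region_key] == element_name and row[instance_key] in wanted
    ]


# Hypothetical toy metadata and rows; "obs_name" is arbitrary and never consulted.
attrs = {"region": "cell_circles", "region_key": "region", "instance_key": "cell_id"}
obs_rows = [
    {"obs_name": "a", "region": "cell_circles", "cell_id": 10},
    {"obs_name": "b", "region": "cell_circles", "cell_id": 11},
    {"obs_name": "c", "region": "other", "cell_id": 10},
]

matched = rows_for_element(obs_rows, attrs, "cell_circles", [10])
print([row["obs_name"] for row in matched])
```

Row "c" shares the instance id 10 but belongs to a different region, so it is not matched.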
Yes, changing data type solves the last issue.
sdata.tables["table"].obs_names = sdata.tables["table"].obs_names.map(str)
Thank you!
Hello @LucaMarconato,
I am continuing with the analysis of Xenium + post-Xenium IHC data. After rasterization I want to visualize the cell nuclei in IHC, and it seems that ImageTilesDataset from the deep learning tutorial is the way to do that.
However, ImageTilesDataset throws a RuntimeError when instantiated, telling me that the provided indices are not annotated by the table. The problem seems to be a check in lines 213-215 that looks almost unnecessary at first glance.
In my dataset, get_table_keys returns keys for only one spatialdata element, which is cell_circles.
Should I manually add the annotations for the other elements? I couldn't find a setter for those in the documentation.
It also seems that the data structure in uns doesn't allow many annotation columns at the same time, as there is only one region key per table.
How can I correctly set the attributes to avoid the RuntimeError?
Many thanks, Vasily