scverse / spatialdata-io

BSD 3-Clause "New" or "Revised" License
41 stars 25 forks source link

issue loading MERSCOPE results #89

Closed mkunst23 closed 2 months ago

mkunst23 commented 1 year ago

Hi,

when trying to read in a MERSCOPE results I'm getting the following error

Cell In[10], line 1 ----> 1 sdata = sdio.merscope("/allen/programs/celltypes/production/mfish/mfishmerscopemouseatlas/1288760920/region_0")

File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/github_projects/spatialdata-io/src/spatialdata_io/readers/merscope.py:192, in merscope(path, vpt_outputs, z_layers, region_name, slide_name, imread_kwargs, image_models_kwargs) 190 geo_df = geo_df.rename_geometry("geometry") 191 geo_df = geo_df[geo_df[MerscopeKeys.Z_INDEX] == 0] # Avoid duplicate boundaries on all z-levels --> 192 geo_df.geometry = geo_df.geometry.map(lambda x: x.geoms[0]) # The MultiPolygons contain only one polygon 193 geo_df.index = geo_df[MerscopeKeys.METADATA_CELL_KEY].astype(str) 195 polygons = ShapesModel.parse(geo_df, transformations={"microns": Identity()})

File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/SpatialData/lib/python3.10/site-packages/pandas/core/series.py:4540, in Series.map(self, arg, na_action) 4460 def map( 4461 self, 4462 arg: Callable | Mapping | Series, 4463 na_action: Literal["ignore"] | None = None, 4464 ) -> Series: 4465 """ 4466 Map values of Series according to an input mapping or function. 4467 (...) 4538 dtype: object 4539 """ -> 4540 new_values = self._map_values(arg, na_action=na_action) 4541 return self._constructor(new_values, index=self.index, copy=False).finalize( 4542 self, method="map" 4543 )

File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/SpatialData/lib/python3.10/site-packages/pandas/core/base.py:919, in IndexOpsMixin._map_values(self, mapper, na_action, convert) 916 arr = self._values 918 if isinstance(arr, ExtensionArray): --> 919 return arr.map(mapper, na_action=na_action) 921 return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)

File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/SpatialData/lib/python3.10/site-packages/pandas/core/arrays/base.py:2189, in ExtensionArray.map(self, mapper, na_action) 2169 def map(self, mapper, na_action=None): 2170 """ 2171 Map values using an input mapping or function. 2172 (...) 2187 a MultiIndex will be returned. 2188 """ -> 2189 return map_array(self, mapper, na_action=na_action)

File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/SpatialData/lib/python3.10/site-packages/pandas/core/algorithms.py:1814, in map_array(arr, mapper, na_action, convert) 1812 values = arr.astype(object, copy=False) 1813 if na_action is None: -> 1814 return lib.map_infer(values, mapper, convert=convert) 1815 else: 1816 return lib.map_infer_mask( 1817 values, mapper, mask=isna(values).view(np.uint8), convert=convert 1818 )

File lib.pyx:2917, in pandas._libs.lib.map_infer()

File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/github_projects/spatialdata-io/src/spatialdata_io/readers/merscope.py:192, in merscope..(x) 190 geo_df = geo_df.rename_geometry("geometry") 191 geo_df = geo_df[geo_df[MerscopeKeys.Z_INDEX] == 0] # Avoid duplicate boundaries on all z-levels --> 192 geo_df.geometry = geo_df.geometry.map(lambda x: x.geoms[0]) # The MultiPolygons contain only one polygon 193 geo_df.index = geo_df[MerscopeKeys.METADATA_CELL_KEY].astype(str) 195 polygons = ShapesModel.parse(geo_df, transformations={"microns": Identity()})

File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/SpatialData/lib/python3.10/site-packages/shapely/geometry/base.py:997, in GeometrySequence.getitem(self, key) 995 if isinstance(key, (int, np.integer)): 996 if key + m < 0 or key >= m: --> 997 raise IndexError("index out of range") 998 if key < 0: 999 i = m + key

IndexError: index out of range

LucaMarconato commented 1 year ago

Hi @mkunst23, is the data that you are using public? If not, would you be able please to make a reproducible example with mock data or by subsetting and perturbing your original data? Happy to look into this, thank you!

mkunst23 commented 12 months ago

Hi @LucaMarconato, yes the data is available on BIL (https://doi.brainimagelibrary.org/doi/10.35077/g.610).

But in the meantime I did find a fix for that problem by adding a line to make sure the index of the cell by gene table and the metadata match data = pd.read_csv(count_path, index_col=0, dtype={MerscopeKeys.COUNTS_CELL_KEY: str}) obs = pd.read_csv(obs_path, index_col=0, dtype={MerscopeKeys.METADATA_CELL_KEY: str}) obs = obs.reindex(data.index)

I can do a PR later

UPDATE, this is more related to scverse/spatialdata#385 and only shows up when I use an older version of MERSCOPE segmentation

LucaMarconato commented 12 months ago

Hi, thanks for the update. Yes please a PR would be helpful, either in spatialdata or spatialdata-io, whichever you find it most appropriate.

LucaMarconato commented 4 months ago

Thanks for providing a link to the data, I am downloading a portion of it using

wget -r https://download.brainimagelibrary.org/aa/79/aa79b8ba5b3add56/609882/1198980023/merfish_output/

I will try reproducing your bug.

LucaMarconato commented 2 months ago

Hi @mkunst23, back in May I started downloading the data but the download got interrupted a few times (large data) and then I put this on hold while got to work on other issues. I went back to this today, but the data doesn't seem to be available for download anymore.

I kindly ask you if you could:

I will mark the issue as stale until then, thanks for your input and thanks again for sharing a workaround in https://github.com/scverse/spatialdata-io/issues/89#issuecomment-1781296802.

LucaMarconato commented 2 months ago

CC @quentinblampey, sharing this issue in case you or any SOPA user experiences this.

quentinblampey commented 2 months ago

Actually, we don't load the shapes inside Sopa (as we intend to create new shapes) so we don't experience this issue 🙂