scverse / squidpy

Spatial Single Cell Analysis in Python
https://squidpy.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

bug loading vizgen data #673

Open mkunst23 opened 1 year ago

mkunst23 commented 1 year ago

Description

I'm having trouble loading Vizgen data with sq.read.vizgen(). The function expects 8 columns in the metadata file, but mine has 9.

Minimal reproducible example

adata = sq.read.vizgen(
    path=data_path,
    counts_file=os.path.join(data_path,section,file_path,cbg_file),
    meta_file=os.path.join(data_path,section,file_path,meta_file),
    transformation_file=os.path.join(data_path,section,'region_0/images/micron_to_mosaic_pixel_transform.csv'),
)

Traceback

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [54], in <cell line: 1>()
----> 1 adata = sq.read.vizgen(
      2     path=data_path,
      3     counts_file=os.path.join(data_path,section,file_path,cbg_file),
      4     meta_file=os.path.join(data_path,section,file_path,meta_file),
      5     transformation_file=os.path.join(data_path,section,'region_0/images/micron_to_mosaic_pixel_transform.csv'),
      6 )

File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/scvi-tool/lib/python3.9/site-packages/squidpy/read/_read.py:146, in vizgen(path, counts_file, meta_file, transformation_file, library_id, **kwargs)
    144 # fmt: off
    145 coords = pd.read_csv(path / meta_file, header=0, index_col=0)
--> 146 coords.columns = ["fov", "volume", "center_x", "center_y", "min_x", "max_x", "min_y", "max_y"]
    147 # fmt: on
    149 adata.obs = pd.merge(adata.obs, coords, how="left", left_index=True, right_index=True)

File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/scvi-tool/lib/python3.9/site-packages/pandas/core/generic.py:5588, in NDFrame.__setattr__(self, name, value)
   5586 try:
   5587     object.__getattribute__(self, name)
-> 5588     return object.__setattr__(self, name, value)
   5589 except AttributeError:
   5590     pass

File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/scvi-tool/lib/python3.9/site-packages/pandas/_libs/properties.pyx:70, in pandas._libs.properties.AxisProperty.__set__()

File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/scvi-tool/lib/python3.9/site-packages/pandas/core/generic.py:769, in NDFrame._set_axis(self, axis, labels)
    767 def _set_axis(self, axis: int, labels: Index) -> None:
    768     labels = ensure_index(labels)
--> 769     self._mgr.set_axis(axis, labels)
    770     self._clear_item_cache()

File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/scvi-tool/lib/python3.9/site-packages/pandas/core/internals/managers.py:214, in BaseBlockManager.set_axis(self, axis, new_labels)
    212 def set_axis(self, axis: int, new_labels: Index) -> None:
    213     # Caller is responsible for ensuring we have an Index object.
--> 214     self._validate_set_axis(axis, new_labels)
    215     self.axes[axis] = new_labels

File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/scvi-tool/lib/python3.9/site-packages/pandas/core/internals/base.py:69, in DataManager._validate_set_axis(self, axis, new_labels)
     66     pass
     68 elif new_len != old_len:
---> 69     raise ValueError(
     70         f"Length mismatch: Expected axis has {old_len} elements, new "
     71         f"values have {new_len} elements"
     72     )

ValueError: Length mismatch: Expected axis has 9 elements, new values have 8 elements

Version

'1.2.2'

...

andrewjkwok commented 1 year ago

I'm seeing something similar, except it's 16 elements rather than 9:

ValueError: Length mismatch: Expected axis has 16 elements, new values have 8 elements
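Both reports share the cause visible in the traceback: the reader assigns eight fixed names to however many columns the metadata file actually has. A minimal sketch for checking a file before loading it, using only the standard library, is shown here. The helper name inspect_metadata_header and the sample header (including "extra_column") are hypothetical; only the eight expected names come from the traceback.

```python
import csv
import io

# The eight names squidpy 1.2.2 assigns, copied from the traceback above.
EXPECTED_COLUMNS = [
    "fov", "volume", "center_x", "center_y",
    "min_x", "max_x", "min_y", "max_y",
]


def inspect_metadata_header(handle):
    """Return (data_columns, unexpected, missing) for a metadata CSV.

    The first header field is skipped because the reader treats it as the
    index (index_col=0 in the traceback).
    """
    header = next(csv.reader(handle))
    data_columns = header[1:]
    unexpected = [c for c in data_columns if c not in EXPECTED_COLUMNS]
    missing = [c for c in EXPECTED_COLUMNS if c not in data_columns]
    return data_columns, unexpected, missing


# Hypothetical 9-column header; "extra_column" is a made-up name.
sample = io.StringIO(
    ",fov,volume,center_x,center_y,min_x,max_x,min_y,max_y,extra_column\n"
)
columns, unexpected, missing = inspect_metadata_header(sample)
print(len(columns), unexpected, missing)
```

For a real file, pass an open file handle instead of the StringIO object. Any name listed under unexpected is a column that version 1.2.2 has no slot for.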

michalk8 commented 1 year ago

Hi @mkunst23 and @andrewjkwok, this should have been fixed in #648; installing squidpy from main should resolve it.

andrewjkwok commented 1 year ago

@michalk8 Thanks for pointing us in the right direction - it works correctly now!

andrewjkwok commented 1 year ago

Sorry, it seems it isn't entirely working yet. When I read the data in, my obs dataframe is filled with NaNs, but when I check my cell_metadata.csv file, it looks populated with the cell coordinates and other fields. As a result, I have no spatial coordinates to plot.
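One common way a left join on cell IDs ends up entirely empty is a label-type mismatch: one side holds the IDs as strings, the other as integers. Whether that is what happened here is an assumption; the thread only confirms "an issue with indexing". The sketch below mimics the effect with plain dictionaries and made-up IDs and coordinates.

```python
# Observation names held as strings (AnnData stores obs names as strings).
obs_names = ["101", "102", "103"]

# Metadata keyed by integer IDs, as a CSV parser may infer them (assumption).
meta_by_int = {101: (10.0, 20.0), 102: (11.5, 22.0), 103: (9.0, 18.5)}

# A left join keeps every obs name and fills misses with None
# (the pandas equivalent would be NaN).
joined = [meta_by_int.get(name) for name in obs_names]

# Casting the metadata keys to str makes the labels line up again.
meta_by_str = {str(key): value for key, value in meta_by_int.items()}
fixed = [meta_by_str.get(name) for name in obs_names]

print(joined)
print(fixed)
```

If the IDs in the counts file and the metadata file match once both are compared as strings, the files themselves are consistent and the problem lies in how they are joined.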

A second quick point: previous MERSCOPE outputs gave the cell coordinates in a set of hdf5 files, but since the MERSCOPE software update to v232 they come as a single parquet file instead. Does squidpy need this information at all? I can't find anywhere in the squidpy documentation that uses this file, and I'm wondering whether it would help with the missing spatial coordinates.

giovp commented 1 year ago

hey @andrewjkwok,

thanks for reporting this. I'm afraid it's tricky to help without the data available. Could you share a download link for the data so we can test it? thanks!

andrewjkwok commented 1 year ago

@giovp Yes very happy to. Is there an email I could share a google drive link to? Many thanks in advance.

andrewjkwok commented 1 year ago

Sorry, just a quick follow-up @giovp @michalk8: is there somewhere I could share my data so your team can take a look?

giovp commented 1 year ago

@andrewjkwok any chance you could point us to some public data, for example a dataset shared by Vizgen?

andrewjkwok commented 1 year ago

@giovp hmm the problem is that the cell metadata file from my MERSCOPE output (running their latest v232 software) doesn't look the same as the ones that are on vizgen's website.

So if I go to the squidpy website and follow the tutorial (https://squidpy.readthedocs.io/en/stable/external_tutorials/tutorial_vizgen.html) for the data download (https://info.vizgen.com/mouse-brain-map?submissionGuid=a66ccb7f-87cf-4c55-83b9-5a2b6c0c12b9), the cell_metadata.csv file doesn't look the same as the one from my merscope.

I've attached a truncated version of my cell metadata file for reference.

Vizgen website data: datasets_mouse_brain_map_BrainReceptorShowcase_Slice1_Replicate1_cell_metadata_S1R1.csv

Output from my merscope: cell_metadata_truncated.csv

giovp commented 1 year ago

hi @andrewjkwok, thanks for sharing the data. Unfortunately I can't look into this in the next two weeks, but I'll get back to you soon.

andrewjkwok commented 1 year ago

Hi - just wanted to quickly check whether there was any progress with this?

dfhannum commented 1 year ago

There was an issue with indexing but installing squidpy from main should fix the metadata not populating.

The spatial coordinates are populated from the center_x and center_y columns of the metadata. The sq.read.vizgen function doesn't use the cell segmentation output, in either the older hdf5 or the newer parquet format.
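Since center_x and center_y are the only coordinate source, it can help to confirm that those two columns parse cleanly from the metadata file on their own. A minimal standard-library sketch follows; the helper name read_centroids and the two-cell sample file are hypothetical.

```python
import csv
import io


def read_centroids(handle):
    """Return {cell_id: (center_x, center_y)} from a cell metadata CSV."""
    reader = csv.reader(handle)
    header = next(reader)
    # Locate the columns by name so extra columns elsewhere do not matter.
    x_col = header.index("center_x")
    y_col = header.index("center_y")
    return {
        row[0]: (float(row[x_col]), float(row[y_col]))
        for row in reader
        if row  # skip blank lines
    }


# Made-up two-cell file; the IDs and values are illustrative only.
sample = io.StringIO(
    ",fov,volume,center_x,center_y\n"
    "101,0,55.2,10.0,20.0\n"
    "102,0,61.0,11.5,22.0\n"
)
centroids = read_centroids(sample)
print(centroids["101"])
```

Looking the columns up by name rather than by position keeps the check independent of how many other columns a given MERSCOPE software version writes.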

giovp commented 3 months ago

hi all, I would suggest taking a look at the https://github.com/scverse/spatialdata-io package for reading spatial omics data. We won't be maintaining the IO reading functions here; updates to specifications from the commercial platforms will go into spatialdata-io only.