scverse / squidpy

Spatial Single Cell Analysis in Python
https://squidpy.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

KeyError: '72' when reading in Nanostring CosMx data #815

Closed: josiejenyne closed this issue 5 months ago

josiejenyne commented 7 months ago

I am receiving the error below when loading Nanostring CosMx data with sq.read.nanostring(). I am following the same format I used for the Nanostring FFPE Lung dataset from the tutorial, and I have restructured the path so that it contains exactly the same folders and files as the Lung data. What is causing this issue? When I read in data from slide 2, I get KeyError: '110' instead.

KeyError                                  Traceback (most recent call last)
<ipython-input-14-ea9a562318f3> in <module>
      5     counts_file="FF_exprMat_file.csv",
      6     meta_file="FF_metadata_file.csv",
----> 7     fov_file='FF_fov_positions_file.csv'
      8 )

~/anaconda3/envs/py3_7/lib/python3.7/site-packages/squidpy/read/_read.py in nanostring(path, counts_file, meta_file, fov_file)
    238             if fname.endswith(file_extensions):
    239                 fov = str(int(pat.findall(fname)[0]))
--> 240                 adata.uns[Key.uns.spatial][fov]["images"][kind] = _load_image(path / subdir / fname)
    241 
    242     if fov_file is not None:

KeyError: '72'
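A quick way to narrow this down, sketched here under assumptions: judging by the traceback, the reader parses an FOV number out of each image filename with `str(int(...))` (so `F072` becomes `'72'`) and then looks that key up in `adata.uns[Key.uns.spatial]`, which is presumably built from the FOVs listed in the metadata file. A `KeyError: '72'` would then mean an image exists for FOV 72 but no cell in the metadata file belongs to it. The helper below compares the two sets in both directions. It assumes image files named like `CellComposite_F072.jpg` and an FOV column called `fov` in the metadata file; `find_unmatched_fovs` is a hypothetical name, not part of squidpy.

```python
import re
from pathlib import Path

import pandas as pd

# Assumed naming scheme: "<anything>_F<digits>.<ext>", e.g. "CellComposite_F072.jpg".
FOV_PATTERN = re.compile(r".*_F(\d+)")
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".tif", ".tiff")


def find_unmatched_fovs(path, meta_file, fov_key="fov", subdirs=("CellComposite", "CellLabels")):
    """Compare FOV ids parsed from image filenames with the FOV ids in the metadata file."""
    path = Path(path)
    meta = pd.read_csv(path / meta_file, usecols=[fov_key])
    # Normalise to the same string form seen in the traceback: str(int(...)).
    meta_fovs = {str(int(v)) for v in meta[fov_key].dropna().unique()}

    report = {}
    for subdir in subdirs:
        image_fovs = set()
        for entry in (path / subdir).iterdir():
            if not entry.name.lower().endswith(IMAGE_EXTENSIONS):
                continue
            found = FOV_PATTERN.findall(entry.name)
            if found:
                image_fovs.add(str(int(found[0])))  # "072" -> "72"
        report[subdir] = {
            # Images with no matching cells: candidates for a KeyError.
            "images_without_metadata": sorted(image_fovs - meta_fovs, key=int),
            # FOVs with cells but no image: candidates for the "skipping it" warning.
            "metadata_without_images": sorted(meta_fovs - image_fovs, key=int),
        }
    return report
```

For example, `find_unmatched_fovs(path, "FF_metadata_file.csv")` would list, per folder, the FOVs that appear on only one side.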
acjordan333 commented 7 months ago

Have you compared the structure of your FOV positions file with the one from the FFPE lung dataset? There may be a difference between your file and the example file.

My lab has been generating CosMx data on our own instrument, and we had to alter the structure of our FOV positions file to match the structure required by sq.read.nanostring(). Specifically, we took the 'FOV' column in the file, duplicated it into a new column named 'fov', and made that column the index column. It appears that Nanostring has been changing the structure of the flat files as they update their software. We did not have this problem when we analyzed pilot data that was generated by Nanostring in late 2023.
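The workaround described above can be sketched with pandas roughly as follows. This is a minimal sketch, assuming the original file has an `FOV` column; the helper name `add_lowercase_fov_index` is hypothetical. The reader in the second traceback in this thread calls `pd.read_csv(path / fov_file, header=0, index_col=fov_key)`, so the rewritten file needs a column literally named `fov`.

```python
import pandas as pd


def add_lowercase_fov_index(src, dst, source_col="FOV", target_col="fov"):
    """Copy `source_col` into a new `target_col`, make it the index, and write `dst`."""
    df = pd.read_csv(src)
    if source_col not in df.columns:
        raise KeyError(f"{source_col!r} not found; columns are {list(df.columns)}")
    df[target_col] = df[source_col]  # duplicate FOV -> fov
    df = df.set_index(target_col)    # the index is written as the first CSV column
    df.to_csv(dst)
    return df
```

For example, `add_lowercase_fov_index("FF_fov_positions_file.csv", "FF_fov_positions_file_alt.csv")`, after which the new file can be passed as `fov_file`. The original `FOV` column is kept, so nothing else that reads the file loses information.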

We did not get the error you are describing, but we could not have loaded our data without altering the FOV file. I believe the scverse team will have to update the sq.read.nanostring() function soon, since Nanostring keeps changing the structure of these files as it updates its software. Hopefully this helps in some way.

giovp commented 6 months ago

hi both, thank you for raising this. It is indeed quite hard to keep track of all the changes that various companies make to their pipelines' output formats. The most up-to-date readers for these technologies can be found at https://spatialdata.scverse.org/projects/io/en/latest/ . Could you check whether you can read your data with those? If so, it would likely be easier to then use the SpatialData format in squidpy.

josiejenyne commented 6 months ago

Hi all, I have switched to a more recent version of Python (3.11; previously 3.7) and got a similar error again. I do have FOV 72 in both folders, so I am not sure why this happens. The data from the other slides behave the same way, but with multiple FOVs.

Here are the package versions: scanpy==1.9.5 anndata==0.10.2 umap==0.5.4 numpy==1.25.2 scipy==1.11.3 pandas==2.1.1 scikit-learn==1.3.1 statsmodels==0.14.0 igraph==0.10.8 pynndescent==0.5.10 squidpy==1.4.1

WARNING: FOV `72` does not exist in CellComposite folder, skipping it.
WARNING: FOV `72` does not exist in CellLabels folder, skipping it.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 2
      1 #loading in FF
----> 2 adata = sq.read.nanostring(
      3     path = '/mnt/hpc/data/Internal_Tests/240210_CosMx_CoreTrainingData/CoreTrainingData/FF/20240127_011618_S2/CellStatsDir/test',
      4    # path="/home/genomics/genomics/data/Internal_Tests/240210_CosMx_CoreTrainingData/CoreTrainingData/FF/20240127_011618_S2/CellStatsDir/test",
      5     counts_file="FF_exprMat_new_file.csv",
      6     meta_file="FF_metadata_short_file.csv",
      7     fov_file='FF_fov_positions_file_alt.csv'
      8 )

File ~/anaconda3/envs/singlecell/lib/python3.11/site-packages/squidpy/read/_read.py:267, in nanostring(path, counts_file, meta_file, fov_file)
    264                     continue
    266 if fov_file is not None:
--> 267     fov_positions = pd.read_csv(path / fov_file, header=0, index_col=fov_key)
    268     for fov, row in fov_positions.iterrows():
    269         try:

File ~/anaconda3/envs/singlecell/lib/python3.11/site-packages/pandas/io/parsers/readers.py:948, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
    935 kwds_defaults = _refine_defaults_read(
    936     dialect,
    937     delimiter,
   (...)
    944     dtype_backend=dtype_backend,
    945 )
    946 kwds.update(kwds_defaults)
--> 948 return _read(filepath_or_buffer, kwds)

File ~/anaconda3/envs/singlecell/lib/python3.11/site-packages/pandas/io/parsers/readers.py:617, in _read(filepath_or_buffer, kwds)
    614     return parser
    616 with parser:
--> 617     return parser.read(nrows)

File ~/anaconda3/envs/singlecell/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1748, in TextFileReader.read(self, nrows)
   1741 nrows = validate_integer("nrows", nrows)
   1742 try:
   1743     # error: "ParserBase" has no attribute "read"
   1744     (
   1745         index,
   1746         columns,
   1747         col_dict,
-> 1748     ) = self._engine.read(  # type: ignore[attr-defined]
   1749         nrows
   1750     )
   1751 except Exception:
   1752     self.close()

File ~/anaconda3/envs/singlecell/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py:333, in CParserWrapper.read(self, nrows)
    330     data = {k: v for k, (i, v) in zip(names, data_tups)}
    332     names, date_data = self._do_date_conversions(names, data)
--> 333     index, column_names = self._make_index(date_data, alldata, names)
    335 return index, column_names, date_data

File ~/anaconda3/envs/singlecell/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:370, in ParserBase._make_index(self, data, alldata, columns, indexnamerow)
    367     index = None
    369 elif not self._has_complex_date_col:
--> 370     simple_index = self._get_simple_index(alldata, columns)
    371     index = self._agg_index(simple_index)
    372 elif self._has_complex_date_col:

File ~/anaconda3/envs/singlecell/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:402, in ParserBase._get_simple_index(self, data, columns)
    400 index = []
    401 for idx in self.index_col:
--> 402     i = ix(idx)
    403     to_remove.append(i)
    404     index.append(data[i])

File ~/anaconda3/envs/singlecell/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:397, in ParserBase._get_simple_index.<locals>.ix(col)
    395 if not isinstance(col, str):
    396     return col
--> 397 raise ValueError(f"Index {col} invalid")

ValueError: Index fov invalid
giovp commented 6 months ago

hi @josiejenyne, it looks like the fov_key, which is hardcoded as "fov", is not correct for your file. Again, this is possibly because the company changed the spec or because your file has been modified. Either way, I would suggest submitting this issue to spatialdata-io, or otherwise opening a PR with a possible fix here in squidpy. For the PR, one option would be to pass fov_key as an argument; alternatively, modify the file so that the FOV id index column is named fov.
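The first option, passing the FOV column name in rather than hardcoding it, could look roughly like the sketch below. It is a hypothetical helper, not squidpy's API: it reads only the header to find a column whose name matches `fov_key` case-insensitively, uses that column as the index, and renames the index to `fov_key`, so files with either a `fov` or an `FOV` header are accepted. The `ValueError: Index fov invalid` above is what pandas raises here when the name given as `index_col` is not in the file's header.

```python
import pandas as pd


def read_fov_positions(path, fov_key="fov"):
    """Read an FOV positions CSV, indexing by the column matching `fov_key` (case-insensitive)."""
    header = pd.read_csv(path, nrows=0).columns  # header only, no data rows
    matches = [col for col in header if col.lower() == fov_key.lower()]
    if not matches:
        raise ValueError(f"No column matching {fov_key!r}; available columns: {list(header)}")
    # Prefer an exact match if the file happens to contain both "fov" and "FOV".
    col = fov_key if fov_key in matches else matches[0]
    df = pd.read_csv(path, header=0, index_col=col)
    df.index.name = fov_key  # downstream code can rely on a single index name
    return df
```

Failing with the list of available columns makes a future format change easier to diagnose than a bare "Index fov invalid".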

LLehner commented 5 months ago

@josiejenyne, @acjordan333 As @giovp mentioned before, for loading CosMx datasets efficiently I would highly recommend using spatialdata-io, which has a CosMx reader called cosmx().
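For reference, a minimal sketch of that route, assuming spatialdata-io is installed (`pip install spatialdata-io`) and that the directory holds the CosMx flat files plus the `CellComposite` and `CellLabels` folders. The wrapper name `load_cosmx` is hypothetical, and the exact arguments of `spatialdata_io.cosmx()` and the layout of the returned SpatialData object depend on the spatialdata-io version, so check its documentation before relying on them.

```python
from pathlib import Path


def load_cosmx(path, dataset_id=None):
    """Read a CosMx output directory into a SpatialData object via spatialdata-io."""
    path = Path(path)
    if not path.is_dir():
        raise NotADirectoryError(f"{path} is not a directory")
    # Imported lazily so this helper can be defined without spatialdata-io installed.
    import spatialdata_io

    # Assumption: dataset_id is the prefix shared by the flat files (e.g. "FF" for
    # "FF_exprMat_file.csv"); with None, spatialdata-io tries to infer it.
    return spatialdata_io.cosmx(path, dataset_id=dataset_id)
```

The AnnData table(s) inside the returned object can then be used with squidpy, as suggested earlier in the thread.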