Closed cstrlln closed 4 months ago
Hello @cstrlln,
Did you manually subset these files, or did you export this FOV directly from AtomX? According to the error, it seems that your F099_fov_positions_file.csv.gz
file is missing the column "X_mm"
, even though it indeed seems to contain "Y_mm"
and "Z_mm"
. Can you check if it contains this column?
Actually, you don't need to subset your data. With Sopa you can decide to work on one single FOV you want. With the snakemake pipeline, you need to update the config file as below to read only the FOV 99. So you can keep the full data, and this will only run the pipeline on FOV 99:
read:
technology: cosmx
kwargs:
fov: 99
Thanks for the feedback. I do have still the X_mm column, only did a manual subset based on the FOV. I wonder if the csv writing did some weird thing with that column? Any other way I can check the input, maybe in a stepwise manner?
That's good to know for the fov for snakemake. I do envisage doing manual subset datasets though because it is more portable for sharing with colleagues, as our in house toy dataset. We have several FOVs that will not be usable so not worth to keep moving around that data.
Actually, I think the issue is that there is a missing "FOV" column in the F099_fov_positions_file.csv.gz
file. Normally, a file location should look like the one below, and we use the FOV column as an index.
In your case, since your second column is "X_mm"
, it is used as an index, and therefore it's not inside the columns anymore.
Slide,FOV,X_mm,Y_mm
3,1,0.722208817050781,13.7198104736064
3,2,1.23412451267148,13.7198104736064
3,3,1.74604020829218,13.7198104736064
3,4,2.25795590391288,13.7198104736064
TL;DR: can you make sure your F099_fov_positions_file.csv.gz
file looks like this?
Slide,FOV,X_mm,Y_mm
1,99,2.881258,2.678565
So it has to be in that specific order? the cosmx file has this headings
Because I have subsetted the FOV locations files and is one single line now, would that be an issue?
Indeed, I was assuming that the FOV column was always the second one. I don't understand why the columns are shuffled, I'm surprised that the CosMX exports are so inconsistent. Anyway, I should update the reader to deal with this.
I'll soon make a new release to handle this, I'll let you know!
Yes, again cosmx changing data outputs.
So that I can test this before a new release, could you point me to the expected order of columns for the required files? I don't mind re-organizing for this one.
Carlos
Also, if you let me know what files, I can send you recent versions of the output files to have as reference.
It's only the FOV column, it should be the second column, like this:
Slide,FOV,X_mm,Y_mm
1,99,2.881258,2.678565
The order for the other columns doesn't matter
"Also, if you let me know what files I can send you recent versions of the output files to have as reference."
Sorry, I'm not sure to understand what you want?
Really sorry for all this mess with CosMX data. The reader is still quite new, but I try to make it more stable, and I will try to add details on the docs for using the right export module
No worries, clearly CosMX is really poor at documenting outputs.
What I was proposing is I can send you examples of current versions of CosMX outputs so that you can have as reference.
Ah, yes, it would be great, thanks!
I would need the _fov_positions_file
, and the _tx_file
. Here is my email address in case you need it: quentin.blampey@centralesupelec.fr
This should be fixed in sopa==1.0.12
, can you try?
great! will give it a try in the next couple days.
Thanks, it is working now!
Nice!
Do you mind telling me which export module you used? (description or screenshot) I could add this to the docs to help future users!
Of course. It was not necessarily a module. The way to get the data in flat files is within a study, click on Export and then selecet the "Flat CSV Files" section
f)
Just in case, here are the files as of May 2024
Ah, yes, it would be great, thanks! I would need the
_fov_positions_file
, and the_tx_file
. Here is my email address in case you need it: quentin.blampey@centralesupelec.fr
Thanks for these details, I'll update the docs accordingly!
Hello, I was trying to test sopa with a single FOV from my dataset in order to see everything is working, to do so I created separate file versions of each of the CosMX output files for a single FOV i.e. FOV99 in this case. I'm not sure what the right naming or strategy would be to subset a dataset prior to input in sopa or if there is a way to select which FOV specifically to use with the snakemake pipeline. This is how the beginning of each file looks like:
This is the transcripts file
This is the FOV locations file
When I try to run sopa with snakemake I get the following errors:
(sopa-snake) carlos@dhcp-32-058 sopa % snakemake --config data_path=/Users/carlos/projects/sopa_test_1/data_fov99 --configfile=/Users/carlos/projects/sopa_test_1/config_files/cellpose_baysor_localmodel.yaml --cores 2 --use-conda SpatialData object path set to default: /Users/carlos/projects/sopa_test_1/data_fov99.zarr To change this behavior, provide
--config sdata_path=...
when running the snakemake pipeline Building DAG of jobs... Using shell: /bin/bash Provided cores: 2 Rules claiming more threads will be scaled down. Job stats: job countaggregate 1 all 1 explorer 1 image_write 1 patchify_baysor 1 patchify_cellpose 1 report 1 resolve_baysor 1 resolve_cellpose 1 to_spatialdata 1 total 10
Select jobs to execute...
[Tue May 14 15:34:52 2024] rule to_spatialdata: input: /Users/carlos/projects/sopa_test_1/data_fov99 output: /Users/carlos/projects/sopa_test_1/data_fov99.zarr/.zgroup jobid: 4 reason: Missing output files: /Users/carlos/projects/sopa_test_1/data_fov99.zarr/.zgroup resources: tmpdir=/var/folders/2n/b_m3klds26l3_d3qlqd7qtlm0000gq/T, mem_mb=128000, mem_mib=122071
Activating conda environment: sopa-snake ╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮ │ /Users/carlos/mambaforge/envs/sopa-snake/lib/python3.10/site-packages/pandas/core/indexes/base.p │ │ y:3805 in get_loc │ │ │ │ 3802 │ │ """ │ │ 3803 │ │ casted_key = self._maybe_cast_indexer(key) │ │ 3804 │ │ try: │ │ ❱ 3805 │ │ │ return self._engine.get_loc(casted_key) │ │ 3806 │ │ except KeyError as err: │ │ 3807 │ │ │ if isinstance(casted_key, slice) or ( │ │ 3808 │ │ │ │ isinstance(casted_key, abc.Iterable) │ │ │ │ ╭───────────────────────────────────── locals ──────────────────────────────────────╮ │ │ │ casted_key = 'X_mm' │ │ │ │ key = 'X_mm' │ │ │ │ self = Index(['Slide', 'Y_mm', 'Z_mm', 'ZOffset_mm', 'ROI', 'FOV', 'Order', │ │ │ │ │ 'Run_Tissue_name'], │ │ │ │ │ dtype='object') │ │ │ ╰───────────────────────────────────────────────────────────────────────────────────╯ │ │ │ │ in pandas._libs.index.IndexEngine.get_loc:167 │ │ │ │ in pandas._libs.index.IndexEngine.get_loc:196 │ │ │ │ in pandas._libs.hashtable.PyObjectHashTable.get_item:7081 │ │ │ │ in pandas._libs.hashtable.PyObjectHashTable.get_item:7089 │ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯ KeyError: 'X_mm'
The above exception was the direct cause of the following exception:
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮ │ /Users/carlos/projects/sopa/sopa/cli/app.py:98 in read │ │ │ │ 95 │ │ io, technology │ │ 96 │ ), f"Technology {technology} unknown. Currently available: xenium, merscope, cosmx, │ │ 97 │ │ │ ❱ 98 │ sdata = getattr(io, technology)(data_path, *kwargs) │ │ 99 │ io.write_standardized(sdata, sdata_path, delete_table=True) │ │ 100 │ │ 101 │ │ │ │ ╭──────────────────────────────────────── locals ─────────────────────────────────────────╮ │ │ │ config_path = None │ │ │ │ data_path = '/Users/carlos/projects/sopa_test_1/data_fov99' │ │ │ │ io = <module 'sopa.io' from '/Users/carlos/projects/sopa/sopa/io/init.py'> │ │ │ │ kwargs = {} │ │ │ │ Path = <class 'pathlib.Path'> │ │ │ │ sdata_path = PosixPath('/Users/carlos/projects/sopa_test_1/data_fov99.zarr') │ │ │ │ technology = 'cosmx' │ │ │ ╰─────────────────────────────────────────────────────────────────────────────────────────╯ │ │ │ │ /Users/carlos/projects/sopa/sopa/io/reader/cosmx.py:54 in cosmx │ │ │ │ 51 │ image_models_kwargs, imread_kwargs = _default_image_kwargs(image_models_kwargs, imre │ │ 52 │ │ │ 53 │ dataset_id = _infer_dataset_id(path, dataset_id) │ │ ❱ 54 │ fov_locs = _read_cosmx_fov_locs(path, dataset_id) │ │ 55 │ fov_id, fov = _check_fov_id(fov) │ │ 56 │ │ │ 57 │ protein_dir_dict = {} │ │ │ │ ╭───────────────────────────────────── locals ─────────────────────────────────────╮ │ │ │ dataset_id = '/Users/carlos/projects/sopa_test_1/data_fov99/F099' │ │ │ │ fov = None │ │ │ │ image_models_kwargs = {'chunks': (1, 1024, 1024), 'scale_factors': [2, 2, 2, 2]} │ │ │ │ imread_kwargs = {} │ │ │ │ path = PosixPath('/Users/carlos/projects/sopa_test_1/data_fov99') │ │ │ │ read_proteins = False │ │ │ ╰──────────────────────────────────────────────────────────────────────────────────╯ │ │ │ │ /Users/carlos/projects/sopa/sopa/io/reader/cosmx.py:163 in _read_cosmx_fov_locs │ │ │ │ 160 │ │ │ 161 │ pixel_size = 0.120280945 # size of a pixel in microns │ │ 162 │ │ │ ❱ 163 │ fov_locs["xmin"] = fov_locs["X_mm"] 1e3 / pixel_size │ │ 164 │ fov_locs["xmax"] = 0 # will be filled when reading the images │ │ 165 │ │ │ 166 │ fov_locs["ymin"] = 0 # will be filled when reading the images │ │ │ │ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │ │ │ dataset_id = '/Users/carlos/projects/sopa_test_1/data_fov99/F099' │ │ │ │ fov_file = PosixPath('/Users/carlos/projects/sopa_test_1/data_fov99/F099_fov_positions_fi… │ │ │ │ fov_locs = │ │ Slide Y_mm Z_mm ZOffset_mm ROI FOV Order │ │ │ │ Run_Tissue_name │ │ │ │ X_mm │ │ │ │ 2.881258 1 2.678565 0.021493 -2 0 99 33 │ │ │ │ mw_mus_p1_04 │ │ │ │ path = PosixPath('/Users/carlos/projects/sopa_test_1/data_fov99') │ │ │ │ pixel_size = 0.120280945 │ │ │ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │ │ │ │ /Users/carlos/mambaforge/envs/sopa-snake/lib/python3.10/site-packages/pandas/core/frame.py:4102 │ │ in getitem │ │ │ │ 4099 │ │ if is_single_key: │ │ 4100 │ │ │ if self.columns.nlevels > 1: │ │ 4101 │ │ │ │ return self._getitem_multilevel(key) │ │ ❱ 4102 │ │ │ indexer = self.columns.get_loc(key) │ │ 4103 │ │ │ if is_integer(indexer): │ │ 4104 │ │ │ │ indexer = [indexer] │ │ 4105 │ │ else: │ │ │ │ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │ │ │ is_mi = False │ │ │ │ is_single_key = True │ │ │ │ key = 'X_mm' │ │ │ │ self = │ │ Slide Y_mm Z_mm ZOffset_mm ROI FOV Order │ │ │ │ Run_Tissue_name │ │ │ │ X_mm │ │ │ │ 2.881258 1 2.678565 0.021493 -2 0 99 33 │ │ │ │ mw_mus_p1_04 │ │ │ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │ │ │ │ /Users/carlos/mambaforge/envs/sopa-snake/lib/python3.10/site-packages/pandas/core/indexes/base.p │ │ y:3812 in get_loc │ │ │ │ 3809 │ │ │ │ and any(isinstance(x, slice) for x in casted_key) │ │ 3810 │ │ │ ): │ │ 3811 │ │ │ │ raise InvalidIndexError(key) │ │ ❱ 3812 │ │ │ raise KeyError(key) from err │ │ 3813 │ │ except TypeError: │ │ 3814 │ │ │ # If we have a listlike key, _check_indexing_error will raise │ │ 3815 │ │ │ # InvalidIndexError. Otherwise we fall through and re-raise │ │ │ │ ╭───────────────────────────────────── locals ──────────────────────────────────────╮ │ │ │ casted_key = 'X_mm' │ │ │ │ key = 'X_mm' │ │ │ │ self = Index(['Slide', 'Y_mm', 'Z_mm', 'ZOffset_mm', 'ROI', 'FOV', 'Order', │ │ │ │ │ 'Run_Tissue_name'], │ │ │ │ │ dtype='object') │ │ │ ╰───────────────────────────────────────────────────────────────────────────────────╯ │ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯ KeyError: 'X_mm' [Tue May 14 15:34:55 2024] Error in rule to_spatialdata: jobid: 4 input: /Users/carlos/projects/sopa_test_1/data_fov99 output: /Users/carlos/projects/sopa_test_1/data_fov99.zarr/.zgroup conda-env: sopa-snake shell:
Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Complete log: .snakemake/log/2024-05-14T153451.958303.snakemake.log
This is the structure of the folder I use for sending to sopa