sq.read.vizgen is not compatible with the new vizgen firmware 232 cell_metadata.csv format

scverse / squidpy

Spatial Single Cell Analysis in Python

https://squidpy.readthedocs.io/en/stable/

BSD 3-Clause "New" or "Revised" License

sq.read.vizgen is not compatible with the new vizgen firmware 232 cell_metadata.csv format #657

Closed pakiessling closed 1 year ago

pakiessling commented 1 year ago

Hi, there.

cell_metadata.csv now has a lot more columns that contain information about protein stainings.

Example columns from my latest measurement:

EntityID,fov,volume,center_x,center_y,min_x,min_y,max_x,max_y,anisotropy,transcript_count,perimeter_area_ratio,solidity,Cellbound2_raw,Cellbound2_high_pass,protein1_raw,protein1_high_pass,Cellbound3_raw,Cellbound3_high_pass,PolyT_raw,PolyT_high_pass,Cellbound1_raw,Cellbound1_high_pass,protein2_raw,protein2_high_pass,DAPI_raw,DAPI_high_pass,protein3_raw,protein3_high_pass

sq.read.vizgen cannot handle this and fails with the error: Length mismatch: Expected axis has 28 elements, new values have 8 elements
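For readers unfamiliar with this error, here is a minimal sketch of how pandas raises it. It assumes the reader assigns a fixed list of 8 column names to a frame that now has 28 columns; the old squidpy code is not shown in this thread, so the list `old_names` and the toy frame `meta` are hypothetical.

```python
import pandas as pd

# Hypothetical frame with 28 metadata columns, mimicking the new
# cell_metadata.csv once EntityID is used as the index.
meta = pd.DataFrame([[0.0] * 28], columns=[f"col_{i}" for i in range(28)])

# Hypothetical fixed list of 8 names (assumption, not squidpy's actual code).
old_names = ["fov", "volume", "center_x", "center_y",
             "min_x", "min_y", "max_x", "max_y"]

try:
    # Assigning 8 names to 28 columns is rejected by pandas.
    meta.columns = old_names
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```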

michalk8 commented 1 year ago

Hi @pakiessling, this was fixed in #648; you will need to install squidpy from main.

pakiessling commented 1 year ago

@michalk8 Ah perfect, thank you

pakiessling commented 1 year ago

@michalk8 Hi, I cloned the main branch and reinstalled. Now I get a bunch of NaN when importing via read.vizgen. Loading via pandas and then assigning to .obs works fine. Am I doing something wrong?

michalk8 commented 1 year ago

In Squidpy, we add the observations as:

# _read_counts is defined in squidpy.read._utils
adata, library_id = _read_counts(
    path=path, count_file=counts_file, library_id=library_id,
    delimiter=",", first_column_names=True, **kwargs
)
...
coords = pd.read_csv(path / meta_file, header=0, index_col=0)
adata.obs = pd.merge(adata.obs, coords, how="left", left_index=True, right_index=True)

I suspect that adata.obs and coords have different indices, and that is why you see the NaN.

Based on how you load it with pandas, it seems that index_col=0 would correctly use EntityID as the index. Are all values NaN? Maybe there are some extra values in the counts file that are not present in the meta file. Also, could you please share a small code snippet showing how you load the data?
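The questions above can be checked with a few lines of pandas. This is only a sketch on toy data: `obs` and `coords` are hypothetical stand-ins for adata.obs and the metadata table, built here with deliberately different index dtypes to show what a failed left merge looks like.

```python
import pandas as pd

# Toy stand-ins (assumption): obs indexed by strings, coords by integers.
obs = pd.DataFrame(index=pd.Index(["1", "2", "3"], dtype=object))
coords = pd.DataFrame(
    {"volume": [10.0, 20.0, 30.0]},
    index=pd.Index([1, 2, 3], dtype="int64"),
)

# Same merge call as in the excerpt above.
merged = pd.merge(obs, coords, how="left", left_index=True, right_index=True)

# 1) Compare the index dtypes of both sides.
print(obs.index.dtype, coords.index.dtype)

# 2) Count how many obs labels are found in the metadata index.
n_shared = int(obs.index.isin(coords.index).sum())
print("shared labels:", n_shared)

# 3) Check whether every merged value is NaN or only some of them.
all_nan = bool(merged["volume"].isna().all())
print("all NaN:", all_nan)
```

If only some rows are NaN, missing entries in the metadata file are the more likely cause; if every row is NaN and no labels are shared, the two indices probably do not match at all.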

pakiessling commented 1 year ago

@michalk8 Hi, I figured out what is going wrong.

_read_counts is reading in the adata.obs index as dtype="object" while pd.read_csv(path / meta_file, header=0, index_col=0) is reading the index as dtype='int64'. Converting the index to the same type fixes the merge step - no more NaN
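A minimal sketch of the workaround described above, on toy data. The frames `obs` and `coords` are hypothetical stand-ins, and casting the metadata index to `str` is just one way to align the types; the actual change made in squidpy is not shown in this thread.

```python
import pandas as pd

# Toy stand-ins (assumption): obs index holds strings (dtype object),
# coords index holds integers (dtype int64), as reported above.
obs = pd.DataFrame(index=pd.Index(["1", "2", "3"], dtype=object))
coords = pd.DataFrame(
    {"volume": [10.0, 20.0, 30.0]},
    index=pd.Index([1, 2, 3], dtype="int64"),
)

# Cast the metadata index to strings so both sides use the same label type.
coords.index = coords.index.astype(str)

fixed = pd.merge(obs, coords, how="left", left_index=True, right_index=True)
print(fixed)
```

Casting to strings rather than integers keeps the obs index as it was read, which matters because AnnData generally expects string observation names.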

michalk8 commented 1 year ago

I can confirm locally that this causes the issue; it will be fixed in #665.