ludvigla / semla

Spot coordinates like the STutility? #9

Closed TaopengWang closed 1 year ago

TaopengWang commented 1 year ago

Hi Ludvig,

Thanks for the great package. Looking forward to applying the new functions!

This might be a rare case, but when analysing CytAssist Visium data I noticed that some rows or columns have particularly high or low gene/UMI counts (please see the example here: Gene_per_spot_spatial_distribution.pdf). My understanding is that this could be a wet-lab issue, i.e. probes tend to accumulate at the edge of the gasket, or the area covered by probes is misaligned with the capture area. So as best practice, I think these columns should be excluded from downstream analysis. I should also mention that filtering on the number of UMIs or genes may not be enough: a low threshold may not exclude all of the affected columns, while a high threshold may exclude spots with naturally low UMI/gene counts, such as stromal areas.

For the STutility workflow, the spot coordinates are stored in image@meta.data$x. The pattern is quite regular: for the standard Visium capture area size, the array row ranges from 0 to 77 and the array column ranges from 0 to 127.

I noticed the coordinates can also be found in semla in spatial_data@meta_data$pxl_col_in_fullres. But I'm not sure about the exact pattern here.

table(spatial_data@meta_data$pxl_col_in_fullres)

12899 12901 12903 12905 12907 12909 12911 12913 12915 12917 12919 12921 12923 
    1     1     1     1     1     1     1     1     1     1     1     1     1 

It looks like the coordinates are unique for each spot.

Maybe I missed it, but I'm wondering whether the image@meta.data$x feature is maintained in semla. If not, would you be keen to add it? It can be quite handy for CytAssist data analysis. Or could you suggest an alternative way to filter out a whole column?

Thanks a lot

Tony

ludvigla commented 1 year ago

Hey Tony,

Sorry for the late reply. The pattern of missing data you are seeing is something that we have seen a few times before. I can't see the H&E image underneath the spots, but my guess is that it's an artifact caused by a misplacement of the rubber mask during library prep. If the rubber mask is not carefully placed, it can cover and block part of the capture array, which leads to a loss of data in that region.

You could use the FeatureViewer to manually select those spots and then filter them out. That's how I usually do it. If you just plot the nFeature_Spatial in the viewer it's pretty straightforward to select and label the spots.

The x,y array coordinates are not loaded by default when you run ReadVisiumData, but you can access them from the tissue_positions_list.csv files. We provide a utility function (LoadSpatialCoordinates) to load the data from multiple files which can come in handy here.

library(semla)

# Use the same spotfiles as input for ReadVisiumData
# Below are the example spotfiles provided in semla 
spotfiles <-
 Sys.glob(paths = paste0(system.file("extdata", package = "semla"),
                         "/*/spatial/tissue_positions_list.csv"))

# Load coordinates (including x,y array coordinates)
coords <- LoadSpatialCoordinates(spotfiles)

Once you have the coords, you can just merge them with your Seurat object's meta.data slot.
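As a rough sketch of that merge and the column filter asked about above, something like the following could work. It uses small toy data frames in place of the real objects so it runs on its own. The column name barcode, the barcode strings, the count values and the choice of which array columns to drop are all assumptions for illustration; check names(coords) and the barcode format on your own data first.

```r
# Toy stand-in for the output of LoadSpatialCoordinates().
# Assumed columns: barcode plus the x,y array coordinates mentioned above.
coords <- data.frame(
  barcode = c("AAAC-1", "AAAG-1", "AACC-1", "AACG-1"),
  x = c(0, 1, 64, 127),
  y = c(0, 1, 40, 77)
)

# Toy stand-in for the Seurat meta.data, with barcodes as row names
meta <- data.frame(
  nFeature_Spatial = c(150, 2300, 2100, 90),
  row.names = c("AACG-1", "AAAC-1", "AACC-1", "AAAG-1")
)

# Match coordinates to spots by barcode so the meta.data row order is kept
idx <- match(rownames(meta), coords$barcode)
meta$x <- coords$x[idx]
meta$y <- coords$y[idx]

# Hypothetical choice: drop every spot in the two outermost x positions
bad_x <- c(0, 127)
keep <- rownames(meta)[!meta$x %in% bad_x]
print(keep)
```

With a real object, the same idea would be to add the matched x and y vectors to the Seurat meta.data and then subset the object to the barcodes in keep, for example with semla's SubsetSTData (assuming it accepts a vector of spot barcodes). With several samples, the barcodes in coords may need the same sample suffix as the ones in the Seurat object before they match.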

Cheers, Ludvig

TaopengWang commented 1 year ago

Thanks, Ludvig! I wasn't aware of this function. Should have checked the reference page.