waldronlab / VisiumIO

Import spaceranger output and 10X spatial data

Reading "old" Visium 10x data with VisiumIO #6

Closed by rcastelo 1 month ago

rcastelo commented 1 month ago

Hi Marcel,

Some of the datasets in the TENxVisiumData package are missing the "spatial coordinates" information because it was formerly stored in a file called tissue_positions_list.csv (see here), while the SpatialExperiment constructor read10xVisium() in this line expects a file with the newer name tissue_positions.csv. One such case is the human cerebellum data here:

https://cf.10xgenomics.com/samples/spatial-exp/1.2.0/Parent_Visium_Human_Cerebellum/Parent_Visium_Human_Cerebellum_filtered_feature_bc_matrix.tar.gz

https://cf.10xgenomics.com/samples/spatial-exp/1.2.0/Parent_Visium_Human_Cerebellum/Parent_Visium_Human_Cerebellum_spatial.tar.gz

Is it possible to build a SpatialExperiment object with VisiumIO that includes the spatial coordinates from the tissue_positions_list.csv file as in the example human cerebellum data?

thanks!

robert.

LiNk-NY commented 1 month ago

Hi Robert! @rcastelo

VisiumIO can handle both tissue positions files with the tissuePattern argument:

Note that the default is tissuePattern = "tissue_positions.*\\.csv".

suppressPackageStartupMessages({
    library(VisiumIO)
})
setwd("~/data/spatial/")
TENxVisium(
    resources =
        "Parent_Visium_Human_Cerebellum_filtered_feature_bc_matrix.tar.gz",
    spatialResource = "Parent_Visium_Human_Cerebellum_spatial.tar.gz"
) |> import()
#> class: SpatialExperiment 
#> dim: 36601 4992 
#> metadata(0):
#> assays(1): counts
#> rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
#>   ENSG00000277196
#> rowData names(3): ID Symbol Type
#> colnames(4992): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
#>   TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
#> colData names(4): in_tissue array_row array_col sample_id
#> reducedDimNames(0):
#> mainExpName: Gene Expression
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor

Created on 2024-10-04 with reprex v2.1.1
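
To see why the default pattern quoted above covers both the older and the newer file name, the regular expression can be checked with base R alone. This is only a sketch of the matching behaviour: how VisiumIO applies tissuePattern internally is not shown in this thread, and the variable names and the extra non-matching file name below are made up for illustration.

```r
# Default value of tissuePattern, as quoted in the comment above
tissue_pattern <- "tissue_positions.*\\.csv"

# Hypothetical listing of a spatial/ folder: the older file name,
# the newer file name, and one unrelated file
candidate_files <- c(
    "tissue_positions_list.csv",
    "tissue_positions.csv",
    "scalefactors_json.json"
)

# grepl() returns one logical per file name: TRUE where the pattern matches
matches <- grepl(tissue_pattern, candidate_files)
print(setNames(matches, candidate_files))
```

Because `.*` also matches the empty string, the same pattern accepts the file name with or without the `_list` suffix.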

rcastelo commented 1 month ago

Hi again, the parser is reading the spatial pixel columns, but it does not seem to be reading the spatial array columns:

x <- TENxVisium(
    resources =
        "Parent_Visium_Human_Cerebellum_filtered_feature_bc_matrix.tar.gz",
    spatialResource = "Parent_Visium_Human_Cerebellum_spatial.tar.gz"
) |> import()
int_colData(x)
DataFrame with 4992 rows and 4 columns
     reducedDims     altExps    colPairs spatialCoords
     <DataFrame> <DataFrame> <DataFrame>      <matrix>
1                                            3366:1800
2                                            9312:7772
3                                            5227:2153
4                                            3598:8871
5                                            8745:3459
...          ...         ...         ...           ...
4988                                         5182:8746
4989                                         4356:8989
4990                                         4143:7191
4991                                        5119:10544
4992                                         5780:2631

The spatial array columns correspond to the 3rd and 4th columns of the spatial/tissue_positions_list.csv file:

dat <- read.csv("spatial/tissue_positions_list.csv", header=FALSE)
head(dat)
                  V1 V2 V3 V4   V5   V6
1 ACGCCTGACACGCGCT-1  1  0  0 1804 2263
2 TACCGATCCAACACTT-1  1  1  1 1924 2333
3 ATTAAAGCGGACGAGC-1  1  0  2 1804 2401
4 GATAAGGGACGATTAG-1  1  1  3 1923 2470
5 GTGCAAATCACCAATA-1  1  0  4 1803 2539
6 TGTTGGCTGGCGGAAG-1  1  1  5 1923 2608
tail(dat)
                     V1 V2 V3  V4    V5    V6
4987 AGAGTCTTAATGAAAG-1  1 76 122 10883 10701
4988 GAACGTTTGTATCCAC-1  1 77 123 11003 10770
4989 ATTGAATTCCCTGTAG-1  1 76 124 10883 10838
4990 TACCTCACCAATTGTA-1  1 77 125 11003 10908
4991 AGTCGAATTAGCGTAA-1  1 76 126 10882 10976
4992 TTGAAGTGCATCTACA-1  1 77 127 11002 11046
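
Since the older tissue_positions_list.csv has no header row, the V1 to V6 columns above are easier to read once they are named. The following is a minimal sketch in base R, assuming the column order reported by TENxSpatialCSV later in this thread (in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres); the "barcode" label for the first column and the inline rows (copied from the head(dat) output above) are illustrative choices, not part of VisiumIO.

```r
# Three rows copied from the head(dat) output above, kept inline so the
# sketch runs without the downloaded archive
csv_text <- c(
    "ACGCCTGACACGCGCT-1,1,0,0,1804,2263",
    "TACCGATCCAACACTT-1,1,1,1,1924,2333",
    "ATTAAAGCGGACGAGC-1,1,0,2,1804,2401"
)

# Assumed column names; "barcode" is a label chosen here for illustration
position_cols <- c(
    "barcode", "in_tissue", "array_row", "array_col",
    "pxl_row_in_fullres", "pxl_col_in_fullres"
)

# header = FALSE because the older file starts directly with data rows
dat <- read.csv(text = csv_text, header = FALSE, col.names = position_cols)

# Array (grid) positions versus full-resolution pixel positions
array_coords <- dat[, c("array_row", "array_col")]
pixel_coords <- dat[, c("pxl_row_in_fullres", "pxl_col_in_fullres")]
print(array_coords)
print(pixel_coords)
```

With the columns named, it is easier to tell which pair holds the grid positions on the slide and which pair holds the pixel positions in the full-resolution image.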

LiNk-NY commented 1 month ago

Hi Robert, @rcastelo Thanks for the info! I have this:

> TENxSpatialCSV("~/data/spatial/tissue_positions_list.csv") |> import()
DataFrame with 4992 rows and 5 columns
                   in_tissue array_row array_col pxl_row_in_fullres pxl_col_in_fullres
                   <integer> <integer> <integer>          <integer>          <integer>
ACGCCTGACACGCGCT-1         1         0         0               1804               2263
TACCGATCCAACACTT-1         1         1         1               1924               2333
ATTAAAGCGGACGAGC-1         1         0         2               1804               2401
GATAAGGGACGATTAG-1         1         1         3               1923               2470
GTGCAAATCACCAATA-1         1         0         4               1803               2539
...                      ...       ...       ...                ...                ...
GAACGTTTGTATCCAC-1         1        77       123              11003              10770
ATTGAATTCCCTGTAG-1         1        76       124              10883              10838
TACCTCACCAATTGTA-1         1        77       125              11003              10908
AGTCGAATTAGCGTAA-1         1        76       126              10882              10976
TTGAAGTGCATCTACA-1         1        77       127              11002              11046

How should the array_row and array_col columns be incorporated into the SpatialExperiment / int_colData? If you have a snippet of code, I can update the import method.

FWIW these columns are in the colData:

> colData(x)
DataFrame with 4992 rows and 4 columns
                   in_tissue array_row array_col   sample_id
                   <integer> <integer> <integer> <character>
AAACAACGAATAGTTC-1         1         0        16    sample01
AAACAAGTATCTCCCA-1         1        50       102    sample01
AAACAATCTACTAGCA-1         1         3        43    sample01
AAACACCAATAACTGC-1         1        59        19    sample01

rcastelo commented 1 month ago

oops, you're right, they land in the colData, somehow I thought they should go as additional columns in the spatialCoords matrix. Thanks for your help!!