satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Load Nanostring meta.data doesn't transfer well #8895

Closed: roanvanscheppingen closed this issue 4 months ago

roanvanscheppingen commented 6 months ago

When building a Seurat object from CosMx flat files, the options for importing meta.data are limited and do not fully work as documented. The imported meta.data is far sparser than that of the Seurat object exported natively by the CosMx instrument, even though NanoString recommends building the Seurat object from the flat files. Many meta.data columns cannot be accessed through LoadNanostring.

Seurat documentation:

metadata: Type of available metadata to read; choose zero or more of:

  • “Area”: number of pixels in cell segmentation
  • “fov”: cell's fov
  • “Mean.MembraneStain”: mean membrane stain intensity
  • “Mean.DAPI”: mean DAPI stain intensity
  • “Mean.G”: mean green channel stain intensity
  • “Mean.Y”: mean yellow channel stain intensity
  • “Mean.R”: mean red channel stain intensity
  • “Max.MembraneStain”: max membrane stain intensity
  • “Max.DAPI”: max DAPI stain intensity
  • “Max.G”: max green channel stain intensity
  • “Max.Y”: max yellow stain intensity
  • “Max.R”: max red stain intensity

The metadata file contains more columns than these, but they cannot all be imported. A bigger bug: requesting "Mean.G" (or "Mean.Y", "Mean.R") fails with: Error in [.data.frame(md, , metadata) : undefined columns selected

[1] "nn_59ee731b.4b6e.431c.b051.5ba71eb87378_1_cluster_cluster_64fc893f.482c.409a.a5b3.e29eb0073160_1"
[2] "RNA_nbclust_f2d90735.f357.4075.a6a6.1b713a91355e_1_clusters"
[3] "RNA_nbclust_f2d90735.f357.4075.a6a6.1b713a91355e_1_posterior_probability"
[4] "cell"
[5] "nCount_RNA"
[6] "nFeature_RNA"
[7] "nCount_negprobes"
[8] "nFeature_negprobes"
[9] "fov"
[10] "Area"
[11] "AspectRatio"
[12] "CenterX_local_px"
[13] "CenterY_local_px"
[14] "Width"
[15] "Height"
[16] "Mean.Histone"
[17] "Max.Histone"
[18] "Mean.CD68"
[19] "Max.CD68"
[20] "Mean.rRNA"
[21] "Max.rRNA"
[22] "Mean.GFAP"
[23] "Max.GFAP"
[24] "Mean.DAPI"
[25] "Max.DAPI"
[26] "cell_id"
[27] "assay_type"
[28] "version"
[29] "Run_Tissue_name"
[30] "Panel"
[31] "cellSegmentationSetId"
[32] "cellSegmentationSetName"
[33] "slide_ID"
[34] "CenterX_global_px"
[35] "CenterY_global_px"
[36] "cell_ID"
[37] "unassignedTranscripts"
[38] "nCount_falsecode"
[39] "nFeature_falsecode"
[40] "Area.um2"
[41] "propNegative"
[42] "complexity"
[43] "errorCtEstimate"
[44] "percOfDataFromError"
[45] "qcFlagsRNACounts"
[46] "qcFlagsCellCounts"
[47] "qcFlagsCellPropNeg"
[48] "qcFlagsCellComplex"
[49] "qcFlagsCellArea"
[50] "qcCellsFlagged"
[51] "median_negprobes"
[52] "negprobes_quantile_0.9"
[53] "median_RNA"
[54] "RNA_quantile_0.9"
[55] "nCell"
[56] "nCount"
[57] "nCountPerCell"
[58] "nFeaturePerCell"
[59] "propNegativeCellAvg"
[60] "complexityCellAvg"
[61] "errorCtPerCellEstimate"
[62] "percOfDataFromErrorPerCell"
[63] "qcFlagsFOV"
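In the column listing above, the stain-intensity columns are Mean/Max.Histone, Mean/Max.CD68, Mean/Max.rRNA, Mean/Max.GFAP and Mean/Max.DAPI; nothing is named Mean.G, Mean.Y or Mean.R. Assuming the flat-file metadata uses the same stain names, that would explain the error: the hard-coded choices accept "Mean.G", but the table has no such column, and subsetting a data.frame by a missing column name fails in base R. A minimal toy reproduction (the data.frame `md` and its values are made up for illustration):

```r
# Toy stand-in for the flat-file metadata (made-up values). Its stain
# columns follow the naming in the listing above, not Mean.G/Mean.Y/Mean.R.
md <- data.frame(
  cell_ID   = 1:3,
  fov       = c(1L, 1L, 2L),
  Area      = c(5120L, 4870L, 6033L),
  Mean.DAPI = c(210, 185, 240),
  Mean.CD68 = c(12, 40, 8)
)

# Selecting a column name the data.frame lacks raises the same error the
# issue reports for md[, metadata] with metadata = "Mean.G".
err <- tryCatch(md[, c("Area", "Mean.G")], error = conditionMessage)
print(err)
# [1] "undefined columns selected"
```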

roanvanscheppingen commented 6 months ago

I've identified the piece of code that causes this. It is in preprocessing.R, around line 2000:

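```r
if (!is.null(metadata)) {
  # Only these hard-coded names are accepted, whatever the file contains
  metadata <- match.arg(
    arg = metadata,
    choices = c(
      "Area", "fov", "Mean.MembraneStain", "Mean.DAPI", "Mean.G",
      "Mean.Y", "Mean.R", "Max.MembraneStain", "Max.DAPI", "Max.G",
      "Max.Y", "Max.R"
    ),
    several.ok = TRUE
  )
  # ... (rest of the block not quoted here)
}
```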

Shouldn't choices be something like colnames(metadata.file), so that any column in the file can be added as meta.data to the Seurat object? I'm inexperienced with PRs and with changing this code myself, but I expect the fix would be easy for someone to implement!
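A minimal sketch of that suggestion, assuming the metadata file has already been read into a data.frame `md`. The helper name `select_metadata` is hypothetical and not part of Seurat; it validates the requested names against the columns actually present instead of a fixed list:

```r
# Hypothetical helper (not a Seurat function): keep only the requested
# metadata columns, validated against the columns actually present in md.
select_metadata <- function(md, metadata = NULL) {
  if (is.null(metadata)) {
    return(NULL)  # nothing requested, nothing imported
  }
  missing_cols <- setdiff(metadata, colnames(md))
  if (length(missing_cols) > 0) {
    stop(
      "Requested metadata column(s) not found: ",
      paste(missing_cols, collapse = ", "),
      "\nAvailable columns: ", paste(colnames(md), collapse = ", "),
      call. = FALSE
    )
  }
  # drop = FALSE keeps a data.frame even when a single column is requested
  md[, metadata, drop = FALSE]
}
```

Unlike match.arg(), which also accepts partial (abbreviated) matches against its choices, this uses exact column names and, on failure, lists the columns that are available rather than the generic "undefined columns selected".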

longmanz commented 4 months ago

Hi, a workaround is to first load the count data using LoadNanostring(). Then build a second Seurat object from the flat files, extract its meta.data, and append it to the first Seurat object with AddMetaData(). We will look into whether this is a real bug and fix it if so. Thank you.
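The step that needs care in this workaround is lining the extra metadata rows up with the cells of the first object. Below is a hedged sketch: `align_metadata` is a hypothetical helper, and it assumes cells are named "<cell_ID>_<fov>"; check head(colnames(obj)) and adjust `make_key` if your object uses another convention. `md` can be the meta.data of the second object or, as a variation on the suggestion above, the flat-file metadata table read directly with read.csv(). The Seurat calls are left as comments because they need Seurat installed and a real flat-file export; the paths are placeholders.

```r
# Hypothetical helper (not a Seurat function): reorder a metadata table so
# its rows line up with the cell names of an existing Seurat object.
# ASSUMPTION: cell names have the form "<cell_ID>_<fov>".
align_metadata <- function(md, cells,
                           make_key = function(d) paste0(d$cell_ID, "_", d$fov)) {
  keys <- make_key(md)
  if (anyDuplicated(keys)) {
    stop("metadata keys are not unique")
  }
  idx <- match(cells, keys)
  if (anyNA(idx)) {
    warning(sum(is.na(idx)), " cell(s) have no metadata row; their values will be NA")
  }
  out <- md[idx, , drop = FALSE]
  rownames(out) <- cells  # same order as the object's cells
  out
}

# Sketch of the workaround (placeholders, not tested here):
# library(Seurat)
# obj <- LoadNanostring(data.dir = "path/to/flatfiles", fov = "fov")
# md  <- read.csv("path/to/flatfiles/<metadata file>.csv")
# obj <- AddMetaData(obj, metadata = align_metadata(md, colnames(obj)))
```

Setting the row names explicitly to the object's cell names means the result does not depend on the row order of the metadata file.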