satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.25k stars 903 forks source link

Missing values error message after SCTransform? #4546

Closed Pancreas-Pratik closed 3 years ago

Pancreas-Pratik commented 3 years ago

Link to publicly available dataset: https://cells.ucsc.edu/?ds=human-pancreas-dev

Direct Expression Matrix Link: https://cells.ucsc.edu/human-pancreas-dev/fetal-pancreas/exprMatrix.tsv.gz

Direct Meta Data Link: https://cells.ucsc.edu/human-pancreas-dev/fetal-pancreas/meta.tsv

I receive this error after running SCTransform on this dataset.

scfp3 <- CreateSeuratObject(counts = mat, project = "scfp7to10wpc", meta.data=meta)
scfp3[["percent.mt"]] <- PercentageFeatureSet(scfp3, pattern = "^MT")
scfp3 <- SCTransform(scfp3, method = "glmGamPoi", vars.to.regress = "percent.mt", verbose = FALSE)

Warning in vst(method = "glmGamPoi", umi = new("dgCMatrix", i = c(0L, 2L,  :
  NaNs produced
Error in density.default(x = genes_log_gmean_step1, bw = "nrd", adjust = 1) : 
  'x' contains missing values

I have checked the following:

> any(is.na(mat))
[1] FALSE
> any(is.nan(as.matrix(mat)))
[1] FALSE
> any(is.null(as.matrix(mat)))
[1] FALSE

This is everything I have done after downloading the dataset. SCTransform works on other datasets I am using,

#Reading Expression Matrix
mat <- read.table("~/Projects/fetal-pancreas-7-10wpc/exprMatrix.tsv", header=T, sep="\t")

#Converting ENSEMBL gene names to HGNC
library(biomaRt)
mart <- useMart('ENSEMBL_MART_ENSEMBL')
mart <- useDataset('hsapiens_gene_ensembl', mart)
annotLookup <- getBM(
  mart = mart,
  attributes = c(
    'hgnc_symbol',
    'ensembl_gene_id'),
  uniqueRows = TRUE)
colnames(annotLookup) <- c("gene", "ensembl")
colnames(mat)[1] <- c("ensembl")
mat <- merge(annotLookup, mat, by = "ensembl")

#removing ENSEMBL gene rows with no corresponding HGNC symbol
mat <- mat[!(is.na(mat$gene) | mat$gene==""), ]

#removing duplicate HGNC gene reads
mat = mat[!duplicated(mat$gene),]

#fixing up matrix
rownames(mat) <- mat$gene
mat$ensembl <- NULL
mat$gene <- NULL

#cleaning environment
rm("annotLookup", "mart")

#removing all genes with no reads in any cells.
mat <- mat[rowSums(mat[, -1])>0, ]
#removing cells with zero counts
mat <- mat[, colSums(mat == 0) != nrow(mat)] 

any(is.na(mat))
any(is.nan(as.matrix(mat)))
any(is.null(as.matrix(mat)))
Registered S3 methods overwritten by 'dbplyr':
  method         from
  print.tbl_lazy     
  print.tbl_sql      
[1] FALSE
[1] FALSE
[1] FALSE
################
#Loading Seurat#
################
library(dplyr)
library(Seurat)
library(patchwork)
library(ggplot2)
library(sctransform)
library("glmGamPoi")

meta <- read.table("~/Projects/fetal-pancreas-7-10wpc/meta.tsv", header=T, sep="\t", as.is=T, row.names=1)

scfp3 <- CreateSeuratObject(counts = mat, project = "scfp7to10wpc", meta.data=meta)
scfp3[["percent.mt"]] <- PercentageFeatureSet(scfp3, pattern = "^MT")
scfp3 <- SCTransform(scfp3, method = "glmGamPoi", vars.to.regress = "percent.mt", verbose = FALSE)
Warning in vst(method = "glmGamPoi", umi = new("dgCMatrix", i = c(0L, 2L,  :
  NaNs produced
Error in density.default(x = genes_log_gmean_step1, bw = "nrd", adjust = 1) : 
  'x' contains missing values

Could someone help out with this, please? (Hope I made a good reproducible example. Forgive me if I haven't. Please let me know if you need anything else)

saketkc commented 3 years ago

Thanks for the reproducible example @Pancreas-Pratik, it seems the input has already been normalized (log-transformed).

> mat[1:5, 1:5]
                W2108         W3605        W3629        W3723         W365
ENSG00000000003     0  1.5286192959  0.139595909  1.271364144  1.316320848
ENSG00000000005     0 -0.0256210019  0.000000000  0.000000000 -0.005159296
ENSG00000000419     0  1.0344888395 -0.311763211 -0.085970702  0.458058049
ENSG00000000457     0  0.0008419708  0.000000000  0.004414319  0.086613988
ENSG00000000460     0  0.0095650832  0.005143084  0.008371999  0.025515359
> gene_mean <- rowMeans(mat)
> gene_var <- rowVars(as.matrix(mat_exp))
> qplot(gene_mean, gene_var)

image

SCT requires counts data. There are ways to recover counts from this data, but it would probably be best to get hold of the counts if you can. Feel free to reopen in case you have any follow up questions.