stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/
Other
316 stars 85 forks source link

Using the ATAC peak matrix in place of masc2 calling peak matrix occurred error #486

Closed honghh2018 closed 3 years ago

honghh2018 commented 3 years ago

Hi @timoast The other step can run successfully, and the DimPlot had be drawed already. however, when runing on RegionStats step get below error. I guess that the genome i used had the scaffold debris, in my case, the KI270727.1 chr liken the debris.

gex <- RegionStats(gex, genome = BSgenome.Hsapiens.UCSC.hg38) Error in .getOneSeqFromBSgenomeMultipleSequences(x, name, start, NA, width, : sequence KI270727.1 not found Under this condition, how can i fix it? Best, hanhuihong

timoast commented 3 years ago

Please include the full code you're running

honghh2018 commented 3 years ago

Hi The full code lying below:

data1<-Read10X('/share/nas1/Data/PipeLine/scRNA_ATAC_Combined/v1.0/analysis/result/sample345/outs/filtered_feature_bc_matrix')
data1$Peaks
library(EnsDb.Hsapiens.v86)
library(BSgenome.Hsapiens.UCSC.hg38)

fragpath <- "/share/nas1/Data/PipeLine/scRNA_ATAC_Combined/v1.0/analysis/result/sample345/outs/atac_fragments.tsv.gz"

annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
seqlevelsStyle(annotation) <- "UCSC"
genome(annotation) <- "hg38"
gex<-CreateSeuratObject(
  counts = data1$`Gene Expression`,
  assay = "RNA"
)

gex[["ATAC"]] <- CreateChromatinAssay(
  counts = data1$Peaks,
  sep = c(":", "-"),
  fragments = fragpath,
  annotation = annotation
)
###control qc
DefaultAssay(gex) <- "ATAC"

gex <- NucleosomeSignal(gex)
gex <- TSSEnrichment(gex)

VlnPlot(
  object = gex,
  features = c("nCount_RNA", "nCount_ATAC", "TSS.enrichment", "nucleosome_signal"),
  ncol = 4,
  pt.size = 0
)
##QC
gex <- subset(
  x = gex,
  subset = nCount_ATAC < 100000 &
    nCount_RNA < 25000 &
    nCount_ATAC > 1000 &
    nCount_RNA > 1000 &
    nucleosome_signal < 2 &
    TSS.enrichment > 1
)

###############
DefaultAssay(gex) <- "RNA"

gex <- SCTransform(gex)
gex <- RunPCA(gex)

##############
DefaultAssay(gex) <- "ATAC"
gex <- FindTopFeatures(gex, min.cutoff = 5)
gex <- RunTFIDF(gex)
gex <- RunSVD(gex)
DefaultAssay(gex) <- "SCT"
gex[['pca']]
###joint analysis

gex <- FindMultiModalNeighbors(
  object = gex,
  reduction.list = list("pca", "lsi"), 
  dims.list = list(1:50, 2:40),
  modality.weight.name = "RNA.weight",
  verbose = TRUE
)

# To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
gex <- RunUMAP(
  object = gex,
  nn.name = "weighted.nn", 
  assay = "RNA",
  verbose = TRUE
)
timoast commented 3 years ago

At what point are you calling RegionStats()? Please include the full code

honghh2018 commented 3 years ago

sorry to lately reply @timoast, The full code i run post below: library(Seurat) library(Signac) library(reticulate) library(ggplot2) py_config() py_available('umap') py_available('numpy') sys<-import('sys')

data1<-Read10X('/scRNA_ATAC_Combined/v1.0/analysis/result/sample345/outs/filtered_feature_bc_matrix') data1$Peaks library(EnsDb.Hsapiens.v86) library(BSgenome.Hsapiens.UCSC.hg38)

fragpath <- "/scRNA_ATAC_Combined/v1.0/analysis/result/sample345/outs/atac_fragments.tsv.gz"

annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86) seqlevelsStyle(annotation) <- "UCSC" genome(annotation) <- "hg38" gex<-CreateSeuratObject( counts = data1$Gene Expression, assay = "RNA" )

gex[["ATAC"]] <- CreateChromatinAssay( counts = data1$Peaks, sep = c(":", "-"), fragments = fragpath, annotation = annotation )

control qc

DefaultAssay(gex) <- "ATAC"

gex <- NucleosomeSignal(gex) gex <- TSSEnrichment(gex)

VlnPlot( object = gex, features = c("nCount_RNA", "nCount_ATAC", "TSS.enrichment", "nucleosome_signal"), ncol = 4, pt.size = 0 )

QC

gex <- subset( x = gex, subset = nCount_ATAC < 100000 & nCount_RNA < 25000 & nCount_ATAC > 1000 & nCount_RNA > 1000 & nucleosome_signal < 2 & TSS.enrichment > 1 )

############## DefaultAssay(gex) <- "RNA"

gex <- SCTransform(gex) gex <- RunPCA(gex)

############## DefaultAssay(gex) <- "ATAC" gex <- FindTopFeatures(gex, min.cutoff = 5) gex <- RunTFIDF(gex) gex <- RunSVD(gex) DefaultAssay(gex) <- "SCT" gex[['pca']]

joint analysis

gex <- FindMultiModalNeighbors( object = gex, reduction.list = list("pca", "lsi"), dims.list = list(1:50, 2:40), modality.weight.name = "RNA.weight", verbose = TRUE )

To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'

gex <- RunUMAP( object = gex, nn.name = "weighted.nn", assay = "RNA", verbose = TRUE )

DimPlot(gex, label = TRUE, repel = TRUE, reduction = "umap") gex<-FindNeighbors(gex,dims = 1:30) gex<-FindClusters(gex,resolution = 0.3)

Linking peaks to genes

DefaultAssay(gex) <- "ATAC"

first compute the GC content for each peak

gex <- RegionStats(gex, genome = BSgenome.Hsapiens.UCSC.hg38) ###the step showing error that i post on first floor. Best, hanhuihong

honghh2018 commented 3 years ago

Could any advices for this solution ?

timoast commented 3 years ago

can you show the output of head(rownames(gex[["ATAC"]]))?

honghh2018 commented 3 years ago

@timoast the result showing below: image

timoast commented 3 years ago

I think the issue is that the genome you mapped the data to has the scaffolds named differently to the BSgenome. The scaffolds aren't that useful for analysis, so you could solve this by removing peaks that are on scaffolds. For example:

library(BSgenome.Hsapiens.UCSC.hg38)

main.chroms <- standardChromosomes(BSgenome.Hsapiens.UCSC.hg38)
keep.peaks <- as.logical(seqnames(granges(object)) %in% main.chroms)
object <- object[keep.peaks, ]
honghh2018 commented 3 years ago

Hi @timoast , It was weird when I follow your code that post above. the object should be had the RNA assay and the ATAC assay, but when i run the object <- object[keep.peaks, ] that you post, the object assay just only had the ATAC assay, without the RNA data. if i try another way to remove the scaffolds like below, it occurred error: image error: image On my case, i want to use the cellranger ATAC data and avoid the scaffold that can trigger the error and make the analysis pause, based on this ,i hope that function can be include within on the Signac package. The full code i take lying below: data1<-Read10X('/share/nas1/Data/PipeLine/scRNA_ATAC_Combined/v1.0/analysis/result/sample345/outs/filtered_feature_bc_matrix') data1$Peaks library(EnsDb.Hsapiens.v86) library(BSgenome.Hsapiens.UCSC.hg38)

BSgenome.Hsapiens.UCSC.hg38

library(BSgenome.Hsapiens.UCSC.hg38)

fragpath <- "/share/nas1/Data/PipeLine/scRNA_ATAC_Combined/v1.0/analysis/result/sample345/outs/atac_fragments.tsv.gz"

annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86) seqlevelsStyle(annotation) <- "UCSC" genome(annotation) <- "hg38" gex<-CreateSeuratObject( counts = data1$Gene Expression, assay = "RNA" )

gex[["ATAC"]] <- CreateChromatinAssay( counts = data1$Peaks, sep = c(":", "-"), fragments = fragpath, annotation = annotation )

control qc

DefaultAssay(gex) <- "ATAC" main.chroms <- standardChromosomes(BSgenome.Hsapiens.UCSC.hg38) keep.peaks <- as.logical(seqnames(granges(gex)) %in% main.chroms) gex[['ATAC']] <- gex[keep.peaks, ]

Hope the update. Best, hanhuihong

honghh2018 commented 3 years ago

Have anyone could help?

timoast commented 3 years ago

To subset only the ATAC assay use:

library(BSgenome.Hsapiens.UCSC.hg38)

main.chroms <- standardChromosomes(BSgenome.Hsapiens.UCSC.hg38)
keep.peaks <- which(as.character(seqnames(granges(object))) %in% main.chroms)
object[["ATAC"]] <- subset(object[["ATAC"]], features = rownames(object[["ATAC"]])[keep.peaks])