stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/
Other
323 stars 87 forks source link

Error in MatchRegionStats function of motif analysis #526

Closed strawberry789 closed 3 years ago

strawberry789 commented 3 years ago

I'm trying to find motifs for every cluster of my object. I was successful for the first cluster, but am not able to find motifs for the subsequent clusters. Why is this the case?

I'm following the motif vignette: https://satijalab.org/signac/articles/motif_vignette.html

motif_peaks <- FindAllMarkers( object = object, logfc.threshold = 0.1, #default 0.25 only.pos = TRUE,

min.pct = 0.1,

test.use = 'LR', latent.vars = 'nCount_peaks' )

Then I subset the results to cluster 0: motif_peaks.0 <- motif_peaks[motif_peaks$cluster == 0,]

I get the top differentially accessible peaks: top_motif_peak.0 <- rownames(motif_peaks.0[motif_peaks.0$p_val < 0.005, ])

There are 6 clusters of cells and I proceed with choosing the set of background peaks: open.peaks <- AccessiblePeaks(object, idents = c("0", "1", "2", "3", "4", "5"))

meta.feature <- GetAssayData(object, assay = "peaks", slot = "meta.features") peaks.matched.0 <- MatchRegionStats( meta.feature = meta.feature[open.peaks, ], query.feature = meta.feature[top_motif_peak.0, ], n = 50000 )

Test enrichment: enriched.motifs.0 <- FindMotifs( object = object, features = top_motif_peak.0, background = peaks.matched.0 ) And I get good results. But when I repeat changing the relevant zeros to ones (for cluster 1), I get this error: Error in density.default(x = query.feature[[featmatch]], kernel = "gaussian", : 'x' contains missing values Traceback:

  1. MatchRegionStats(meta.feature = meta.feature[mdb12N1.open.peaks, . ], query.feature = meta.feature[mdb12N1.top_motif_peak.1, . ], n = 50000)
  2. density(x = query.feature[[featmatch]], kernel = "gaussian", . bw = 1)
  3. density.default(x = query.feature[[featmatch]], kernel = "gaussian", . bw = 1)
  4. stop("'x' contains missing values")

Thank you!

timoast commented 3 years ago

How many peaks are present in top_motif_peak.1 and peaks.matched.1?

strawberry789 commented 3 years ago

1964 peaks are present in top_motif_peak.1

Sorry, I should have specified more clearly. I got the error when I tried to obtain the peaks.matched.1.

peaks.matched.1 <- MatchRegionStats( meta.feature = meta.feature[open.peaks, ], query.feature = meta.feature[top_motif_peak.1, ], n = 50000 )

Error in density.default(x = query.feature[[featmatch]], kernel = "gaussian", : 'x' contains missing values Traceback:

  1. MatchRegionStats(meta.feature = meta.feature[mdb12N1.open.peaks, . ], query.feature = meta.feature[mdb12N1.top_motif_peak.1, . ], n = 50000)
  2. density(x = query.feature[[featmatch]], kernel = "gaussian", . bw = 1)
  3. density.default(x = query.feature[[featmatch]], kernel = "gaussian", . bw = 1)
  4. stop("'x' contains missing values")
strawberry789 commented 3 years ago

FYI: there are 128361 peaks in open.peaks

timoast commented 3 years ago

Do you have any NA values in the meta.feature dataframe?

Also, can you please format your code comments so that that they are readable: https://docs.github.com/en/enterprise-server@2.20/github/writing-on-github/basic-writing-and-formatting-syntax#quoting-code

strawberry789 commented 3 years ago

Thanks for the code comments link.

There are no NA values in the meta.feature dataframe.

timoast commented 3 years ago

Hi @strawberry789, I'm still not sure what the issue is here. If you could email the data required to reproduce this (the meta.feature dataframe, open.peaks, and top_motif_peak.1) to tstuart@nygenome.org I can try to take a look.

strawberry789 commented 3 years ago

Hi @timoast, I just sent you an email. Thanks so much.

timoast commented 3 years ago

I found the issue is that you have some peaks in top_motif_peak.1 that are not in meta.feature:

> setdiff(top_motif_peak.1, rownames(meta.feature))
[1] "chr12-8385086-83874681"     "chr14-122056314-1220585961" "chr14-118635017-1186363101"
[4] "chr11-86618121-866193501"   "chr3-50401889-504033821"    "chr14-70882472-708847181"

Did you do anything unusual when you generated these objects? For example, subset the features in the Seurat object after finding the differential peaks?

Or did you perhaps get the meta.features data from a different assay from that used to find the differential peaks?