stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/

Finding features that have the motif #255

Closed bapoorva closed 3 years ago

bapoorva commented 3 years ago

Hi,

First, thanks for signac. Great package for scATAC analysis.

What I am attempting is the reverse of what is in the documentation. Instead of finding overrepresented motifs/TFs in a set of features, I want to know which features contain the motifs I am looking for, and in what percentage. So I want to use the function FindMotifs to get that nice table with the percentage columns. What is the best way to do that?

Thanks, Apoorva

timoast commented 3 years ago

Hi Apoorva, the motifmatchr package can generate a feature x motif matrix where each entry indicates the presence/absence of the motif in that feature (this is also what we use for the motif enrichment test). You can either run motifmatchr directly or use the CreateMotifMatrix() function in Signac (which is just a convenient wrapper for functions in motifmatchr) to generate the matrix.

bapoorva commented 3 years ago

Thank you very much. I created a motif matrix and tried two things:

  1. Run the peaks of interest (using first 100 as an example) through FindMotifs to get the percentage
  2. convert the motif id to motif name
>pfm <- getMatrixSet(
  x = JASPAR2018,
  opts = list(collection='CORE',all_versions = FALSE,tax_group='vertebrates')
)

>motif.matrix <- CreateMotifMatrix(
  features = granges(atac),
  pwm = pfm,
  genome = BSgenome.Mmusculus.UCSC.mm10
)

>mtx= as.data.frame(as.matrix(motif.matrix))
>motif.enriched <- FindMotifs(object = atac, features =rownames(mtx1)[1:100], assay ="peaks")
Selecting background regions to match input
              sequence characteristics
Matching GC.percent distribution
Error in density.default(x = mf.query[[i]], kernel = "gaussian", bw = 1) : 
  argument 'x' must be numeric

>motif_name= ConvertMotifID(object = atac, id= colnames(mtx))

The FindMotifs function gave me the error above, and ConvertMotifID returned a matrix of per-cell motif activity scores with motif IDs instead of names. This brings me to the following questions:

  1. How do I fix that error? (I checked the assay. It is a chromatin assay.)
  2. In the documentation, the table with enriched motifs has percent.observed. Is that the percent observed in the overall data or in the ident being tested?
  3. Any reason why ConvertMotifID isn't working?

Thanks, Apoorva

timoast commented 3 years ago

Not sure why you're seeing that error, but if I understand correctly you have a set of peaks and you want to find what percentage of those peaks contain a certain motif? If so, you don't need to use the FindMotifs function.

If you generate the motif matrix, you can then compute the percentage of peaks containing a certain motif from the matrix directly. For example:

library(Signac)
library(JASPAR2020)
library(TFBSTools)

# example object
obj <- readRDS("./vignette_data/pbmc.rds")

# Get a list of motif position frequency matrices from the JASPAR database
pfm <- getMatrixSet(
  x = JASPAR2020,
  opts = list(species = 9606, all_versions = FALSE)
)

# Scan the DNA sequence of each peak for the presence of each motif
motif.matrix <- CreateMotifMatrix(
  features = granges(obj),
  pwm = pfm,
  genome = 'hg19',
  use.counts = FALSE
)

# example peak set we're interested in
peaks.use <- head(rownames(obj), 100)
motif.use <- colnames(motif.matrix)[1]

# compute fraction of peaks containing a certain motif
sum(motif.matrix[peaks.use, motif.use]) / length(peaks.use)
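
The same calculation extends to all motifs at once, and the matrix can also be used to list which peaks contain a given motif. Below is a minimal sketch on a toy matrix standing in for the output of CreateMotifMatrix() with use.counts = FALSE. The peak names, the motif IDs, and the names toy.matrix, pct.table, and peaks.with.motifA are made up for illustration.

```r
library(Matrix)

# Toy stand-in for the peaks x motifs matrix: 1 means the motif
# occurs in the peak. Peak names and motif IDs are hypothetical.
toy.matrix <- Matrix(
  c(1, 0, 1,
    0, 0, 1,
    1, 1, 0,
    0, 0, 0),
  nrow = 4, byrow = TRUE, sparse = TRUE,
  dimnames = list(
    c("chr1-100-200", "chr1-300-400", "chr2-100-200", "chr2-300-400"),
    c("motifA", "motifB", "motifC")
  )
)

# Peak set of interest
peaks.use <- c("chr1-100-200", "chr1-300-400", "chr2-100-200")
sub <- toy.matrix[peaks.use, , drop = FALSE]

# Percentage of the selected peaks containing each motif
# (valid because the entries are 0/1, not counts)
pct <- 100 * as.numeric(Matrix::colSums(sub)) / nrow(sub)
pct.table <- data.frame(motif = colnames(sub), percent = pct)
pct.table <- pct.table[order(-pct.table$percent), ]
print(pct.table)

# Which of the selected peaks contain motifA
peaks.with.motifA <- peaks.use[as.vector(sub[, "motifA"]) > 0]
print(peaks.with.motifA)
```

With a real motif matrix, the row names passed in peaks.use need to match the row names of the matrix exactly.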

In the documentation, the table with enriched motifs has percent.observed. Is that the percent observed in the overall data or in the ident being tested?

Yes, percent.observed is the percentage of the input peaks (i.e., those supplied via the features parameter of FindMotifs()) that contain the motif. percent.background is the percentage of background peaks that contain the motif.
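
To make the two percentages concrete, here is a small worked sketch with made-up counts. The numbers and variable names are hypothetical, and the hypergeometric test at the end is only an assumption about how such an enrichment could be scored. It is not confirmed in this thread as what FindMotifs() does internally.

```r
# Hypothetical counts, for illustration only
n.query <- 100         # peaks supplied via the features parameter
n.background <- 40000  # background peaks
observed <- 30         # input peaks containing the motif
background <- 4000     # background peaks containing the motif

# The two percentages described above
percent.observed <- 100 * observed / n.query
percent.background <- 100 * background / n.background

# Ratio of the two percentages
fold.enrichment <- percent.observed / percent.background

# Assumed scoring: one-sided hypergeometric test for over-representation,
# i.e. P(X >= observed) when drawing n.query peaks from the background
pvalue <- phyper(
  q = observed - 1,
  m = background,
  n = n.background - background,
  k = n.query,
  lower.tail = FALSE
)

print(c(percent.observed, percent.background, fold.enrichment))
print(pvalue)
```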

erlun1 commented 2 years ago

About your question 3: when the default assay is set to "chromvar", which stores the motif z-scores, you will get a matrix of per-cell motif activity scores. This is because the motif annotation is stored in your ATAC assay. In my rds object it is stored in seuset@assays$peaks@motifs, so changing your default assay may help.