Closed by bapoorva 3 years ago
Hi Apoorva, the motifmatchr package can generate a feature x motif matrix where each entry indicates the presence/absence of the motif in that feature (this is also what we use to do the motif enrichment test). You can either run motifmatchr or use the CreateMotifMatrix() function in Signac (which is just a convenient wrapper for functions in motifmatchr) to generate the matrix.
Not sure why you're seeing that error, but if I understand correctly you have a set of peaks and you want to find what percentage of those peaks contain a certain motif? If so, you don't need to use the FindMotifs function.
If you generate the motif matrix, you can then compute the percentage of peaks containing a certain motif from the matrix directly. For example:
library(Signac)
library(JASPAR2020)
library(TFBSTools)

# example object
obj <- readRDS("./vignette_data/pbmc.rds")

# Get a list of motif position frequency matrices from the JASPAR database
pfm <- getMatrixSet(
  x = JASPAR2020,
  opts = list(species = 9606, all_versions = FALSE)
)

# Scan the DNA sequence of each peak for the presence of each motif
motif.matrix <- CreateMotifMatrix(
  features = granges(obj),
  pwm = pfm,
  genome = 'hg19',
  use.counts = FALSE
)

# example peak set we're interested in
peaks.use <- head(rownames(obj), 100)
motif.use <- colnames(motif.matrix)[1]

# compute fraction of peaks containing a certain motif
sum(motif.matrix[peaks.use, motif.use]) / length(peaks.use)
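The same calculation can be extended to every motif at once. The following is a minimal sketch in base R on a toy matrix, not output from Signac: the peak names, the motif names (motifA, motifB, motifC) and the `motif.table` data frame are made up for illustration, and the matrix stands in for what CreateMotifMatrix() would return with use.counts = FALSE.

```r
# Toy stand-in for a peak x motif matrix (1 = motif present in the peak).
# Peak and motif names are hypothetical.
motif.matrix <- matrix(
  c(1, 0, 1,
    0, 0, 1,
    1, 1, 0,
    0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("chr1-100-200", "chr1-500-600", "chr2-100-200", "chr2-900-1000"),
    c("motifA", "motifB", "motifC")
  )
)

# Peak set of interest
peaks.use <- c("chr1-100-200", "chr1-500-600", "chr2-100-200")

# Percentage of the selected peaks that contain each motif
pct <- 100 * colMeans(motif.matrix[peaks.use, , drop = FALSE] > 0)

# Arrange as a table, most frequent motif first
motif.table <- data.frame(motif = names(pct), percent.observed = unname(pct))
motif.table <- motif.table[order(-motif.table$percent.observed), ]
print(motif.table)
```

With a real sparse matrix the same idea should apply, using the colMeans() method from the Matrix package rather than the base one.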
In the documentation, the table with enriched motifs has percent.observed. Is that the percent observed in the overall data or in the ident being tested?
Yes, percent.observed is the percentage of input peaks (i.e., supplied by the features parameter in FindMotifs()) that contain the motif. percent.background is the percentage of background peaks that contain the motif.
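To make the two percentages concrete, here is a small worked sketch in base R. It is an illustration of the definitions above, not the FindMotifs() implementation: the matrix, the peak and motif names, and the choice of query and background sets are all invented, and the background here is simply a fixed set of peaks rather than one matched on sequence characteristics.

```r
# Toy binary peak x motif matrix: 10 peaks, 2 hypothetical motifs.
# matrix() fills column by column, so the first 10 values are motifA.
motif.matrix <- matrix(
  c(1, 1, 1, 0, 1, 0, 0, 1, 0, 0,   # motifA
    0, 1, 0, 0, 0, 1, 0, 0, 0, 0),  # motifB
  ncol = 2,
  dimnames = list(paste0("peak", 1:10), c("motifA", "motifB"))
)

# Input peaks (what would be passed as `features`) and a stand-in background set
query.peaks <- paste0("peak", 1:4)
background.peaks <- paste0("peak", 5:10)

# Percentage of peaks in each set that contain each motif
percent.observed <- 100 * colMeans(motif.matrix[query.peaks, , drop = FALSE])
percent.background <- 100 * colMeans(motif.matrix[background.peaks, , drop = FALSE])

res <- data.frame(
  motif = colnames(motif.matrix),
  percent.observed = unname(percent.observed),
  percent.background = unname(percent.background),
  fold.enrichment = unname(percent.observed / percent.background)
)
print(res)
```

In this toy case motifA is found in 3 of the 4 input peaks (75%) but only 2 of the 6 background peaks (about 33%), so it is more frequent in the input set than in the background.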
Thank you very much. I created a motif matrix and tried two things:
- Run the peaks of interest (using the first 100 as an example) through FindMotifs to get the percentage
- Convert the motif IDs to motif names
> pfm <- getMatrixSet(
    x = JASPAR2018,
    opts = list(collection='CORE', all_versions = FALSE, tax_group='vertebrates')
  )
> motif.matrix <- CreateMotifMatrix(
    features = granges(atac),
    pwm = pfm,
    genome = BSgenome.Mmusculus.UCSC.mm10
  )
> mtx = as.data.frame(as.matrix(motif.matrix))
> motif.enriched <- FindMotifs(object = atac, features = rownames(mtx1)[1:100], assay = "peaks")
Selecting background regions to match input sequence characteristics
Matching GC.percent distribution
Error in density.default(x = mf.query[[i]], kernel = "gaussian", bw = 1) :
  argument 'x' must be numeric
> motif_name = ConvertMotifID(object = atac, id = colnames(mtx))
The FindMotifs function gave me the error above, and ConvertMotifID returned a matrix of per-cell motif activity scores with motif IDs instead of names, which brings me to the following questions:
- How do I fix that error? (I checked the assay. It is a chromatin assay.)
- In the documentation, the table with enriched motifs has percent.observed. Is that the percent observed in the overall data or in the ident being tested?
- Any reason why ConvertMotifID isn't working?
Thanks, Apoorva
About your question 3: when the default assay is set to "chromvar", which stores the motif z-scores, you will get a matrix of per-cell motif activity scores. This is because the motif annotation is stored in your ATAC assay. In my rds object it is stored in seuset@assays$peaks@motifs, so changing your default assay may help.
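A minimal sketch of that suggestion is given here, under several assumptions: `atac` is a Seurat object whose "peaks" assay is a ChromatinAssay that already holds motif information, and `motif.matrix` is the matrix created earlier in this thread. It is not runnable without such an object, and the assay names may differ in your data.

```r
library(Signac)
library(Seurat)

# Assumption: `atac` has a "peaks" ChromatinAssay containing the motif
# annotation, and possibly a "chromvar" assay holding motif activity scores.
# Point the default assay at the one that stores the motifs.
DefaultAssay(atac) <- "peaks"

# Convert motif IDs (here, the column names of the motif matrix from the
# thread, assumed to exist in the workspace) to motif names.
motif.names <- ConvertMotifID(object = atac, id = colnames(motif.matrix))

# Alternatively, name the assay explicitly and leave the default assay as it is.
motif.names <- ConvertMotifID(
  object = atac,
  id = colnames(motif.matrix),
  assay = "peaks"
)
head(motif.names)
```

Passing the assay explicitly avoids depending on whichever assay happens to be the default at the time of the call.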
Hi,
First, thanks for Signac. Great package for scATAC analysis.
What I am attempting is the reverse of what is in the documentation. Instead of finding overrepresented motifs/TFs in features, I want to know which features have the motifs I am looking for, and in what percentage. So I want to use the FindMotifs function to get that nice table with the % expressed. What is the best way to do that?
Thanks, Apoorva