scmethods / scregclust

https://scmethods.github.io/scregclust/
GNU General Public License v3.0

Inquiry on Using Mouse Gene Data with scregclust for Regulatory Module Analysis #2

Open maaa-a opened 1 week ago

maaa-a commented 1 week ago

Hello! I am using the scregclust package to analyze regulatory modules in single-cell RNA-seq data from mice, but I encountered an issue. Since scregclust_format defaults to using human-specific lists for transcription factors (TFs) or kinases, I cannot identify any regulators in my mouse data, which leaves the is_regulator vector empty.

Could you recommend a method or alternative regulatory gene list suitable for mouse data? Alternatively, is there a way to customize the transcription factor list to use scregclust with mouse-specific regulatory information?

Thank you very much for your time and for developing this powerful tool. I look forward to any advice or recommendations you might have.

cyianor commented 1 week ago

In our work we only considered human scRNA-seq data, and the built-in regulator lists are biased towards that. You are always free to supply your own list. The is_regulator input is simply a logical vector or a 0/1 vector, with one entry per gene, where 1/TRUE indicates that a gene is a candidate regulator and 0/FALSE that it is a target gene. So if you have a custom list of TFs, kinases, or other regulators you'd like to consider, you can create this vector yourself and supply it in the call to scregclust.
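As a minimal sketch of what such a hand-built vector could look like, here is a base R example. The gene symbols and the `my_regulators` list are made up for illustration and are not part of scregclust; the only point is that the vector has one entry per gene, in the same order as the gene symbols.

```r
# Toy gene symbols standing in for the rows of an expression matrix
# (illustrative values, not real data).
genesymbols <- c("Sox2", "Actb", "Pax6", "Gapdh", "Myc")

# Hypothetical custom regulator list: any character vector of gene symbols.
my_regulators <- c("Sox2", "Pax6", "Myc", "Neurod1")

# One entry per gene, aligned with genesymbols: 1 = regulator, 0 = target.
is_regulator <- as.integer(genesymbols %in% my_regulators)

print(is_regulator)
```

Symbols in the custom list that are absent from the data (here `Neurod1`) are simply ignored by `%in%`.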

idacharlottalarsson commented 1 week ago

Hi! Just adding to Felix's answer. Using your own list of mouse-specific transcription factors to create the is_regulator vector is the best way to go and I've done this previously using the list provided here (https://resources.aertslab.org/cistarget/tf_lists/). The code would look something like this:

out <- scregclust_format(z, mode = "TF")  # z is your gene expression matrix (genes x cells)
genesymbols <- out$genesymbols
sample_assignment <- out$sample_assignment
is_regulator <- out$is_regulator  # all 0s for mouse data, since the built-in list is human

# mouseTF is a character vector of mouse TF symbols, e.g. read from the list linked above
ix <- which(genesymbols %in% mouseTF)
is_regulator[ix] <- 1  # mark the matching genes as regulators

Then you should be able to continue as usual :)
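Before continuing, it may be worth checking how many genes actually matched. The sketch below assumes the TF list is a plain-text file with one gene symbol per line; a temporary file with made-up symbols stands in for the downloaded list, so the file name and contents are hypothetical. It also flags symbols that match only when case is ignored, since mouse symbols are conventionally written like `Sox2` while human symbols are upper case.

```r
# Stand-in for a downloaded TF list (assumed format: one symbol per line).
tf_file <- tempfile(fileext = ".txt")
writeLines(c("Sox2", "Pax6", "Myc", "Neurod1"), tf_file)

# Read the list, trim whitespace, and drop empty lines and duplicates.
mouseTF <- unique(trimws(readLines(tf_file)))
mouseTF <- mouseTF[nzchar(mouseTF)]

# Toy gene symbols; "PAX6" is deliberately in human-style upper case.
genesymbols <- c("Sox2", "Actb", "PAX6", "Gapdh", "Myc")

# Exact matches, as used to fill is_regulator.
ix <- which(genesymbols %in% mouseTF)
n_matched <- length(ix)

# Genes that match only when case is ignored hint at a naming mismatch.
case_only <- setdiff(which(toupper(genesymbols) %in% toupper(mouseTF)), ix)

cat("exact matches:", n_matched, "\n")
cat("case-only matches:", genesymbols[case_only], "\n")
```

If the number of exact matches is zero or unexpectedly small, the symbols in the expression matrix and in the TF list probably follow different naming conventions.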