Open pandaqiuqiu opened 1 year ago
By default it will use "RNA" as the assay. You can always pass the count matrix in directly.
Yes of course pseudobulk and mixedmodel use counts only.
Make sense! Thanks so much. Another question is, If I set label as input of label_col like this: label = factor(label, levels=c("A", "B")), and then run: run_de(expr, meta, label_col = label). Does the avg_logFC of the result mean A vs B (as reference) or not?
The first level of the factor would be used as the reference. so levels = c("A", "B") with a logFC > 1 means B > A.
The first level of the factor would be used as the reference. so levels = c("A", "B") with a logFC > 1 means B > A.
Thanks for pointing out my mistake. I really appreciate that.
I just encountered something confusing. As you have described, I used the following to calculate B vs A (A as reference), with code and result as below.
seurat$TimePoint <- factor(seurat$TimePoint,levels = c("A","B"))
DE = run_de(seurat,
cell_type_col = "Celltype",
replicate_col="Sample",
label_col = "TimePoint",
de_family = "pseudobulk",
de_method = "DESeq2",
de_type = "LRT",
n_threads = 4)
But when I used the same data as input and same method(deseq2+LRT) following as tutorial (https://github.com/hbctraining/scRNA-seq_online/blob/master/lessons/pseudobulk_DESeq2_scrnaseq.md#scatterplot-of-normalized-expression-of-top-20-most-significant-genes), the logFC result is reverse, the key code and results is like this:
dds$group_id = relevel(dds$group_id, ref = "A") dds <- DESeq(dds, test="LRT", reduced=~1)
Can you take a look to see what's causing the problem?
I'm not exactly sure from your post what the problem is. You can just plot some of these genes to see if the direction is correct no?
Yes. I used to_pseudobulk() function to get counts at sample-level for plotting A1BG gene. The stat and plot like this:
Also, I used plotCounts(dds) function to get normalized data at sample-level for plotting A1BG gene. The stat and plot like this:
It seems A1BG is down-regulated in B compared A, so log2FC should be negative in result. What do you think about it?
Yes sorry - typo above -- the second factor level is used as reference.
Yes sorry - typo above -- the second factor level is used as reference.
Thanks. Good to have this figured out.
Follow up to this - Is it always that the second factor level is used as a reference? What if it is a character (and not a factor) and is unspecified, will the run_de function convert to a factor and still use the second one as reference (and will this be alphabetically sorted to determine what is second?)? I find this to be a bit confusing since most tools like DESeq etc use the first level as a reference
Thanks for your useful tools for scRNA analysis! I just want to know for sure about input of run_de function when using a Seurat object:
Is input raw counts or "RNA" assay when de_family = "singlecell"? Is input only raw counts when de_family = "pseudobulk"? What about "mixedmodel"?
I appreciate it if you answer this question.