neurorestore / Libra

MIT License
153 stars 25 forks source link

input of run_de function when using a Seurat object #37

Open pandaqiuqiu opened 1 year ago

pandaqiuqiu commented 1 year ago

Thanks for your useful tools for scRNA analysis! I just want to know for sure about input of run_de function when using a Seurat object:

Is input raw counts or "RNA" assay when de_family = "singlecell"? Is input only raw counts when de_family = "pseudobulk"? What about "mixedmodel"?

I appreciate it if you answer this question.

jordansquair commented 1 year ago

By default it will use "RNA" as the assay. You can always pass the count matrix in directly.

Yes of course pseudobulk and mixedmodel use counts only.

pandaqiuqiu commented 1 year ago

Make sense! Thanks so much. Another question is, If I set label as input of label_col like this: label = factor(label, levels=c("A", "B")), and then run: run_de(expr, meta, label_col = label). Does the avg_logFC of the result mean A vs B (as reference) or not?

jordansquair commented 1 year ago

The first level of the factor would be used as the reference. so levels = c("A", "B") with a logFC > 1 means B > A.

pandaqiuqiu commented 1 year ago

The first level of the factor would be used as the reference. so levels = c("A", "B") with a logFC > 1 means B > A.

Thanks for pointing out my mistake. I really appreciate that.

pandaqiuqiu commented 1 year ago

I just encountered something confusing. As you have described, I used the following to calculate B vs A (A as reference), with code and result as below.

seurat$TimePoint <- factor(seurat$TimePoint,levels = c("A","B")) DE = run_de(seurat, cell_type_col = "Celltype", replicate_col="Sample", label_col = "TimePoint",
de_family = "pseudobulk",
de_method = "DESeq2",
de_type = "LRT",
n_threads = 4)

image

But when I used the same data as input and same method(deseq2+LRT) following as tutorial (https://github.com/hbctraining/scRNA-seq_online/blob/master/lessons/pseudobulk_DESeq2_scrnaseq.md#scatterplot-of-normalized-expression-of-top-20-most-significant-genes), the logFC result is reverse, the key code and results is like this:

dds$group_id = relevel(dds$group_id, ref = "A") dds <- DESeq(dds, test="LRT", reduced=~1)

image

Can you take a look to see what's causing the problem?

jordansquair commented 1 year ago

I'm not exactly sure from your post what the problem is. You can just plot some of these genes to see if the direction is correct no?

pandaqiuqiu commented 1 year ago

Yes. I used to_pseudobulk() function to get counts at sample-level for plotting A1BG gene. The stat and plot like this: image image

Also, I used plotCounts(dds) function to get normalized data at sample-level for plotting A1BG gene. The stat and plot like this: image image

It seems A1BG is down-regulated in B compared A, so log2FC should be negative in result. What do you think about it?

jordansquair commented 1 year ago

Yes sorry - typo above -- the second factor level is used as reference.

pandaqiuqiu commented 1 year ago

Yes sorry - typo above -- the second factor level is used as reference.

Thanks. Good to have this figured out.

vkartha commented 1 year ago

Follow up to this - Is it always that the second factor level is used as a reference? What if it is a character (and not a factor) and is unspecified, will the run_de function convert to a factor and still use the second one as reference (and will this be alphabetically sorted to determine what is second?)? I find this to be a bit confusing since most tools like DESeq etc use the first level as a reference