stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/
Other
327 stars 88 forks source link

Suspicious fold change results from FindMarkers() #756

Closed danielcgingerich closed 3 years ago

danielcgingerich commented 3 years ago

12 Disease + 12 control samples were integrated and clustered. In each cluster, differential accessibility was performed for disease cells vs control cells. The results are very odd for a two reasons.

1) most differentially accessible peaks (DAPs) have positive log fold change. I expect a relatively even distribution.

2) For each group of DAPs - more accessible vs less accessible - I looked at the average percent expression of the peaks (mean(pct.1, pct.2)) across all cells that were tested. Less accessible DAPs have very low percent expression compared to more accessible DAPs.

This kind of suggests to me that the results are still being confounded. It seems that diseased samples were sequenced at much higher depth than normal samples. FindMarkers() was done using 'LR' test and latent.vars = 'age' + 'sex' + 'pmi' + 'peak_region_fragments'. However, these variables only explain about 2% of variation of the first LSI component.

This does not appear to be an issue with FindMarkers. I did the LR test manually and obtained the same results.

My question is, are these results normal? Should I be concerned?

#ratio of more/less accessible peaks
sapply(daps, 
       function(x){
         nrow(x[x$avg_log2FC > 0, ]) / nrow(x[x$avg_log2FC < 0,])
       })
# Astro1       Exc1       Exc2       Exc3       Exc4       Exc5       Exc6
#       Inf   2.674431   1.343306   1.193959   2.680135   2.069767   7.410256
#      Exc7       Exc8       Exc9       Inh1       Inh2       Inh3       Inh4
#  1.666667   4.000000  11.323529   0.000000   8.857143  14.354839   0.200000
#      Inh5       Inh6     Micro1     Micro2     Oligo1     Oligo2     Oligo3
#       NaN   0.000000   4.281544        Inf  50.782609   6.168142  14.659794
#    Oligo4     Oligo6       OPC1
# 43.133333 133.500000   7.717391

#avg pct expression for more accessible peaks
sapply(daps, 
       function(x){
         up <- x[x$avg_log2FC > 0, ]
         (mean(up$pct.1) + mean(up$pct.2)) / 2
       })
#   Astro1      Exc1      Exc2      Exc3      Exc4      Exc5      Exc6      Exc7
#0.4190000 0.2579230 0.2242398 0.1995471 0.3494246 0.4230393 0.4671834 0.3156000
#     Exc8      Exc9      Inh1      Inh2      Inh3      Inh4      Inh5      Inh6
#0.3245893 0.4339987       NaN 0.4054919 0.3788831 0.2090000       NaN       NaN
#   Micro1    Micro2    Oligo1    Oligo2    Oligo3    Oligo4    Oligo6      OPC1
#0.1642079 0.3205952 0.1926952 0.2865990 0.1132419 0.4083640 0.3942303 0.3225000

#avg pct expression for less accessible peaks
sapply(daps, 
       function(x){
         down <- x[x$avg_log2FC < 0, ]
         (mean(down$pct.1) + mean(down$pct.2)) / 2
       })
#    Astro1       Exc1       Exc2       Exc3       Exc4       Exc5       Exc6
#       NaN 0.03284674 0.06592091 0.06011849 0.07640741 0.12004485 0.10333333
#      Exc7       Exc8       Exc9       Inh1       Inh2       Inh3       Inh4
#0.07950000 0.05021429 0.16548529 0.01466667 0.02285714 0.06637097 0.05780000
#      Inh5       Inh6     Micro1     Micro2     Oligo1     Oligo2     Oligo3
#       NaN 0.06325000 0.04214266        NaN 0.02335507 0.03598230 0.04663402
#    Oligo4     Oligo6       OPC1
#0.01756667 0.03025000 0.07107609
timoast commented 3 years ago

How did you process the data upstream of doing the differential testing? Can you provide the code, and output of sessionInfo()?

timoast commented 3 years ago

Any update here?

danielcgingerich commented 3 years ago

Hey @timoast , sorry for not getting back to you. I have been investigating this today, and it seems that signac was using the expm1 function on the 'data' slot in its fold change calculation, thus treating the data as normalized scRNA counts. Not sure why this was happening, as I ran the same code again today and got the correct results. I will post my code when I get home later today

timoast commented 3 years ago

This was due to a default in Seurat (designed for log-normalized data), that we now override in Signac 1.3.0 (see release notes)

danielcgingerich commented 3 years ago

@timoast Interesting. My signac version is 1.1.1.9010 still, but it worked correctly this time. I dont think I changed anything or updated anything since the last time I ran it Edit: I havent updated it since march 25

timoast commented 3 years ago

Not sure, maybe you set slot="counts" in FindMarkers()? Anyway, should be fixed in the current release