This pull request mainly generalizes the configuration of the `DESeq2` differential expression analysis in a way that closes #53. Every use of `condition` (except in one very basic test case) is replaced by `variable_of_interest` in general settings, and by specific variable names wherever something is configured for a particular analysis. The aim is to always be explicit about the configuration, rather than relying on defaults in `DESeq2` (which I find confusing at times).

The new setup allows for:
- Multiple `variables_of_interest`. For each of them, a `base_level` needs to be explicitly specified, and it is set up accordingly in `deseq-init.R` using `relevel()`. `deseq-init.R` also turns all of these variables into `factor()`s, which would not happen automatically if all entries in a column are numeric. This explicit specification ensures that users get the fold change that they are actually looking for.

  If multiple `variables_of_interest` are specified, the model will by default include all possible interactions, so the resulting formula will contain something like `vof1 * vof2` for two `variables_of_interest` named `vof1` and `vof2`. If you wonder why I chose this as the default, there's a nice and concise demonstration of why you want the interaction `*` to be the default over the addition `+` of terms. But if users are sure that they don't want interaction terms, they can override this by manually providing a formula under `model: ""`.
- Multiple `batch_effects`, which are added to the model with `+` (to exclude their effect from the other terms).
- Multiple `contrasts`. Each of them triggers a separate run of `deseq2.R`, and thus a separate call of the `results()` function of `DESeq2`. For each of the `contrasts:` specified, two of the three options that the `?results` help page describes for the `contrast` argument can be used:
  - Specifying a `variable_of_interest` and a `level_of_interest` results in the contrast `c(variable_of_interest, level_of_interest, base_level)`.
  - Specifying a string lets you provide a list of one or two character vectors (numerator and, optionally, denominator), whose entries can be any of the names returned by `resultsNames(dds_object)`, for example: `'list(c("genotypeIII.conditionB"), c("genotypeII.conditionB"))'`.
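
The releveling described above can be illustrated with a minimal Python sketch of what `factor()` followed by `relevel()` does to the level order. This is not how `deseq-init.R` is implemented. The function name `releveled_levels` and the example values are hypothetical, and Python sorts strings by code point while R's sort is locale-dependent. The sketch shows why an explicit `base_level` matters: without `relevel()`, the reference level is simply the first level in sorted order.

```python
def releveled_levels(values, base_level):
    """Return the levels of `values`, with `base_level` moved to the front.

    Hypothetical helper that mimics R's factor() followed by
    relevel(ref = base_level) for a column whose values share one type.
    """
    # Like factor(), take the sorted unique values as levels; numbers are
    # sorted numerically and only then turned into (string) labels.
    # Note: strings are sorted by code point here, not by R's locale rules.
    levels = [str(value) for value in sorted(set(values))]
    base = str(base_level)
    if base not in levels:
        # relevel() likewise rejects a reference that is not an existing level.
        raise ValueError(f"base_level {base!r} is not one of {levels}")
    # DESeq2 uses the first level as the reference (denominator) level.
    return [base] + [level for level in levels if level != base]


if __name__ == "__main__":
    # Sorted order alone would make "mutant" the reference level.
    print(releveled_levels(["mutant", "wildtype", "mutant"], "wildtype"))
    # ['wildtype', 'mutant']
    # Numeric column: levels "1", "2", "10" in numeric order, then "2" first.
    print(releveled_levels([10, 2, 1, 2], 2))
    # ['2', '1', '10']
```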
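
The default model can likewise be sketched in Python. The config keys (`variables_of_interest`, `base_level`, `batch_effects`, `model`) are the ones named above. The nested dictionary layout, the function names and the example values are hypothetical assumptions, not the workflow's actual code. Batch effects are added with `+`, the `variables_of_interest` are joined with `*` and placed last (the usual DESeq2 convention), and a non-empty `model` replaces the generated formula. The second function lists the terms that R expands `a * b` into, i.e. all main effects plus every interaction.

```python
from itertools import combinations

# Hypothetical config layout, using the key names from this pull request.
example_config = {
    "variables_of_interest": {
        "genotype": {"base_level": "I"},
        "condition": {"base_level": "A"},
    },
    "batch_effects": ["batch"],
    "model": "",  # empty: generate the default formula
}


def build_design_formula(config):
    """Build an R design formula string from the (hypothetical) config."""
    # A manually provided, non-empty `model` overrides the default.
    if config.get("model"):
        return config["model"]
    variables = list(config["variables_of_interest"])
    if not variables:
        raise ValueError("at least one variable_of_interest is required")
    # Batch effects enter additively; the variables of interest are joined
    # with `*` so that all their interactions are included, and come last.
    terms = list(config.get("batch_effects", [])) + [" * ".join(variables)]
    return "~" + " + ".join(terms)


def expand_interactions(variables):
    """List the terms R generates for `v1 * v2 * ...`: every main effect
    and every interaction (`:`) between them, ordered by interaction size."""
    return [
        ":".join(combo)
        for size in range(1, len(variables) + 1)
        for combo in combinations(variables, size)
    ]


if __name__ == "__main__":
    print(build_design_formula(example_config))  # ~batch + genotype * condition
    print(expand_interactions(["genotype", "condition"]))
    # ['genotype', 'condition', 'genotype:condition']
```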
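
The two supported forms of a `contrasts:` entry can be sketched as a Python function that produces the R expression handed to `results(dds, contrast = ...)`. The dictionary layout, the function name `contrast_expression` and the example levels are hypothetical; only the shape of the resulting `contrast` values follows the `?results` documentation. A string entry is passed through unchanged, while a `variable_of_interest`/`level_of_interest` pair is completed with the configured `base_level`.

```python
# Hypothetical per-variable settings (layout assumed for illustration).
example_variables = {
    "genotype": {"base_level": "I"},
    "condition": {"base_level": "A"},
}


def contrast_expression(contrast, variables_of_interest):
    """Turn one (hypothetical) `contrasts:` entry into an R `contrast` argument."""
    if isinstance(contrast, str):
        # Strings are used verbatim, e.g. a list() of resultsNames(dds).
        return contrast
    variable = contrast["variable_of_interest"]
    level = contrast["level_of_interest"]
    base = variables_of_interest[variable]["base_level"]
    if level == base:
        # Sanity check added in this sketch: comparing a level with itself
        # is meaningless.
        raise ValueError(f"level_of_interest equals base_level {base!r}")
    # c(variable, numerator, denominator): log2 fold change of level over base.
    return f'c("{variable}", "{level}", "{base}")'


if __name__ == "__main__":
    print(contrast_expression(
        {"variable_of_interest": "genotype", "level_of_interest": "III"},
        example_variables,
    ))  # c("genotype", "III", "I")
    print(contrast_expression(
        'list(c("genotypeIII.conditionB"), c("genotypeII.conditionB"))',
        example_variables,
    ))
```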

In addition, I fixed some other things in this pull request, since it will mean a major version bump anyway:
- fixed some R linting issues in the R scripts
- cleaned up some comments
- included a more complex testing setup, needed for the `variables_of_interest` changes
- changes to `config/README.md`, to better explain things to users via the Snakemake workflow catalog