simonlabcode / bakR

Other
6 stars 2 forks source link

Two way anova with bakR #7

Closed AlessandroPilli closed 1 month ago

AlessandroPilli commented 1 month ago

Hi isac

I had previously contacted you for problems related to bam2bakR that you very kindly helped me solve and I thank you again. Now that everything is working perfectly I have a question related to the statistics part.

Since my experimental set includes data from a wild type sample, a sample with a knockout of a gene, a sample with a mutation of another gene and a sample that presents both knockout and mutation together, it would be perfect to use a TWO WAY ANOVA to evaluate the differences.

dds <- DESeqDataSetFromMatrix(countData = Counts, colData = colData, design = ~conditions * interaction)

where interaction would be the different condition of the sample.

thanks, Alessandro

isaacvock commented 1 month ago

Hello Alessandro,

Unfortunately, bakR is highly inflexible in the types of comparative analyses it can perform, and cannot accommodate the design matrix that you described. That is one of many things that inspired me to completely rewrite bakR and develop a new R package to replace bakR, called EZbakR. One of the things that I am close to having fully implemented in EZbakR is flexible design-matrix specified comparative analyses. I will let you know when that is available and documented (probably by the end of next week).

Once I have finished the EZbakR implementation, I might also be able to write you a script to take your current bakR output and perform a two-way anova with it, so that you can use the analyses you have already generated. I'll let you know if that turns out to be possible.

Best, Isaac

AlessandroPilli commented 1 month ago

Thank you, then I'll wait for updates

isaacvock commented 1 month ago

Hello Alessandro,

I have finished implementing generalized linear modeling in EZbakR, and have documented this functionality here. Unfortunately, I was not able to come up with an easy way to take your current bakR output and perform EZbakR's modeling with it. Thankfully though, the input to EZbakR is very similar to bakR, but much more flexible (documented here). Let me know if you have any questions, and if you run into any issues, please post them on the EZbakR Issues page.

For the sake of completeness, the workflow will look like:

library(dplyr)
library(data.table)
library(EZbakR)

# load cB
cB <- fread("path/to/cB.csv.gz")

# knockout and mutation could be TRUE/FALSE booleans
metadf <- tibble(sample = ..., tl = ...., knockout = ..., mutation = ....)

ezbdo <- EZbakRData(cB, metadf)

ezbdo <- EstimateFractions(ezbdo, features = "XF")

ezbdo <- EstimateKinetics(ezbdo)

# Fit model with interaction term
ezbdo <- AverageAndRegularize(ezbdo, parameter = "log_kdeg", formula_mean = ~ knockout*mutation)
ezbdo <- AverageAndRegularize(ezbdo, parameter = "log_ksyn", formula_mean = ~ knockout*mutation)

# Could also just fit interaction model (calculate averages in each combo of knockout and mutation
# which has some advantages to above model
ezbdo <- AverageAndRegularize(ezbdo, parameter = "log_kdeg", formula_mean = ~ knockout:mutation)
ezbdo <- AverageAndRegularize(ezbdo, parameter = "log_ksyn", formula_mean = ~ knockout:mutation)

# Assess parameters of interest with CompareParameters(); see documentation for details

I discuss this a bit in the documentation, but there are a couple ways to fit categorical interaction models

Best regards, Isaac

AlessandroPilli commented 1 month ago

Hi Isac

Just a couple more things:

  1. If I understood correctly EZbakR is providing results about the Differential synthesis analysis, but what about the Differential stability, is there a way to perform such analysis also for that part?

  2. The function EZVolcanoPlot is used on an EZbakRCompare on which CompareParameters have been applied, in my case:

ezbdo <- CompareParameters(ezbdo, parameter = "log_kdeg", design_factor = "", reference = "knockoutWT:mutationWT", # WT experimental = "knockoutKO:mutationITD") # AF (KO+ITD)

In the ezbdo object is called comparisonX (where X is the number related to the comparison that ran, in this case comparison1)

Now how do I specify that I want to plot only the part corresponding to this comparison (since there are no examples in the documentation if not for the first and very simple case)? Do I have to specify again the reference and experimental like the code that follows or there is a way to select the desired comparison via the specific name?

EZVolcanoPlot(ezbdo, parameter = "log_kdeg", reference = "knockoutWT:mutationWT", # WT experimental = "knockoutKO:mutationITD") # AF

isaacvock commented 1 month ago

Hi Alessandro,

1) No, the documentation shows EZbakR performing both differential stability and differential synthesis analyses. Running AverageAndRegularize(ezbdo) yields, by default, analyses of stability (parameter="log_kdeg"), and you can get synthesis rate analyses by setting the parameter argument to "log_ksyn". So both the stability and synthesis analyses should be in the ezbdo$averages list after these two calls to AverageAndRegularize(). Performing CompareParameters(..., parameter = "log_kdeg") will perform differential stability analyses and CompareParameters(..., parameters = "log_ksyn") will perform differential synthesis analyses. 2) In general, whatever you specified in the arguments for CompareParameters() can be specified to EZVolcanoPlot() and EZMAPlot() in order to plot the result of a particular analysis. In particular, the design_factor, reference, experimental, and param_name arguments should correspond to whichever run of CompareParameters() you would like to visualize the results for.

More specifically, you don't need to specify every single one of these arguments though, you just need to specify the minimal set that uniquely identifies a given comparative analysis. Say for example that you have run CompareParameters() twice, once with parameter = "log_ksyn" and once with parameter = "log_kdeg", and with the other parameters set to whatever you set them to. To visualize the differential stability result, you could just run EZVolcanoPlot(ezbdo, parameter = "log_kdeg") since you have only performed one differential stability analysis at this point, and thus this analysis is uniquely identifiable by you having set parameter = "log_kdeg" in CompareParameters(). So the code you showed should work in principle, but you may not need to specify all of parameter, reference, and experimental to get it to work in this case.

I'll work on adding documentation with examples of Volcano and MA plots for each of the comparison types I show in the linear modeling docs. Let me know if you have any other questions.

Best, Isaac

AlessandroPilli commented 4 days ago

Hi Isaac, it is me once again,

I encountered a final problem during the use of EZbakR. I'm evaluating the synthesis results, but when executing CompareParameters with the parameter log_ksyn and not the evaluated interaction term but the single one, I'm getting a strange error that I'll report below. This is happening for both single-terms (knockoutKO and mutationITD") I want to let you know that CompareParameters with the single therms worked when using log_kdeg and worked with log_ksyn only in the case of the interaction term:

Here the command executed and the error.

ezbdo <- AverageAndRegularize(ezbdo, parameter = "log_ksyn", formula_mean = ~ knockout*mutation)

ezbdo$metadata$averages$logksyn_XF$fit_params [1] "knockoutWT" "knockoutKO" "mutationITD"
[4] "knockoutKO:mutationITD"

ezbdo <- CompareParameters(ezbdo, parameter = "log_ksyn",
param_name = "knockoutKO") Error in obj[["comparisons"]][[output_name]] <- dplyr::as_tibble(comparison) : attempt to select less than one element in OneIndex

Best, Alessandro

isaacvock commented 4 days ago

Hi Alessandro,

I was able to reproduce the bug on my end and just pushed a fix. Update EZbakR with devtools::install_github("isaacvock/EZbakR"). You should then be able to run CompareParameters() without modifying any of your current parameters. Let me know if you continue to run into issues even after the update.

Details if you are interested: problem was just in how EZbakR tries to decide the name of the output table. So CompareParameters() probably ran fine for you, up until the very last step of saving the output to the EZbakRData object, where the proposed name for the output was an empty vector (leading to the attempt to select less than one element in OneIndex error).

Best, Isaac