Closed by LorenzoMerotto 9 months ago
@alex-d13 Let me know what you think considering my comments
I think splitting it up would be much nicer. So basically we have one process + script for each simulation setup, right? And the workflow would then consist of steps 1-7, where the outputs of steps 1-4 are concatenated into one long list of pseudo-bulks that we use as input for steps 5-7.
I think this way it is also easier to work on each simulation script independently, or to add new tests.
@alex-d13 I updated the part about the different resolutions; how does that sound to you?
I like it :)
Note for @LorenzoMerotto
Files that need to be added/Things to do:
Overview of the analyses we want to carry out on the simulated datasets:
[x] Spillover analysis: we simulate n datasets, where n is the number of cell types we have, and we deconvolute each of them independently using the signature matrix built on all cell types; any fraction predicted for a cell type other than the simulated one is spillover. We will do this on the LUNG data (see the first sketch below the list). Parameters required:
[x] Unknown cell content: the idea is that we take a few cell types (e.g. B cells, T cells and such) plus one cell type that will act as "unknown" content, i.e. one that is not part of the signature matrix. The samples will be simulated with an increasing fraction of unknown cell content. We will do this on the LUNG data (see the second sketch below the list). Parameters required:
[x] Impact of cell type resolution: in this case we could envision a multi-resolution deconvolution. We take the Lambrechts dataset and consider three levels of annotation.
We then simulate some datasets using the finest level, obtaining the samples plus the ground-truth fractions (the "facs"). The facs can then be combined to obtain the sample composition at the three different levels (see the third sketch below the list). Then, starting from the same single cells, we build the signature matrix using the three different levels of annotation -> we get three signature matrices to be used to deconvolve the same bulk. We can then compare each deconvolution result to the ground truth at the respective level of annotation. We could do this once for T cell subtypes and once for dendritic cell subtypes.
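A minimal sketch of the spillover setup in R, assuming a placeholder `simulate_bulk()` for the actual simulation routine and made-up cell type labels (not the real LUNG annotation):

```r
# One simulated dataset per cell type: every pseudo-bulk sample consists
# purely of that cell type, so any fraction predicted for another cell type
# is spillover. simulate_bulk() and the labels are placeholders.
cell_types <- c("B_cell", "T_cell", "NK_cell", "Monocyte")

spillover_datasets <- lapply(cell_types, function(ct) {
  # target composition: 100% of the current cell type, 0 for all others
  fractions <- setNames(as.numeric(cell_types == ct), cell_types)
  # simulate_bulk(sc_data, fractions, n_samples = 10)  # placeholder call
  fractions
})
names(spillover_datasets) <- cell_types
```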
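For the unknown-content setup, the target fractions could be generated like this (the known cell types, their ratios, and the 0-0.9 grid are assumptions):

```r
# Fixed known cell types plus an "unknown" type whose fraction increases
# across datasets; the known fractions are rescaled accordingly.
known <- c(B_cell = 0.5, T_cell = 0.5)   # relative fractions of the known types
unknown_levels <- seq(0, 0.9, by = 0.1)  # increasing unknown cell content

fractions <- t(sapply(unknown_levels, function(u) {
  c(known * (1 - u), unknown = u)
}))
rownames(fractions) <- paste0("unknown_", unknown_levels)
fractions  # one row of target fractions per simulated dataset
```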
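And for the resolution setup, combining the fine-level facs into coarser levels boils down to summing fractions over a label mapping; a minimal sketch (the mapping and numbers are made up, not the actual Lambrechts annotation):

```r
# facs_fine: samples x fine-level cell types, true fractions per sample
facs_fine <- matrix(
  c(0.2, 0.1, 0.3, 0.4,
    0.1, 0.2, 0.4, 0.3),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("sample1", "sample2"),
                  c("T_CD4", "T_CD8", "B_cell", "Monocyte"))
)

# mapping from the fine to a coarser annotation level (made-up example)
fine_to_coarse <- c(T_CD4 = "T_cell", T_CD8 = "T_cell",
                    B_cell = "B_cell", Monocyte = "Monocyte")

# sum the fractions of all fine types that share a coarse label
facs_coarse <- t(rowsum(t(facs_fine), group = fine_to_coarse[colnames(facs_fine)]))
facs_coarse
#         B_cell Monocyte T_cell
# sample1    0.3      0.4    0.3
# sample2    0.4      0.3    0.3
```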
Now, what I have in mind is: we create an individual Nextflow process for each of these setups and run all of them sequentially in the simulation workflow. The problem is that some setups require more parameters than others, which would lead to many optional inputs for the `SimulateBulkNF.R` script. So we could instead create a dedicated R script for each simulation (e.g. `simulation_spillover.R`, `simulation_sensitivity.R`, etc.), which IMO would also be the cleaner solution overall. A sketch of such a script is below.
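To make the split concrete, here is a hypothetical skeleton of such a dedicated script; the option names and the `simulate_pure_bulk()` call are assumptions, not the actual interface:

```r
#!/usr/bin/env Rscript
# Hypothetical simulation_spillover.R: a setup-specific script only declares
# the parameters it actually needs, instead of piling optional inputs onto a
# single SimulateBulkNF.R. All option names here are assumptions.
suppressPackageStartupMessages(library(optparse))

opts <- parse_args(OptionParser(option_list = list(
  make_option("--sc_data",    type = "character", help = "single-cell dataset, e.g. LUNG"),
  make_option("--cell_types", type = "character", default = "B_cell,T_cell",
              help = "comma-separated cell types to simulate"),
  make_option("--n_samples",  type = "integer",   default = 50),
  make_option("--output_dir", type = "character", default = ".")
)))

cell_types <- strsplit(opts$cell_types, ",")[[1]]

# one pseudo-bulk dataset per cell type; each will be deconvolved independently
for (ct in cell_types) {
  message("Simulating spillover dataset for: ", ct)
  # simulate_pure_bulk() is a placeholder for the actual simulation routine:
  # saveRDS(simulate_pure_bulk(opts$sc_data, ct, opts$n_samples),
  #         file.path(opts$output_dir, paste0("spillover_", ct, ".rds")))
}
```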