Closed Rebecza closed 3 years ago
Parameters that need to be set in the config, in chronological order:
rmarkdown::render(rmd, params = list(kb.dir = kb.dir,
resultsdir = resultsdir,
barcode_file = dual_barcodes,
MT_genes_file = MT_genes_file,
add.qc.ERCC = spike_in_qc,
add.qc.MT = mt_qc,
mt_pct_max = mt_pct_max,
ERCC_pct_max = ercc_pct_max,
gene_tresh = gene_tresh,
amount_cells_expr = amount_cells_expr,
total_counts_tresh = total_counts_tresh,
total_feat_tresh = total_feat_tresh,
confounders_to_test = confounders_to_test, # Explanation needed: Which phenodata entries to test if they are explanatory variables.
vars_to_regress = vars_to_regress, # * Default could be NULL or nCounts_sf and nFeature_sf
nHVG = nHVG,
n_bc = n_bc,
remove_bc = remove_bc,
pcs_for_overview = pcs_for_overview, # The maximum PC settings for which to produce a UMAP as a first exploratory overview. Default could be: 5,10,20,30,50, which runs for 1:5, 1:10, etc.
lab_col = lab_col, # Color label: Grouping variable used in the PCA representations and other QC plots (variable present in phenodata).
umap_col = umap_col, # Color labels used for UMAP representations.
filtering = filtering),
output_file = output_file)
.* We need to check if for the unspliced matrix, the regression should be performed with nCounts_uf instead of nCounts_sf.
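The comment on `pcs_for_overview` says that a setting such as 5,10,20,30,50 should run for 1:5, 1:10, and so on. A minimal sketch of that expansion, assuming the parameter arrives as a numeric vector (the list names and the object `seu` mentioned in the comments are hypothetical):

```r
# Default suggested in the comment on pcs_for_overview.
pcs_for_overview <- c(5, 10, 20, 30, 50)

# Expand each maximum n into the full range 1:n.
dims_list <- lapply(pcs_for_overview, seq_len)
names(dims_list) <- paste0("pcs_1_to_", pcs_for_overview)

# Each element could then be passed as `dims` to a UMAP call, for example
# Seurat::RunUMAP(seu, dims = dims_list[[1]]); `seu` is an assumed Seurat
# object that is not created in this sketch.
lengths(dims_list)
```

Keeping the ranges in a named list makes it straightforward to loop over them and label each exploratory UMAP by the number of PCs it used.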
Defaults with the option of changing
Could be included in the documentation as options to change, and not shown directly in the example config. This way they can be removed from the default list in the config, while people can still look up each setting and how to change it if they prefer to do so.
new_col_pattern = new_col_pattern, # Default is "" # ** Explanation: if a small substitution in the colname-strings is desired, these can be used to do so.
old_col_pattern = old_col_pattern, # Default is ""
## Could be included in the documentation as options to change, but taking the default values of library.
plate_variables = plate_variables, # Consider renaming: planame/name_variable, default = library
combined_variables_to_id = combined_variables_to_id, # Explanation needed: phenodata variables/entries to combine for labeling/grouping in visualizations.
pcs_max = pcs_max, # Maximum number of PCs to run PCA on the HVG. Default could be 50, in most cases enough.
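One way to keep such defaults out of the example config while still allowing overrides is to merge a defaults list with whatever the user supplies. This is only a sketch: the parameter names and default values are the ones suggested in this thread, while `default_params` and `user_params` are hypothetical names.

```r
# Defaults as suggested in this thread.
default_params <- list(
  old_col_pattern = "",
  new_col_pattern = "",
  plate_variables = "library",
  pcs_max = 50
)

# Values a user chose to override in their config (illustrative only).
user_params <- list(pcs_max = 30)

# modifyList() keeps every default and replaces only the overridden entries.
params <- modifyList(default_params, user_params)
str(params)
```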
.** I used this for the index labeling that is sometimes performed differently in Sybren's workflow: the index sometimes consists of 1 or 2 strings joined by a separator. Removing the separator between the 2 fields, or substituting it with something else, makes the index 1 field again for all column names.
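The substitution described in the ** note could look roughly like the sketch below. The helper `rename_columns`, the example column names, and the "-" separator are all assumptions for illustration; only `old_col_pattern`, `new_col_pattern`, and their empty defaults come from the thread.

```r
rename_columns <- function(col_names, old_col_pattern = "", new_col_pattern = "") {
  # With the default empty pattern nothing is substituted.
  if (!nzchar(old_col_pattern)) {
    return(col_names)
  }
  # fixed = TRUE treats the pattern as a literal string, not a regex.
  gsub(old_col_pattern, new_col_pattern, col_names, fixed = TRUE)
}

# Hypothetical column names: the index is written as two fields or as one.
col_names <- c("plate1_AAC-GGT", "plate1_AACGGT")
renamed <- rename_columns(col_names, old_col_pattern = "-", new_col_pattern = "")
print(renamed)
```

After the substitution both column names carry the index as a single field.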
Mostly of use when running in a separate RMD in RStudio:
filtering = filtering, # also indicating the options of "in" and "out"
subset_id = subset_id,
Defaults that could also be left out
run.plate_qc = plate_qc, # When running a plate-based method, you would always want to print these. (Can be optional in separate RMD run)
Something to consider leaving out: it can be difficult to guess which genes are present in the dataset, so if the gene you want to visualize is absent, the violin plot crashes. Either make the violin plot optional, with a check for whether the gene is present, or leave this chunk out altogether.
explore_violin = explore_violin,
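The presence check suggested above could be sketched as follows. This assumes `explore_violin` holds one or more gene names; the helper `genes_to_plot`, the toy matrix, and the gene names are hypothetical.

```r
# Keep only requested genes that exist in the dataset, so plotting cannot
# fail on a missing feature.
genes_to_plot <- function(requested, available) {
  missing <- setdiff(requested, available)
  if (length(missing) > 0) {
    message("Skipping genes not found in the dataset: ",
            paste(missing, collapse = ", "))
  }
  intersect(requested, available)
}

# Toy count matrix with hypothetical gene names as row names.
counts <- matrix(0, nrow = 3, ncol = 2,
                 dimnames = list(c("GAPDH", "ACTB", "SOX2"), c("cell1", "cell2")))
explore_violin <- c("SOX2", "NANOG")

genes <- genes_to_plot(explore_violin, rownames(counts))
if (length(genes) > 0) {
  # A Seurat workflow might call Seurat::VlnPlot(seu, features = genes) here;
  # `seu` is an assumed Seurat object and is not created in this sketch.
  print(genes)
}
```

With this guard the chunk is skipped quietly when none of the requested genes are present, instead of stopping the whole report.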
Concerning `params:` in analysis/kb_seurat_pp.rmd:
The interactive param block works really nicely before knitting in RStudio.
Two points regarding the parameter block:
* Is it possible to show one line with a short description of the variable? Especially when working in this "knit with parameters", one only sees the variable name. _(For example, when we have the variable filtering, the options are "in" or "out" or nothing, and people will not be aware of this.)_
* The order of the variables is very random, which also makes it more difficult to understand what they are for. _(For example, `amount_cells_expr` and `gene_tresh` are used together in one filtering step, but are listed apart, among other variables.)_ Maybe in the config file the ordering is different as well?
A question: Do the variables in the params all have to be in lower case? In some cases, for example nHVG, the use of both lower and upper case would also clarify the variable more. However, I understand this is probably a common convention (descriptions would do in that case as well).
I can help ordering and writing descriptions if you'd like!
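Alongside a description, the allowed values of `filtering` mentioned above ("in", "out", or nothing) could also be checked early in the report, so a wrong value fails with a readable message. A minimal sketch, where `check_filtering` is a hypothetical helper and not part of the existing code:

```r
# Allowed values as described in this thread: "in", "out", or left empty.
check_filtering <- function(filtering = NULL) {
  if (is.null(filtering) || identical(filtering, "")) {
    return(NULL)  # nothing selected
  }
  allowed <- c("in", "out")
  if (!is.character(filtering) || length(filtering) != 1 ||
      !filtering %in% allowed) {
    stop("`filtering` must be one of: ", paste(allowed, collapse = ", "),
         " (or left empty)")
  }
  filtering
}

check_filtering("in")
```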
I added a description for the different parameters, and they are now also ordered in a more logical manner :). Regarding your question about the parameters: they do not have to be lower case, but it improves readability and code consistency, at least from a development perspective. Having a mix of lower/upper case variables makes it difficult to debug. The description should help a bit.
.* We need to check if for the unspliced matrix, the regression should be performed with nCounts_uf instead of nCounts_sf.
That was indeed the case. I added a new parameter for "uf" regression
The majority of the points above should be fixed/implemented by now. To sum it up:
* `run.plate_qc` is replaced with a `method` parameter (plate, droplet). If plate is selected, the QC plots will be generated by default.
* The `explore_violin` parameter is still there in case someone wants to use it, but it is commented out by default.
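The behaviour of the `method` parameter summarized above could be sketched like this. The two values come from the thread; the helper name `plate_qc_enabled` is hypothetical and this is not necessarily how the workflow implements it.

```r
# Decide whether the plate QC plots should be produced for a given method.
plate_qc_enabled <- function(method = c("plate", "droplet")) {
  # match.arg() raises an error for anything other than the listed options.
  method <- match.arg(method)
  method == "plate"
}

plate_qc_enabled("plate")
plate_qc_enabled("droplet")
```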