Okay, a few more changes below. I'm gonna merge this one to get the docker/conda deployment working, but please let me know if these changes don't make sense @skanwal!
Devtools
Multiple small changes to satisfy devtools crancheck (e.g. .data$ and var quoting)
Fusions
Sort table by *dna_support, reported_fusion, fusion_caller
CLI
I've gone ahead and changed the following options to be flags that you need to specify in order to enable:
--batch_rm
--dataset_name_incl
--drugs
--filter
--immunogram
--log
--pcgr_splice_vars
--save_tables
This means you need to be explicit for those! If we want to have some of those enabled by default, we should instead create negating options e.g. --nofilter, --nolog etc.
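To make the two styles concrete, here's a minimal sketch of an opt-in flag next to a negating option. It assumes the CLI is built with the optparse R package, which may not match the actual script; the help strings are purely illustrative, and --nofilter is just the hypothetical negating option mentioned above.

```r
library(optparse)

# Opt-in style (what this PR does): flags are off unless specified.
flag_parser <- OptionParser(option_list = list(
  make_option("--filter", action = "store_true", default = FALSE,
              help = "Enable filtering (illustrative help text)."),
  make_option("--log", action = "store_true", default = FALSE,
              help = "Enable the log option (illustrative help text).")
))
flag_opts <- parse_args(flag_parser, args = c("--filter"))
# flag_opts$filter is TRUE because --filter was passed;
# flag_opts$log stays FALSE because --log was not.

# Negating style (the alternative): on by default, switched off explicitly.
# --nofilter is hypothetical and writes into the same `filter` slot via `dest`.
neg_parser <- OptionParser(option_list = list(
  make_option("--nofilter", action = "store_false", dest = "filter",
              default = TRUE, help = "Disable filtering (illustrative).")
))
neg_default <- parse_args(neg_parser, args = character(0))
neg_off <- parse_args(neg_parser, args = c("--nofilter"))
# neg_default$filter is TRUE; neg_off$filter is FALSE.
```

Either way the downstream code only looks at a single logical (e.g. opts$filter), so switching styles later shouldn't touch anything beyond the option definitions.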
Directory inputs: handle Dragen WTS and Arriba directories. I've tested this locally, but it needs a bit more testing:
dragen_wts_dir: if this is specified and any of salmon, fusions, or mapmetrics are not specified, do a list.files in that directory to match the specific file patterns. If no matches, those params become NULL (e.g. dragen_fusions becomes NULL). Note that for the salmon counts we're looking for the quant.genes.sf file, not the quant.sf one.
arriba_dir: same. If you specify arriba_dir and don't specify arriba_tsv or arriba_pdf, it constructs the paths to those as file.path(arriba_dir, "fusions.[tsv/pdf]")
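Roughly, the directory handling described above could look like the base R sketch below. The helper names (find_one, resolve_dragen, resolve_arriba) are made up for illustration and aren't the ones in the package. The fusion and mapping metrics patterns are my assumptions too; only quant.genes.sf and fusions.[tsv/pdf] come from the description above.

```r
# Hypothetical helper: first file in `d` matching `pattern`, or NULL if none.
find_one <- function(d, pattern) {
  hits <- list.files(d, pattern = pattern, full.names = TRUE)
  if (length(hits) == 0) NULL else hits[1]
}

# Fill in any Dragen input that wasn't given explicitly.
# Patterns for fusions/mapmetrics are assumed, not taken from the package.
resolve_dragen <- function(dragen_wts_dir, salmon = NULL, fusions = NULL,
                           mapmetrics = NULL) {
  list(
    # quant.genes.sf, not quant.sf: the anchored pattern excludes the latter
    salmon = if (is.null(salmon)) find_one(dragen_wts_dir, "quant\\.genes\\.sf$") else salmon,
    fusions = if (is.null(fusions)) find_one(dragen_wts_dir, "fusion_candidates\\.final$") else fusions,
    mapmetrics = if (is.null(mapmetrics)) find_one(dragen_wts_dir, "mapping_metrics\\.csv$") else mapmetrics
  )
}

# Construct the Arriba paths unless they were given explicitly.
resolve_arriba <- function(arriba_dir, arriba_tsv = NULL, arriba_pdf = NULL) {
  list(
    tsv = if (is.null(arriba_tsv)) file.path(arriba_dir, "fusions.tsv") else arriba_tsv,
    pdf = if (is.null(arriba_pdf)) file.path(arriba_dir, "fusions.pdf") else arriba_pdf
  )
}

# Usage with a throwaway directory holding only salmon outputs.
wts_dir <- file.path(tempdir(), "dragen_wts_example")
dir.create(wts_dir, showWarnings = FALSE)
invisible(file.create(file.path(wts_dir, c("s1.quant.genes.sf", "s1.quant.sf"))))
dragen <- resolve_dragen(wts_dir)
arriba <- resolve_arriba("arriba_out")
```

Here dragen$salmon points at the quant.genes.sf file, while dragen$fusions and dragen$mapmetrics are NULL since nothing in the directory matches.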
Structural variants
The sv_prioritize function in prod has a bug when reading the SR column, which holds split read support for ref and alt separated by a comma. The column gets read in as numeric and the comma is ignored, so an SR of 20,15 gets read in as 2015. This has downstream ramifications when column splitting happens: the SR alt becomes NA for all variants. The same thing happens for PR alt. This happens because there are no col_types specified when using readr::read_tsv, so readr does its best to infer what on earth the columns are, and here its guess was wrong.
But. In a fortunate turn of events, the only impact this has on the SV table is that the SR/PR columns are blank, and the BND/SR filter applied at https://github.com/umccr/RNAsum/blob/master/rmd_files/RNAseq_report.Rmd#L1638 does not filter out anything since SR is NA for all rows there. And I'm actually okay with that, since that filter has been changed in umccrise itself. So all good!
I've gone ahead and modified that function in dev to read in columns with explicit classes, and I've completely removed that filter. So numbers stack up now. I've also handled the tidyr::unnest(annotation) warning we've been getting since forever.
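For reference, here's a small sketch of the idea behind the fix, not the actual sv_prioritize code. The toy TSV, the chrom column and the split_support helper are made up for illustration; SR and PR are the real column names. It assumes readr >= 2.0 for the I() literal input.

```r
library(readr)

# Toy input: SR and PR hold "ref,alt" support counts.
tsv <- "chrom\tSR\tPR\nchr1\t20,15\t30,12\nchr2\t8,3\t14,6\n"

# With number parsing, "," is treated as a grouping mark and dropped,
# which is how "20,15" can end up as a single number.
guessed <- parse_number("20,15")

# Explicit column types keep SR/PR as text so the comma survives.
sv <- read_tsv(
  I(tsv),
  col_types = cols(
    chrom = col_character(),
    SR = col_character(),
    PR = col_character()
  )
)

# Hypothetical helper: take the i-th comma-separated value as an integer.
split_support <- function(x, i) {
  as.integer(vapply(strsplit(x, ",", fixed = TRUE), `[`, character(1), i))
}

sv$SR_ref <- split_support(sv$SR, 1)
sv$SR_alt <- split_support(sv$SR, 2)
sv$PR_ref <- split_support(sv$PR, 1)
sv$PR_alt <- split_support(sv$PR, 2)
```

With the types pinned, the alt columns come out as proper integers instead of NA.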
Conda
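I've pinned dependencies to match the current ones in prod. The main ones are:
base: 4.1.3
edgeR: 3.36.0
limma: 3.50.1
manhattanly didn't have a conda pkg for R v4.1, so I created one at https://anaconda.org/umccr/r-manhattanly using the recipe at https://github.com/umccr/conda_recipes/tree/main/r-manhattanly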
I also released RNAsum.data v0.0.3 for R v4.1 (same as v0.0.2, but that was built for R v4.2). Todo would be to have that built for R v4.1/4.2/4.3 in parallel. It's okay for now.
Rmd other
Added arriba_dir and dragen_wts_dir params.
The scaling param needed to be evaluated earlier.
Made the hline in the violin plot more transparent and switched it from the deprecated size to linewidth. Also increased its height and decreased its width. Looks less wonky now.
CN genomic view plot has cancer genes in red points with 0.5 opacity (so that you can see through them)
inst/scripts/icav1_download_and_run.R
I use this to automatically download results from GDS and run the report locally, and it works well (when there are no rounding bugs locally!)