`run_ArchR_analysis.R` does a lot. It uses ArchR to read the BAM file and its index, produces cell-by-bin and cell-by-gene matrices, and then runs a large amount of downstream analysis on those matrices.
For multiome processing (SNARE-seq, 10X multiome), that downstream analysis mostly wastes CPU time and I/O, because we'll want to use analysis steps that are aware of the RNA-seq results instead.
Split the `run_ArchR_analysis.R` script into at least two files. The first uses ArchR to read the BAM file and write the cell-by-bin and cell-by-gene matrices to disk. The second loads those matrices (probably along with whatever serialized R process state is needed) and runs the rest of the analysis.
This will also require splitting the CWL file into (at least) two chained CWL steps.
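A minimal sketch of what the chained steps could look like follows. It assumes the first script is wrapped in a CommandLineTool called `archr_matrices.cwl` and the analysis script in `archr_analysis.cwl`. All of those file names and every input and output ID below are hypothetical. The second step consumes the first step's outputs, so a CWL runner schedules it only after the first step finishes. The serialized R state is modeled here as a `Directory` (for example a saved ArchR project), which is also an assumption.

```yaml
cwlVersion: v1.2
class: Workflow
# Hypothetical two-step replacement for the current single ArchR step.

inputs:
  bam_file:
    type: File
    secondaryFiles:
      - .bai  # BAM index staged next to the BAM (e.g. sample.bam.bai)

outputs:
  cell_by_bin_matrix:
    type: File
    outputSource: archr_matrices/cell_by_bin_matrix
  cell_by_gene_matrix:
    type: File
    outputSource: archr_matrices/cell_by_gene_matrix
  analysis_outputs:
    type: File[]
    outputSource: archr_analysis/analysis_outputs

steps:
  archr_matrices:
    # Step 1: read BAM + index with ArchR, write matrices and saved state
    run: archr_matrices.cwl
    in:
      bam_file: bam_file
    out: [cell_by_bin_matrix, cell_by_gene_matrix, archr_state]

  archr_analysis:
    # Step 2: depends on step 1's outputs, so it runs second
    run: archr_analysis.cwl
    in:
      cell_by_bin_matrix: archr_matrices/cell_by_bin_matrix
      cell_by_gene_matrix: archr_matrices/cell_by_gene_matrix
      archr_state: archr_matrices/archr_state
    out: [analysis_outputs]
```

If `archr_analysis.cwl` collects the same files the current step produces, the standalone pipeline can keep its existing output names.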
At a high level, the requirements are:
- The multiome RNA-seq + ATAC-seq pipeline should be able to embed the entire ATAC-seq pipeline as a submodule and call a single CWL step that accepts directories of FASTQ files and writes the cell-by-bin and cell-by-gene matrices to disk.
- The "standalone" ATAC-seq pipeline should still perform all of the analysis it currently does and output the same files with the same names.
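One way to meet the first requirement is for the ATAC-seq repository to expose a subworkflow that goes from FASTQ directories to the two matrices. The multiome pipeline would then reference that subworkflow through the submodule path. The sketch below assumes the submodule is checked out at `atac-seq-pipeline/` and the subworkflow lives at `steps/atac_matrices.cwl`. Both paths and all IDs are hypothetical. CWL requires `SubworkflowFeatureRequirement` when a step's `run` points at another Workflow.

```yaml
cwlVersion: v1.2
class: Workflow
requirements:
  SubworkflowFeatureRequirement: {}  # needed to run a Workflow as a step

inputs:
  atac_fastq_dirs:
    type: Directory[]

outputs:
  atac_cell_by_bin:
    type: File
    outputSource: atac_matrices/cell_by_bin_matrix
  atac_cell_by_gene:
    type: File
    outputSource: atac_matrices/cell_by_gene_matrix

steps:
  atac_matrices:
    # A single step from the multiome pipeline's point of view. Internally,
    # the (hypothetical) subworkflow would align reads and run only the
    # matrix-writing ArchR script, skipping the ATAC-only analysis.
    run: atac-seq-pipeline/steps/atac_matrices.cwl
    in:
      fastq_dirs: atac_fastq_dirs
    out: [cell_by_bin_matrix, cell_by_gene_matrix]
  # RNA-seq steps and RNA-aware joint analysis would consume the matrices here.
```

The standalone ATAC-seq pipeline could call the same subworkflow and then chain the analysis step after it. That would keep a single code path for producing the matrices.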