Split `run_ArchR_analysis.R` for multiome processing

`run_ArchR_analysis.R` does a lot: it uses ArchR to read the BAM file and BAM index, produces cell-by-bin and cell-by-gene matrices, and then runs an extensive set of analyses on those matrices.

For multiome processing (SNARE-seq, 10X multiome), the downstream analysis is mostly wasted CPU and I/O, since we'll want to use analysis steps that are aware of the RNA-seq results.

Split `run_ArchR_analysis.R` into at least two files: one that uses ArchR to read the BAM file and write the cell-by-bin and cell-by-gene matrices to disk, and another that loads those matrices (and probably also whatever serialized R process state is needed) and runs the rest of the analysis.
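As a rough illustration of the hand-off between the two files, here is a minimal sketch in base R. The function names (`write_matrices`, `run_downstream_analysis`), the `.rds` file names, and the toy matrices are all hypothetical; in the real scripts the matrices would come from ArchR, and the on-disk format may well be something other than RDS.

```r
# Stage 1 (hypothetical name): build the matrices and write them to disk.
# Toy count matrices stand in for what ArchR would actually produce.
write_matrices <- function(out_dir) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  set.seed(1)
  cells <- paste0("cell", 1:3)
  cell_by_bin <- matrix(rpois(12, lambda = 2), nrow = 3,
                        dimnames = list(cells, paste0("bin", 1:4)))
  cell_by_gene <- matrix(rpois(6, lambda = 2), nrow = 3,
                         dimnames = list(cells, paste0("gene", 1:2)))
  # Assumed file names; the real pipeline may use a different format.
  saveRDS(cell_by_bin, file.path(out_dir, "cell_by_bin.rds"))
  saveRDS(cell_by_gene, file.path(out_dir, "cell_by_gene.rds"))
  invisible(out_dir)
}

# Stage 2 (hypothetical name): reload the matrices and run the remaining analysis.
run_downstream_analysis <- function(in_dir) {
  cell_by_bin <- readRDS(file.path(in_dir, "cell_by_bin.rds"))
  cell_by_gene <- readRDS(file.path(in_dir, "cell_by_gene.rds"))
  # Both matrices must describe the same cells in the same order.
  stopifnot(identical(rownames(cell_by_bin), rownames(cell_by_gene)))
  # Placeholder for the real downstream analysis.
  list(n_cells = nrow(cell_by_bin),
       n_bins = ncol(cell_by_bin),
       n_genes = ncol(cell_by_gene))
}

# The standalone pipeline would run both stages back to back;
# the multiome pipeline would stop after stage 1.
out_dir <- file.path(tempdir(), "archr_matrices")
write_matrices(out_dir)
summary_stats <- run_downstream_analysis(out_dir)
```

The point of the sketch is only that stage 2 depends on nothing but the files stage 1 wrote, so the multiome pipeline can call stage 1 alone and skip the rest.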

This will also require splitting the CWL file into (at least) two chained CWL steps.

At a high level, the requirements are:

- The multiome RNA-seq + ATAC-seq pipeline should be able to embed the entire ATAC-seq pipeline as a submodule, and call a single CWL step that accepts directories with FASTQ files, then writes cell-by-bin and cell-by-gene matrices to disk.
- The "standalone" ATAC-seq pipeline should perform all analysis that it currently does, and output the same files with the same names.

hubmapconsortium / sc-atac-seq-pipeline

Split `run_ArchR_analysis.R` for multiome processing #48