hubmapconsortium / sc-atac-seq-pipeline

ATAC-seq pipelines for HuBMAP
MIT License

Split `run_ArchR_analysis.R` for multiome processing #48

Open mruffalo opened 1 year ago

mruffalo commented 1 year ago

`run_ArchR_analysis.R` does a lot: it uses ArchR to read the BAM file and its index, produces cell-by-bin and cell-by-gene matrices, and then runs a substantial amount of downstream analysis on those matrices.

For multiome processing (SNARE-seq, 10x Multiome), the downstream analysis is mostly a waste of CPU and I/O, since we'll want to use analysis steps that are aware of the RNA-seq results.

Split `run_ArchR_analysis.R` into at least two scripts: one that uses ArchR to read the BAM file and write the cell-by-bin and cell-by-gene matrices to disk, and another that loads those matrices (and probably whatever serialized R session state it needs) and runs the rest of the analysis.

This will also require splitting the CWL file into (at least) two chained CWL steps.
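A minimal sketch of what the two chained steps could look like, assuming a CWL v1.0 `Workflow`. The tool files `archr_matrices.cwl` and `archr_analysis.cwl`, and every input/output id below, are hypothetical names for illustration, not files that exist in this repository. The second step consumes the first step's outputs, which is what makes them chained.

```yaml
#!/usr/bin/env cwl-runner
# Hypothetical sketch: step names, tool files and port ids are illustrative only.
cwlVersion: v1.0
class: Workflow

inputs:
  bam_file: File
  bam_index: File

outputs:
  cell_by_bin:
    type: File
    outputSource: archr_matrices/cell_by_bin
  cell_by_gene:
    type: File
    outputSource: archr_matrices/cell_by_gene
  analysis_results:
    type: Directory
    outputSource: archr_analysis/results

steps:
  # Step 1 (hypothetical tool): the BAM-reading half of run_ArchR_analysis.R.
  archr_matrices:
    run: archr_matrices.cwl
    in:
      bam_file: bam_file
      bam_index: bam_index
    # archr_state stands in for whatever serialized R state step 2 needs.
    out: [cell_by_bin, cell_by_gene, archr_state]

  # Step 2 (hypothetical tool): the downstream-analysis half, fed by step 1.
  archr_analysis:
    run: archr_analysis.cwl
    in:
      cell_by_bin: archr_matrices/cell_by_bin
      cell_by_gene: archr_matrices/cell_by_gene
      archr_state: archr_matrices/archr_state
    out: [results]
```

With this split, a multiome workflow could reuse only the `archr_matrices` step and pass its matrices to RNA-aware analysis steps, skipping `archr_analysis` entirely.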

At a high level, the requirements are: