stjude-biohackathon / CRCminer

Determine program CLI commands, arguments, and options. #10

Closed by j-andrews7 1 year ago

j-andrews7 commented 1 year ago

This matters because it shows what data and options will be passed through and where we need to build in flexibility. I'm sure there will be some changes, but starting with a clear idea of what actually using the tool looks like is probably smart.

My idea for the main functionality:

Usage: crcminer mine -f <FASTA file> -e <ROSE2 output> -m <Motif PWMs file> -map <Mapping file>
        [-sp <Subpeaks file>] [-g <Active genes file>] [-n <Analysis name>]

Options:
  -f, --fasta <FASTA file>             FASTA file for genome.
  -e, --enhancers <ROSE2 output>       ROSE2 output of annotated (super)enhancers.
  -m, --motifs <Motif PWMs file>       Motif PWMs in MEME format.
  -map, --mapping <Mapping file>       Motif ID to gene ID mapping file.
  -sp, --subpeaks <Subpeaks file>      Subpeaks to use for motif scanning, e.g. ATAC peaks or 
                                         stringent H3K27ac peaks (summit +/- 50 bp, etc). Optional.
  -g, --genes <Active genes file>      List of active genes, e.g. genes with TPM > 1. 
                                         Used to filter motifs used for scanning. Optional.
  -n, --name <Analysis name>           Analysis name, used for output file naming. Optional.

Description:
  Mine the genome for motif occurrences within enhancer regions, using the provided motif PWMs.

  The following files are required:
  * FASTA file for genome
  * ROSE2 output of annotated (super)enhancers
  * Motif PWMs in MEME format
  * Motif ID to gene ID mapping file

  The following files are optional:
  * Subpeaks file to use for motif scanning
  * List of active genes to filter motifs used for scanning
  * Analysis name for output file naming

Examples:
  crcminer mine -f genome.fa -e enhancers.bed -m motifs.meme -map motif_gene_map.txt
  crcminer mine -f genome.fa -e enhancers.bed -m motifs.meme -map motif_gene_map.txt \
    -sp subpeaks.bed -g active_genes.txt -n analysis1
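
For reference, here is a minimal sketch of how the options above could map onto Python's argparse. It is only an illustration, not the tool's implementation: using argparse at all is an assumption (another CLI library would work just as well), and the function name `build_mine_parser` and the metavar strings are made up here. Multi-letter single-dash flags such as -map and -sp are accepted by argparse when given exactly as written.

```python
import argparse


def build_mine_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the proposed `crcminer mine` options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="crcminer mine",
        description="Mine the genome for motif occurrences within enhancer regions.",
    )
    # Required inputs, as listed in the proposal.
    parser.add_argument("-f", "--fasta", required=True, metavar="FASTA",
                        help="FASTA file for genome.")
    parser.add_argument("-e", "--enhancers", required=True, metavar="ROSE2",
                        help="ROSE2 output of annotated (super)enhancers.")
    parser.add_argument("-m", "--motifs", required=True, metavar="MEME",
                        help="Motif PWMs in MEME format.")
    parser.add_argument("-map", "--mapping", required=True, metavar="MAPPING",
                        help="Motif ID to gene ID mapping file.")
    # Optional inputs; each defaults to None when not supplied.
    parser.add_argument("-sp", "--subpeaks", metavar="SUBPEAKS",
                        help="Subpeaks to use for motif scanning.")
    parser.add_argument("-g", "--genes", metavar="GENES",
                        help="List of active genes used to filter motifs.")
    parser.add_argument("-n", "--name", metavar="NAME",
                        help="Analysis name, used for output file naming.")
    return parser


# Parse the first example command from the proposal (file names are placeholders).
example_args = build_mine_parser().parse_args(
    ["-f", "genome.fa", "-e", "enhancers.bed", "-m", "motifs.meme",
     "-map", "motif_gene_map.txt"]
)
print(vars(example_args))
```

Marking the four input files with `required=True` lets the parser itself reject a call that omits any of them, so the required/optional split in the description does not need to be checked by hand.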

We could also consider other commands, e.g. crcminer compare for comparing networks, or crcminer report for starting the Dash app (just pointing it at the output directory of one or more runs).
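
As a rough sketch of how such commands could sit side by side, argparse subparsers give each command its own argument set. Everything below is hypothetical: the `build_parser` name, the `run_dirs` positional argument, and the idea that compare takes run directories are assumptions for illustration only, and the options of mine are left out for brevity.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Top-level `crcminer` parser with one subparser per command (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="crcminer")
    # dest="command" records which subcommand was chosen; required=True rejects a bare call.
    subparsers = parser.add_subparsers(dest="command", required=True)

    # The mine options from the proposal would be registered on this subparser.
    subparsers.add_parser("mine", help="Mine enhancers for motif occurrences.")

    # Assumed interface: compare takes the output directories of the runs to compare.
    compare = subparsers.add_parser("compare", help="Compare networks between runs.")
    compare.add_argument("run_dirs", nargs="+", metavar="RUN_DIR",
                         help="Output directory of a previous run.")

    # report points at the output directory of one or more runs, as suggested above.
    report = subparsers.add_parser("report", help="Start the Dash app for one or more runs.")
    report.add_argument("run_dirs", nargs="+", metavar="RUN_DIR",
                        help="Output directory of a previous run.")
    return parser


# Example: pointing the report command at two run directories (placeholder names).
example_args = build_parser().parse_args(["report", "analysis1", "analysis2"])
print(example_args.command, example_args.run_dirs)
```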