This is important because it gives us a concrete picture of what data and options will be passed through and where we need to build in flexibility, etc. There will surely be some changes, but starting from a clear idea of what actually using the tool looks like is probably smart.
My idea for the main functionality:
Usage: crcminer mine -f <FASTA file> -e <ROSE2 output> -m <Motif PWMs file>
       -map <Motif ID to gene ID mapping file> [-sp <Subpeaks file>] [-g <Active genes file>] [-n <Analysis name>]
Options:
  -f, --fasta <FASTA file>         FASTA file for genome.
  -e, --enhancers <ROSE2 output>   ROSE2 output of annotated (super)enhancers.
  -m, --motifs <Motif PWMs file>   Motif PWMs in MEME format.
  -map, --mapping <Mapping file>   Motif ID to gene ID mapping file.
  -sp, --subpeaks <Subpeaks file>  Subpeaks to use for motif scanning, e.g. ATAC peaks or
                                   stringent H3K27ac peaks (summit +/- 50 bp, etc.). Optional.
  -g, --genes <Active genes file>  List of active genes, e.g. genes with TPM > 1.
                                   Used to filter the motifs scanned. Optional.
  -n, --name <Analysis name>       Analysis name, used for output file naming. Optional.
Description:
Mine the genome for motif occurrences within enhancer regions, using the provided motif PWMs.
The following files are required:
* FASTA file for genome
* ROSE2 output of annotated (super)enhancers
* Motif PWMs in MEME format
* Motif ID to gene ID mapping file
The following inputs are optional:
* Subpeaks file to use for motif scanning
* List of active genes to filter motifs used for scanning
* Analysis name for output file naming
Examples:
crcminer mine -f genome.fa -e enhancers.bed -m motifs.meme -map motif_gene_map.txt
crcminer mine -f genome.fa -e enhancers.bed -m motifs.meme -map motif_gene_map.txt \
  -sp subpeaks.bed -g active_genes.txt -n analysis1
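As a quick check that this interface maps cleanly onto standard tooling, here is a minimal sketch of the mine options using Python's argparse. Python is an assumption (suggested by the Dash app), and the function name build_mine_parser and the uppercase metavars are hypothetical. The four inputs listed as required get required=True; the optional ones default to None so downstream code can test whether they were supplied. argparse accepts multi-character single-dash flags such as -map and -sp, and because it checks for an exact match first, -map resolves to its own option rather than to -m with an attached value.

```python
import argparse


def build_mine_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for 'crcminer mine', mirroring the proposed usage text."""
    parser = argparse.ArgumentParser(
        prog="crcminer mine",
        description="Mine the genome for motif occurrences within enhancer regions, "
        "using the provided motif PWMs.",
    )
    # Required inputs: argparse exits with a usage error if any are missing.
    parser.add_argument("-f", "--fasta", required=True, metavar="FASTA",
                        help="FASTA file for genome.")
    parser.add_argument("-e", "--enhancers", required=True, metavar="ROSE2_OUTPUT",
                        help="ROSE2 output of annotated (super)enhancers.")
    parser.add_argument("-m", "--motifs", required=True, metavar="MEME",
                        help="Motif PWMs in MEME format.")
    # Multi-character single-dash flag; exact matching keeps it distinct from -m.
    parser.add_argument("-map", "--mapping", required=True, metavar="MAPPING",
                        help="Motif ID to gene ID mapping file.")
    # Optional inputs default to None when omitted.
    parser.add_argument("-sp", "--subpeaks", metavar="SUBPEAKS",
                        help="Subpeaks to use for motif scanning, e.g. ATAC peaks.")
    parser.add_argument("-g", "--genes", metavar="GENES",
                        help="List of active genes, used to filter the motifs scanned.")
    parser.add_argument("-n", "--name", metavar="NAME",
                        help="Analysis name, used for output file naming.")
    return parser
```

Calling build_mine_parser().parse_args() with no argument reads sys.argv; passing the second example's arguments as a list instead yields subpeaks="subpeaks.bed", genes="active_genes.txt" and name="analysis1".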
We could also consider other subcommands, e.g. crcminer compare for comparing networks, or crcminer report for starting the Dash app (just pointed at the output directory of one or more runs).
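If compare and report do become subcommands, argparse subparsers are one way to organize them. This is a minimal sketch, assuming both commands take only the output directories of earlier runs (a positional RUN_DIR argument, a hypothetical name) and that compare needs at least two of them. The names build_cli and parse are also hypothetical. The mine subcommand would be registered as one more subparser carrying the options listed above.

```python
import argparse
from typing import List, Optional


def build_cli() -> argparse.ArgumentParser:
    """Hypothetical top-level 'crcminer' parser with one subparser per command."""
    parser = argparse.ArgumentParser(prog="crcminer")
    # dest="command" records which subcommand was chosen; required=True needs Python 3.7+.
    subcommands = parser.add_subparsers(dest="command", required=True)

    compare = subcommands.add_parser("compare", help="Compare networks from several runs.")
    compare.add_argument("run_dirs", nargs="+", metavar="RUN_DIR",
                         help="Output directories of previous mine runs.")

    report = subcommands.add_parser("report", help="Start the Dash app for one or more runs.")
    report.add_argument("run_dirs", nargs="+", metavar="RUN_DIR",
                        help="Output directories of previous mine runs.")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv when None) and apply checks argparse cannot express."""
    parser = build_cli()
    args = parser.parse_args(argv)
    # nargs="+" only guarantees one directory; a comparison needs at least two.
    if args.command == "compare" and len(args.run_dirs) < 2:
        parser.error("compare needs at least two run directories")
    return args
```

Keeping report and compare as thin readers of existing output directories means the Dash app never has to rerun the mining step.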