varsh1090 closed this 7 years ago
I was able to implement this for the most part, except that the FASTA file (target.fa) in the data folder needs to be updated as well. That file seems to be based on the merged peaks from etc/MACSscore_summary_valid_merged.bed. Is there a command that generates target.fa so we can include it in the pipeline, rather than having it in the data folder?
A few more questions:
data/FEfiles, whereas data/SampleAnnos and data/SampleBEDs each have 11 samples. Is that supposed to be the case?
etc/all_samples.anno, yet this file is not used anywhere else in the pipeline. Do you remember what its purpose was?

Command to get the FASTA file:
module load gcc/5.2.0 homer/4.8
findMotifsGenome.pl sample_peaks.bed /ufrc/zhou/share/genomes/dm6/Sequence/WholeGenomeFasta/genome.fa sample_peaks_mask/ -size given -mask -p 4 -dumpFasta
Rather than running the whole HOMER analysis, this command can give just the FASTA file:
homerTools extract <BED file> /ufrc/zhou/share/genomes/dm6/Sequence/WholeGenomeFasta/genome.fa -mask -fa
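If the FASTA step were folded into the pipeline, it could look roughly like the sketch below. It only wraps the homerTools command quoted above. The function names are hypothetical, and the sketch assumes homerTools is already on the PATH (for example after the module load shown earlier) and that it prints the FASTA records to standard output, so the output is redirected to a file.

```python
import subprocess
from pathlib import Path


def build_extract_command(bed_file, genome_fasta):
    """Return the argument list for the homerTools command quoted above."""
    return ["homerTools", "extract", str(bed_file), str(genome_fasta), "-mask", "-fa"]


def write_target_fasta(bed_file, genome_fasta, out_fasta):
    """Run homerTools and save its standard output to out_fasta.

    Assumption: homerTools is on the PATH and writes FASTA to stdout.
    """
    out_path = Path(out_fasta)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w") as handle:
        # check=True raises CalledProcessError if homerTools exits with an error
        subprocess.run(build_extract_command(bed_file, genome_fasta),
                       stdout=handle, check=True)
    return out_path
```

A call such as write_target_fasta("etc/MACSscore_summary_valid_merged.bed", genome_path, "data/target.fa") would then regenerate the file whenever the merged peaks change, with genome_path standing in for the genome.fa path used above.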
The script works by reading all the sample files from data/SampleBEDs, data/FEfiles, and data/SampleAnnos. Every sample must have a file in each of those directories.
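A quick way to check that requirement before running the script is sketched below. This is not the pipeline's own code. It assumes, hypothetically, that a sample's name is the part of each file name before the first dot; the real naming scheme may differ, and the function names are made up for illustration.

```python
from pathlib import Path

# The three directories named in the comment above.
REQUIRED_DIRS = ("data/SampleBEDs", "data/FEfiles", "data/SampleAnnos")


def sample_names(directory):
    """Collect sample names from one directory.

    Assumption: the sample name is the file name up to the first dot.
    Hidden files (names starting with a dot) are ignored.
    """
    return {
        p.name.split(".", 1)[0]
        for p in Path(directory).iterdir()
        if p.is_file() and not p.name.startswith(".")
    }


def find_incomplete_samples(root=".", dirs=REQUIRED_DIRS):
    """Map each sample to the directories in which it has no file."""
    found = {d: sample_names(Path(root) / d) for d in dirs}
    all_samples = set().union(*found.values())
    missing = {}
    for sample in sorted(all_samples):
        absent = [d for d in dirs if sample not in found[d]]
        if absent:
            missing[sample] = absent
    return missing
```

An empty result means every sample is present in all three directories. Otherwise the dictionary shows which files to add, which would also surface a mismatch in sample counts between the directories.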
To do: add an option for a user-supplied sample list that selects which samples to keep for the list of peaks in the master table.
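One possible shape for that option is sketched below, assuming the list arrives as a plain text file with one sample name per line. The file format and both function names are hypothetical. The sketch covers only the selection step, not how the master table is built.

```python
def read_sample_list(path):
    """Read sample names from a text file, one per line.

    Blank lines and lines starting with '#' are skipped (assumed format).
    """
    names = []
    with open(path) as handle:
        for line in handle:
            name = line.strip()
            if name and not name.startswith("#"):
                names.append(name)
    return names


def select_samples(available, keep=None):
    """Return the samples to use for the master table.

    With no keep list, every available sample is used (sorted by name).
    With a keep list, its order is preserved and duplicates are dropped;
    names that are not available raise a ValueError.
    """
    available = set(available)
    if keep is None:
        return sorted(available)
    unknown = sorted(set(keep) - available)
    if unknown:
        raise ValueError("Unknown samples in keep list: " + ", ".join(unknown))
    return list(dict.fromkeys(keep))
```

Raising an error on unknown names, rather than silently dropping them, is a design choice here: a typo in the list would otherwise remove a sample from the master table without warning.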