User input for samples to use in Master tables

varsh1090 commented 7 years ago

Add an option for a user-supplied sample list that selects which samples to keep for the list of peaks in the master table.

We might have to rerun the bedtools merge command first to get the merged peaks from, say, 9 of the 11 samples. Alternatively, we can remove the values for a particular sample name from the master table (whichever is easier).
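
For the second option, a rough sketch of dropping a sample's values from the master table could look like the following. It assumes the master table is tab-delimited with a header row, and that each sample-specific column has the sample name somewhere in its header. Neither assumption is confirmed in this thread, and the function name keep_samples and the column names in the usage example are hypothetical.

```python
import csv
import io


def keep_samples(table_text, all_samples, keep):
    """Return the table with columns of unwanted samples removed.

    Assumption: a column belongs to a sample if the sample name appears
    in the column header. Substring matching means a name like "S1"
    would also match "S10", so real sample names may need stricter rules.
    """
    dropped = set(all_samples) - set(keep)
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    # Keep a column unless its header mentions a dropped sample.
    kept_idx = [
        i for i, name in enumerate(header)
        if not any(sample in name for sample in dropped)
    ]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([header[i] for i in kept_idx])
    for row in reader:
        writer.writerow([row[i] for i in kept_idx])
    return out.getvalue()
```

Usage would be something like keep_samples(open(path).read(), all_samples, user_list), where user_list comes from the new option. Note that this only removes per-sample values; the peak coordinates themselves would still come from the merge over all samples, which is why rerunning bedtools merge may be the more accurate route.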

victorlin commented 7 years ago

I was able to implement this for the most part, except the FASTA file (target.fa) in the data folder needs to be updated as well. That file seems to be based on the merged peaks from etc/MACSscore_summary_valid_merged.bed. Is there a command that generates target.fa so we can include it in the pipeline, rather than having it in the data folder?

A few more questions:

I noticed that there are 18 samples in data/FEfiles, whereas data/SampleAnnos and data/SampleBEDs each have 11 samples. Is that supposed to be the case?
I have a script to concatenate all the sample annotation files into etc/all_samples.anno, yet this file is not used anywhere else in the pipeline. Do you remember what its purpose was?

varsh1090 commented 7 years ago

Command to get the FASTA file:

module load gcc/5.2.0 homer/4.8
findMotifsGenome.pl sample_peaks.bed /ufrc/zhou/share/genomes/dm6/Sequence/WholeGenomeFasta/genome.fa sample_peaks_mask/ -size given -mask -p 4 -dumpFasta

victorlin commented 7 years ago

Rather than running the whole HOMER analysis, this command can give just the FASTA file:

homerTools extract <BED file> /ufrc/zhou/share/genomes/dm6/Sequence/WholeGenomeFasta/genome.fa -mask -fa
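
To include this step in the pipeline, one possible wrapper is sketched below. It assumes homerTools is on the PATH (for example after the module load above) and that it writes the FASTA to standard output, so the output is redirected to a file. The function names homer_extract_cmd and write_target_fasta are hypothetical and not part of the existing scripts.

```python
import subprocess


def homer_extract_cmd(bed_path, genome_fasta):
    """Build the argument list for the homerTools extract call shown above."""
    return ["homerTools", "extract", bed_path, genome_fasta, "-mask", "-fa"]


def write_target_fasta(bed_path, genome_fasta, out_path):
    """Run homerTools and save its standard output as the FASTA file.

    Assumption: homerTools prints the extracted sequences to stdout.
    check=True raises CalledProcessError if the command fails.
    """
    with open(out_path, "w") as out:
        subprocess.run(
            homer_extract_cmd(bed_path, genome_fasta),
            stdout=out,
            check=True,
        )
```

It could then be called with the merged BED file and the dm6 genome path above as inputs, writing to target.fa in the data folder.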

victorlin commented 7 years ago

The script works by reading all the sample files from data/SampleBEDs, data/FEfiles, and data/SampleAnnos. Every sample must have a file in each of those directories.
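
Since a missing file in any of the three directories would break that requirement, a check along these lines could run first. This is only a sketch: it assumes a sample's name is the part of the file name before the first dot, which may not match the actual naming scheme, and find_incomplete_samples is a hypothetical helper.

```python
from pathlib import Path

# The three input directories named in this thread.
SAMPLE_DIRS = ("data/SampleBEDs", "data/FEfiles", "data/SampleAnnos")


def sample_names(directory):
    """Return sample names in a directory (file name up to the first dot)."""
    return {p.name.split(".")[0] for p in Path(directory).iterdir() if p.is_file()}


def find_incomplete_samples(root="."):
    """Map each sample to the directories where its file is missing.

    Samples present in all three directories are not included.
    """
    per_dir = {d: sample_names(Path(root) / d) for d in SAMPLE_DIRS}
    all_samples = set().union(*per_dir.values())
    missing = {}
    for sample in sorted(all_samples):
        absent = [d for d in SAMPLE_DIRS if sample not in per_dir[d]]
        if absent:
            missing[sample] = absent
    return missing
```

An empty result means every sample has a file in each directory. With the current data, the extra samples in data/FEfiles would presumably show up here as missing from the other two directories.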

zhoulab / p53-chip-seq-data

User input for samples to use in Master tables #12