zhoulab / p53-chip-seq-data

Basic machine learning on genomic data

User input for samples to use in Master tables #12

Closed varsh1090 closed 7 years ago

varsh1090 commented 7 years ago

Add an option for a user-supplied sample list that selects which samples to keep when building the list of peaks in the master table.

victorlin commented 7 years ago

I was able to implement this for the most part, except the FASTA file (target.fa) in the data folder needs to be updated as well. That file seems to be based on the merged peaks from etc/MACSscore_summary_valid_merged.bed. Is there a command that generates target.fa so we can include it in the pipeline, rather than having it in the data folder?

A few more questions:

  1. I noticed that there are 18 samples in data/FEfiles, whereas data/SampleAnnos and data/SampleBEDs each have 11 samples. Is that supposed to be the case?
  2. I have a script to concatenate all the sample annotation files into etc/all_samples.anno, yet this file is not used anywhere else in the pipeline. Do you remember what its purpose was?

varsh1090 commented 7 years ago

Commands to get the FASTA file:

module load gcc/5.2.0 homer/4.8
findMotifsGenome.pl sample_peaks.bed /ufrc/zhou/share/genomes/dm6/Sequence/WholeGenomeFasta/genome.fa sample_peaks_mask/ -size given -mask -p 4 -dumpFasta

victorlin commented 7 years ago

Rather than running the whole HOMER analysis, this command can give just the FASTA file:

homerTools extract <BED file> /ufrc/zhou/share/genomes/dm6/Sequence/WholeGenomeFasta/genome.fa -mask -fa
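
To include this step in the pipeline, the command above could be wrapped roughly as follows. This is only a sketch: the function names `build_extract_command` and `write_target_fasta` are hypothetical, and it assumes `homerTools extract` writes the FASTA to standard output and that the HOMER module is already loaded.

```python
import subprocess

# Genome path copied from the command above; adjust for other systems.
GENOME_FA = "/ufrc/zhou/share/genomes/dm6/Sequence/WholeGenomeFasta/genome.fa"


def build_extract_command(bed_file, genome_fa=GENOME_FA):
    """Return the argument list for the homerTools call shown above."""
    return ["homerTools", "extract", str(bed_file), str(genome_fa), "-mask", "-fa"]


def write_target_fasta(bed_file, out_fasta, genome_fa=GENOME_FA):
    """Run homerTools and redirect its standard output into out_fasta.

    Assumption: homerTools is on PATH and prints the FASTA to stdout.
    """
    with open(out_fasta, "w") as handle:
        subprocess.run(
            build_extract_command(bed_file, genome_fa),
            stdout=handle,
            check=True,  # raise if homerTools exits with an error
        )
```

For example, `write_target_fasta("etc/MACSscore_summary_valid_merged.bed", "data/target.fa")` would regenerate target.fa from the merged peaks, assuming those are the right input and output paths.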

victorlin commented 7 years ago

The script works by reading all the sample files from data/SampleBEDs, data/FEfiles, and data/SampleAnnos. Every sample must have a file in each of those directories.
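
The requirement above can be sketched as a check that runs before the master table is built. This is not the actual script: `sample_names` and `select_samples` are hypothetical helpers, and the sketch assumes a sample's name is its file name up to the first dot, which may not match the real naming scheme.

```python
from pathlib import Path

# The three directories named above, relative to the repository root.
SAMPLE_DIRS = ("data/SampleBEDs", "data/FEfiles", "data/SampleAnnos")


def sample_names(directory):
    """Collect sample names found in one directory.

    Assumption: the sample name is the file name up to the first dot.
    Hidden files (names starting with a dot) are skipped.
    """
    return {
        p.name.split(".")[0]
        for p in Path(directory).iterdir()
        if p.is_file() and not p.name.startswith(".")
    }


def select_samples(requested=None, root=".", dirs=SAMPLE_DIRS):
    """Return the sorted samples to keep, or raise if any lacks a file.

    With requested=None, every sample seen in any directory is checked.
    """
    per_dir = {d: sample_names(Path(root) / d) for d in dirs}
    wanted = set(requested) if requested is not None else set.union(*per_dir.values())
    # Map each incomplete sample to the directories where it is missing.
    missing = {
        s: [d for d in dirs if s not in per_dir[d]]
        for s in sorted(wanted)
        if any(s not in per_dir[d] for d in dirs)
    }
    if missing:
        raise ValueError(f"samples missing files: {missing}")
    return sorted(wanted)
```

Calling `select_samples()` with no list would also flag a mismatch like the 18 versus 11 files noted earlier, since any sample present in only some of the directories raises an error that names the directories it is missing from.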