
NuMap

NuMap - http://www-hsc.usc.edu/~valouev/NuMap/NuMap.html by Anton Valouev

I uploaded this software because I couldn't compile it from the original distribution (a header was missing), and I thought others might find it useful for getting past the same problem.

I have also uploaded compiled Linux x86_64 binaries (in the bin directory).

Install

git clone git@github.com:afrendeiro/NuMap.git
cd NuMap
make
# add to $PATH or copy to /usr/bin/

Workflow

From the original documentation:

Below is a description of the minimal analysis needed to compare nucleosome maps. The other modules are described in the 'NuMap Modules' section:

  1. Obtain SAM MNase-Seq files.

  2. Calculate align files:

    sam_2_align_bin genome_table=hg19.gt input_sam_file=MNase_seq.sam analysis_path=./MNase_seq_analysis

    Note: hg19.gt is provided with the distribution. You can use the QuEST software to generate genome tables for other genomes.

  3. Estimate MNase-Seq library fragment size

    dist_plots analysis_path=./MNase_seq_analysis

  4. Calculate dyads from MNase-seq reads

    align_2_dyads analysis_path=./MNase_seq_analysis

  5. Calculate stringency plots

    dyad_stringency analysis_path=./MNase_seq_analysis

  6. Call dyads

    call_dyads analysis_path=./MNase_seq_analysis output_path=./MNase_seq_analysis/dyad_calls/

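  7. Match dyads between two experiments (run steps 2-6 separately for each experiment; the example below assumes their outputs are in ./MNase_seq_analysis1 and ./MNase_seq_analysis2)

    match_dyads called_dyads_file1=./MNase_seq_analysis1/dyad_calls/dyad_positions.txt called_dyads_file2=./MNase_seq_analysis2/dyad_calls/dyad_positions.txt match_dist=30 genome_table=hg19.gt output_path=./dyad_matching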

  8. Calculate nucleosome binding differences at a set of genomic sites

    diff_summaries_at_sites called_dyads_file1=./dyad_matching/dyad_matches unmatched_file1=./dyad_matching/unmatched_dyads1 unmatched_file2=./dyad_matching/unmatched_dyads2 positions_file=sites.txt genome_table=hg19.gt output_file=sites.nucl_diff dist=5000

NuMap Modules

From the original documentation:

sam_2_align_bin

Converts SAM files to base-by-base binary alignment profiles across the genome. Example:

sam_2_align_bin genome_table=hg19.gt input_sam_file=MNase_Seq.sam analysis_path=./MNase_Seq_analysis

The genome table is a tab-delimited file listing chromosome sizes, one chromosome per line:

chr1    249250621
chr2    243199373
...
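If you want to read the genome table from your own scripts, a minimal Python reader could look like the following. It is a sketch, not part of NuMap; it assumes only the two-column layout (chromosome name, length) described above, and the function name `parse_genome_table` is hypothetical.

```python
def parse_genome_table(lines):
    """Hypothetical helper: map chromosome name -> length from genome-table lines."""
    sizes = {}
    for line in lines:
        fields = line.split()  # tab-delimited; split() also tolerates stray spaces
        if not fields:
            continue  # skip blank lines
        chrom, length = fields[0], int(fields[1])
        sizes[chrom] = length
    return sizes


# Usage (hg19.gt ships with NuMap):
# with open("hg19.gt") as fh:
#     sizes = parse_genome_table(fh)
```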

dist_plots

Calculates distograms and phasograms of the raw data and estimates the average fragment size of the MNase-Seq library. Example:

dist_plots analysis_path=./MNase_Seq_analysis
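To illustrate how a fragment size can be read off a distogram, here is a generic Python sketch. It assumes (this is not taken from NuMap's source) that a distogram is the histogram of distances from each + strand read start to the - strand read ends downstream of it on the same chromosome, and that its mode approximates the fragment size. The function names and the `max_dist` cutoff are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def distogram(plus_starts, minus_ends, max_dist=500):
    """Histogram of implied fragment lengths (1..max_dist bp) obtained by pairing
    each + strand read start with every downstream - strand read end."""
    minus_sorted = sorted(minus_ends)
    hist = Counter()
    for p in plus_starts:
        lo = bisect_left(minus_sorted, p)
        hi = bisect_right(minus_sorted, p + max_dist - 1)
        for m in minus_sorted[lo:hi]:
            hist[m - p + 1] += 1  # inclusive length of the implied fragment
    return hist


def estimate_fragment_size(plus_starts, minus_ends, max_dist=500):
    """Mode of the distogram, or None if no pairs were found."""
    hist = distogram(plus_starts, minus_ends, max_dist)
    return max(hist, key=hist.get) if hist else None
```

Random cross-strand pairs add a flat background to the histogram, while reads from the same fragment pile up at one length, which is why the mode is a reasonable estimate.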

align_2_dyads

Calculates dyad coordinates by shifting read-end coordinates by half the library fragment size. Example:

align_2_dyads analysis_path=./MNase_Seq_analysis
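The shift described above can be sketched in Python as follows. This is an illustration, not NuMap's code: it assumes the 5' end of each read is shifted toward the fragment centre, downstream for + strand reads and upstream for - strand reads. The SAM conventions used (1-based leftmost POS, FLAG bit 0x10 for reverse-strand alignments) are standard; the function names are hypothetical.

```python
def five_prime_end(pos, flag, aligned_len):
    """Strand and 5' coordinate of a SAM record. POS is the 1-based leftmost
    aligned base; FLAG bit 0x10 marks a reverse-strand alignment. aligned_len is
    the number of reference bases covered (the read length if ungapped)."""
    if flag & 0x10:
        return "-", pos + aligned_len - 1  # 5' end is the rightmost base
    return "+", pos


def read_to_dyad(strand, five_prime, fragment_size):
    """Shift a read's 5' end by half the fragment size toward the fragment centre."""
    half = fragment_size // 2
    if strand == "+":
        return five_prime + half
    if strand == "-":
        return five_prime - half
    raise ValueError(f"unknown strand: {strand!r}")
```

For a 147 bp fragment spanning positions 1000-1146, the forward read (5' end 1000) and the reverse read (5' end 1146) both map to the same dyad, 1073.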

simulate_dyads

Simulates dyad coordinates by permuting MNase-Seq-inferred dyads within a window of the specified size. Example:

simulate_dyads analysis_path=./MNase_Seq_analysis window=1000
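One way to read "permuting within a window" is sketched below; NuMap's exact procedure may differ. The sketch assumes the genome is cut into fixed windows of `window` bp and each dyad is moved to a uniformly random position inside its own window. This keeps the per-window dyad counts but destroys positioning inside each window. The function name and `seed` parameter are hypothetical.

```python
import random


def simulate_dyads(dyads, window=1000, seed=None):
    """Hypothetical sketch: move each dyad to a uniformly random position in the
    fixed-size window it falls in, preserving per-window dyad counts."""
    rng = random.Random(seed)
    simulated = []
    for pos in dyads:
        window_start = (pos // window) * window
        simulated.append(window_start + rng.randrange(window))
    return sorted(simulated)
```

Passing a fixed `seed` makes the mock profile reproducible between runs.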

dyad_stringency

Calculates dyad stringency from dyad coordinates. Example:

dyad_stringency analysis_path=./MNase_Seq_analysis mock=no core_size=147 bw=100

call_dyads

Calls dyads using the dyad stringency profiles. Example:

call_dyads analysis_path=./MNase_Seq_analysis output_path=./MNase_Seq_analysis/dyad_calls mock=no peak_value=mean drop=0.1 drop_window=150 bw=100

This module writes dyad call coordinates to ./MNase_Seq_analysis/dyad_calls/dyad_positions.txt. The file is tab-delimited and has the following fields:

#chrom  pos     stringency      count_enrichment        count
chr1    10099   0.3758  0.7594  51.2832

This module also generates BED files of the dyad calls, which can be visualized in the UCSC Genome Browser. They are written to the same directory as the dyad calls, one file per chromosome.
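For downstream scripting, dyad_positions.txt can be parsed with a few lines of Python. This sketch uses only the five fields listed above; it also shows how a call maps to a single-base BED interval, assuming `pos` is 1-based (BED itself is 0-based, half-open). It is not how NuMap writes its own BED files, and the names `DyadCall`, `parse_dyad_calls` and `to_bed_line` are hypothetical.

```python
from typing import NamedTuple


class DyadCall(NamedTuple):
    chrom: str
    pos: int
    stringency: float
    count_enrichment: float
    count: float


def parse_dyad_calls(lines):
    """Yield DyadCall records from dyad_positions.txt, skipping the '#' header."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, stringency, enrichment, count = line.split()[:5]
        yield DyadCall(chrom, int(pos), float(stringency), float(enrichment), float(count))


def to_bed_line(call):
    """Single-base BED interval; assumes call.pos is a 1-based coordinate."""
    return f"{call.chrom}\t{call.pos - 1}\t{call.pos}"
```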

nucleosome_coverage

Calculates nucleosome coverage profiles. Example:

nucleosome_coverage analysis_path=./MNase_Seq_analysis/ mock=no core_size=147 spacing=193
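As a rough illustration of what a coverage profile means, the Python sketch below counts, at every base, how many nucleosome cores of `core_size` bp centred on a dyad overlap it. It uses a difference array so the cost is linear in chromosome length. This is an assumption about the concept, not NuMap's implementation; in particular the module's `spacing` parameter is not modeled, and the name `core_coverage` is hypothetical.

```python
def core_coverage(dyads, chrom_len, core_size=147):
    """Per-base count of nucleosome cores (core_size bp centred on each dyad)
    covering positions 0..chrom_len-1, built with a difference array."""
    half = core_size // 2
    diff = [0] * (chrom_len + 1)
    for d in dyads:
        start = max(0, d - half)
        end = min(chrom_len, d - half + core_size)  # exclusive end
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    coverage, running = [], 0
    for i in range(chrom_len):
        running += diff[i]
        coverage.append(running)
    return coverage
```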

coverage_to_wig

Produces coverage WIG files for viewing in the UCSC Genome Browser. Example:

coverage_to_wig analysis_path=./MNase_Seq_analysis positions_file=sites.txt output_file=sites_coverage.wig max_dist=10000 track_name=dyad_coverage

stringency_to_wig

Produces stringency WIG files for viewing in the UCSC Genome Browser. Example:

stringency_wig analysis_path=./MNase_Seq_analysis positions_file=sites.txt output_file=sites_stringency.wig max_dist=1000 track_name=dyad_stringency
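The two WIG modules take a sites file and a `max_dist`. A plausible reading, assumed here rather than confirmed by the documentation, is that only positions within `max_dist` bp of some site are written. The Python sketch below shows that filter together with the standard UCSC `variableStep` layout (a track line, then `variableStep chrom=...`, then 1-based `position value` lines). The function names are hypothetical.

```python
from bisect import bisect_left


def near_any_site(pos, sorted_sites, max_dist):
    """True if pos lies within max_dist bp of at least one site (sites sorted)."""
    i = bisect_left(sorted_sites, pos)
    return any(abs(pos - sorted_sites[j]) <= max_dist
               for j in (i - 1, i) if 0 <= j < len(sorted_sites))


def variable_step_wig(track_name, chrom, points):
    """Render (1-based position, value) pairs as a variableStep WIG track."""
    lines = [f"track type=wiggle_0 name={track_name}", f"variableStep chrom={chrom}"]
    lines += [f"{pos} {value:g}" for pos, value in points]
    return "\n".join(lines) + "\n"
```

Only the two sites that bracket `pos` need to be checked, because the site list is sorted.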

phasogram_of_sites

Calculates the phasogram of a set of sites. Dyad calls can be used as input to calculate the phasogram of the called dyads. Example:

phasogram_of_sites positions_file=./MNase_Seq_analysis/dyad_calls/dyad_positions.txt output_file=./MNase_Seq_analysis/dyad_calls/phasogram max_dist=3000
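Assuming a phasogram is the histogram of distances between pairs of sites on the same chromosome, up to `max_dist`, a direct Python sketch could look like this. It is not NuMap's code, and the name `phasogram` and the (chrom, pos) input shape are hypothetical. With well-positioned dyads the histogram shows peaks at multiples of the nucleosome spacing.

```python
from collections import Counter, defaultdict


def phasogram(sites, max_dist=3000):
    """sites: iterable of (chrom, pos). Counts distances 1..max_dist between all
    pairs of sites on the same chromosome."""
    by_chrom = defaultdict(list)
    for chrom, pos in sites:
        by_chrom[chrom].append(pos)
    hist = Counter()
    for positions in by_chrom.values():
        positions.sort()
        for i, a in enumerate(positions):
            for b in positions[i + 1:]:
                d = b - a
                if d > max_dist:
                    break  # sorted, so every later site is even farther away
                if d > 0:
                    hist[d] += 1
    return hist
```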

estimate_peaks

Smooths any plot, including phasograms and dyadograms, using a bandwidth of the specified size. It can be used to estimate nucleosome spacing. Example:

estimate_peaks dist_file=./MNase_Seq_analysis/dyad_calls/phasogram output_file=./MNase_Seq_analysis/dyad_calls/phasogram.smoothed bw=20 field=1
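The kernel used by estimate_peaks is not documented here, so the Python sketch below swaps in plain Gaussian kernel smoothing (standard deviation `bw` bins) followed by numbering of strict local maxima. This is the kind of smoothing and peak extraction that produces a `<peak number> <peak position>` list suitable for the regression described next. Both function names are hypothetical.

```python
import math


def gaussian_smooth(values, bw=20):
    """Smooth an evenly spaced profile with a Gaussian kernel of standard
    deviation bw (in bins), truncated at 3*bw and renormalised at the edges."""
    radius = math.ceil(3 * bw)
    kernel = [(k, math.exp(-0.5 * (k / bw) ** 2)) for k in range(-radius, radius + 1)]
    smoothed = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in kernel:
            j = i + k
            if 0 <= j < len(values):
                num += w * values[j]
                den += w
        smoothed.append(num / den)
    return smoothed


def peak_positions(offsets, smoothed):
    """Number the strict local maxima of a smoothed profile: [(1, offset), ...]."""
    peaks = [offsets[i] for i in range(1, len(smoothed) - 1)
             if smoothed[i - 1] < smoothed[i] > smoothed[i + 1]]
    return list(enumerate(peaks, start=1))
```

Strict local maxima ignore flat plateaus; smoothing first makes such ties unlikely on real count data.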

The output file contains the smoothed values in its third field. The file ./MNase_Seq_analysis/dyad_calls/phasogram.smoothed.peak_pos.txt contains the peaks of the phasogram, which can be fitted in R to estimate nucleosome spacing. The file has the following format:

<peak number> <peak positions>

The file can be opened in R to fit a linear regression of peak position on peak number:

# V1 = peak number, V2 = peak position
w <- read.table(file="./MNase_Seq_analysis/dyad_calls/phasogram.smoothed.peak_pos.txt")
mlm <- lm(w$V2 ~ w$V1)  # the slope estimates the nucleosome spacing
mlm

The output should look like the following; the w$V1 coefficient (the slope) is the estimated nucleosome spacing in bp:

Call:
lm(formula = w$V2 ~ w$V1)
Coefficients:
(Intercept)         w$V1  
      193.7        193.6

called_dyad_organization

Plots a dyadogram of called dyads around a collection of sites (e.g. TFBS). The resulting profiles can be further smoothed with the estimate_peaks module. Example:

called_dyad_organization positions_file=sites.txt called_dyads_file=./MNase_Seq_analysis/dyad_calls/dyad_positions.txt genome_table=hg19.gt output_file=sites.dyadogram max_dist=3000
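A dyadogram at sites can be sketched as a histogram of signed offsets (dyad minus site) for all called dyads within `max_dist` of each site. The Python below is an illustration under that assumption. It treats sites as unstranded (chrom, pos) pairs, although the real sites.txt format may carry more information, and the name `dyadogram` is hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter, defaultdict


def dyadogram(sites, dyads, max_dist=3000):
    """Count called dyads at each signed offset (dyad - site) within +/- max_dist.
    sites and dyads are iterables of (chrom, pos)."""
    dyads_by_chrom = defaultdict(list)
    for chrom, pos in dyads:
        dyads_by_chrom[chrom].append(pos)
    for positions in dyads_by_chrom.values():
        positions.sort()
    hist = Counter()
    for chrom, site in sites:
        positions = dyads_by_chrom.get(chrom, [])
        lo = bisect_left(positions, site - max_dist)
        hi = bisect_right(positions, site + max_dist)
        for d in positions[lo:hi]:
            hist[d - site] += 1
    return hist
```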

match_dyads

Matches the dyads between two nucleosome mapping experiments. Example:

match_dyads called_dyads_file1=dyad_calls1.txt called_dyads_file2=dyad_calls2.txt genome_table=hg19.gt match_dist=30 output_path=./dyad_matching/

The files of interest written to the output path include dyad_matches, unmatched_dyads1 and unmatched_dyads2, which are used by diff_summaries_at_sites.

For matched dyads, pos1 and pos2 give the positions of the matched dyads on chromosome chrom, and str1 and str2 give their corresponding positioning strengths.
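To make the idea of matching concrete, the Python sketch below pairs dyad calls from two experiments on one chromosome with a greedy two-pointer sweep, accepting a pair when the positions differ by at most `match_dist` bp. NuMap's own matching rule is not documented here and may differ (for example, it could resolve competing candidates differently); this greedy sweep is a swapped-in technique, and the function name is hypothetical.

```python
def match_dyads_greedy(calls1, calls2, match_dist=30):
    """Greedy one-to-one matching of dyad positions on one chromosome.
    Returns (matches, unmatched1, unmatched2); matches are (pos1, pos2) pairs."""
    a, b = sorted(calls1), sorted(calls2)
    i = j = 0
    matches, unmatched1, unmatched2 = [], [], []
    while i < len(a) and j < len(b):
        d = b[j] - a[i]
        if abs(d) <= match_dist:
            matches.append((a[i], b[j]))
            i += 1
            j += 1
        elif d < 0:
            unmatched2.append(b[j])  # b[j] lies too far left of every remaining a
            j += 1
        else:
            unmatched1.append(a[i])  # a[i] lies too far left of every remaining b
            i += 1
    unmatched1.extend(a[i:])
    unmatched2.extend(b[j:])
    return matches, unmatched1, unmatched2
```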

permute_dyads

Permutes called dyad coordinates to generate 'random' dyad positions. Example:

permute_dyads called_dyads_file=./MNase_Seq_analysis/dyad_calls/dyad_positions.txt genome_table=hg19.gt output_file=./MNase_Seq_analysis/dyad_calls/mock.dyad_positions.txt window=1000

diff_summaries_at_sites

Calculates nucleosome binding differences at a collection of sites (e.g. TF binding sites). The resulting file can then be smoothed with the estimate_peaks module. Example:

diff_summaries_at_sites dyad_match_file=./dyad_matching/dyad_matches unmatched_file1=./dyad_matching/unmatched_dyads1 unmatched_file2=./dyad_matching/unmatched_dyads2 positions=sites.txt genome_table=hg19.gt output_file=sites.chrom_diff dist=4000

The output file is tab-delimited and has the following format:

<offset> <match_counts> <unmatch_counts>

unmatch_counts / (2 * match_counts) gives the nucleosome binding difference.
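Applying that ratio to every row of the output file takes a few lines of Python. This sketch assumes whitespace-separated rows with an integer offset and no header line, and it reports None where match_counts is zero instead of dividing by zero. The function name `binding_differences` is hypothetical.

```python
def binding_differences(lines):
    """Yield (offset, difference) for each row of a diff_summaries_at_sites output
    file, where difference = unmatch_counts / (2 * match_counts)."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or line.startswith("#"):
            continue
        offset, match, unmatch = int(fields[0]), float(fields[1]), float(fields[2])
        yield offset, (unmatch / (2 * match) if match > 0 else None)


# Usage:
# with open("sites.chrom_diff") as fh:
#     for offset, diff in binding_differences(fh):
#         print(offset, diff)
```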