Negative p-values when using --broad peak flag

lopatica commented 9 years ago

I am using MACS2 to call open chromatin regions in a MNase-seq dataset. However I'm generating negative p-values after calling broad peaks using the following commands: macs2 callpeak -t treatment.s.bam -c input.bam -n tg \ --outdir ~/nucleosomedata/ -f BAMPE -g 1.87e9 --broad

Here is a few rows of output:

chr start end length pileup -LOG10(pvalue) fold_enrichment -LOG10(qvalue) name chr1 3003373 3003936 564 -1.23335E+31 -1.26269E+18 1.36546E+30 2.13806 tg_peak_1 chr1 3020520 3020966 447 -4.20326E+31 -4.30325E+18 4.65349E+30 1.28417 tg_peak_2 chr1 3029027 3030664 1638 -1.39736E+31 -1.4306E+18 1.54704E+30 1.49317 tg_peak_3 chr1 3045122 3045422 301 -5.30715E+31 -5.4334E+18 5.87561E+30 1.26046 tg_peak_4

Could this be due to defaulting to q-values when calling broad peaks?

Thanks

taoliu commented 9 years ago

It may relate to a bug in BAMPE module. Could you try it again with the new release I just uploaded today? https://pypi.python.org/pypi/MACS2/2.1.0.20150420

skurscheid commented 9 years ago

I don't think that BAMPE is the source of the error. As described in issue #67 it can also be observed when using SE BAM as input... However, just in case, I updated macs2 to the latest version and re-ran the analysis with the same parameters:

[skurscheid@gduserv G1_H2AZ_broad]$ head -25 G1_H2AZ_broad_peaks.xls
# This file is generated by MACS version 2.1.0.20150420
# Command line: callpeak -g 2652783500 -q 0.05 --keep-dup all -f BAM -c ../../G1_Input_multimapped.bam -t ../../G1_H2AZ_multimapped.bam --outdir G1_H2AZ_broad -n G1_H2AZ_broad --broad --nomodel --extsize 145
# ARGUMENTS LIST:
# name = G1_H2AZ_broad
# format = BAM
# ChIP-seq file = ['../../G1_H2AZ_multimapped.bam']
# control file = ['../../G1_Input_multimapped.bam']
# effective genome size = 2.65e+09
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 5.00e-02
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is on

# tag size is determined as 75 bps
# total tags in treatment: 32641366
# total tags in control: 16985401
# d = 145
chr start   end length  pileup  -log10(pvalue)  fold_enrichment -log10(qvalue)  name
1   3532282 3532522 241 -51010436366211418144575842156544.00    -5222391610133708800.00000  5647434102187120228922816462848.00000   0.84887 G1_H2AZ_broad_peak_1
1   3670329 3670573 245 -21445205231841444578604989022208.00    -2195536016985554944.00000  2374227551028464455643079639040.00000   2.67962 G1_H2AZ_broad_peak_2
1   3672050 3672449 400 -14800211482489477710623274958848.00    -1515229002778804224.00000  1638551459537636861554907414528.00000   1.07141 G1_H2AZ_broad_peak_3
1   3841400 3841544 145 -80831923506565104656709163417600.00    -8275481963037458432.00000  8949011120017582312359970471936.00000   1.03685 G1_H2AZ_broad_peak_4
1   3873520 3874441 922 -25139114341448532460794950975488.00    -2573714553187598336.00000  2783185395139340442121034268672.00000   1.02883 G1_H2AZ_broad_peak_5

taoliu commented 9 years ago

Hi lopatica and skurscheid,

Do any of you have some small dataset that I can use to reproduce the error?

Best, Tao

lopatica2 commented 9 years ago

taoliu

I'd be happy to pass along a dataset. But my test and control bams are all ~1.2 gigs. I could sub sample them for just a few chromosomes?

lopatica (now lopatica2)

taoliu commented 9 years ago

lopatica/2,

Yes. Could you subsample your datasets or just send me data of one chromosome, AS LONG AS the issue can be reproduced?

Thanks a lot, Tao

lopatica2 commented 9 years ago

taoliu

I reproduced the negative p-value error using BAMPE --broad with subsampling only chr10. What would be the preferred way to get you these files ~1 g.

Thanks!

skurscheid commented 9 years ago

Hi Tao, I will sub sample from the BAM files and check if the error re-produces. If that's the case I am happy to share them with you. Cheers, Sebastian

On Wed, Apr 22, 2015 at 1:43 PM -0700, "Tao Liu (τν)" notifications@github.com wrote:

Hi lopatica and skurscheid,

Do any of you have some small dataset that I can use to reproduce the error?

Best,

Tao

— Reply to this email directly or view it on GitHub.

lopatica2 commented 9 years ago

Hello,

I have data subsampled and the error reproduced. The combined files are ~1 g. I could downsample further if that is too large.

Best GDJ

On Thu, Apr 23, 2015 at 7:27 PM, Sebastian Kurscheid < notifications@github.com> wrote:

Hi Tao, I will sub sample from the BAM files and check if the error re-produces. If that's the case I am happy to share them with you. Cheers, Sebastian

On Wed, Apr 22, 2015 at 1:43 PM -0700, "Tao Liu (τν)" < notifications@github.com> wrote:

Hi lopatica and skurscheid,

Do any of you have some small dataset that I can use to reproduce the error?

Best,

Tao

— Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub https://github.com/taoliu/MACS/issues/74#issuecomment-95746629.

Graham D Johnson Graduate Research Assistant Center for Molecular Medicine and Genetics Wayne State University School of Medicine Detroit MI 48201 USA

Phone: 313-577-0765 FAX: 313-577-8554

taoliu commented 9 years ago

As long as you can share the file with me. For example through Dropbox link. If you can down sample to 100Mbytes file but still can reproduce the error, that would be tremendously helpful!

Tao

On Apr 23, 2015, at 7:29 PM, lopatica2 notifications@github.com wrote:

Hello,

I have data subsampled and the error reproduced. The combined files are ~1 g. I could downsample further if that is too large.

Best GDJ

On Thu, Apr 23, 2015 at 7:27 PM, Sebastian Kurscheid < notifications@github.com> wrote:

Hi Tao, I will sub sample from the BAM files and check if the error re-produces. If that's the case I am happy to share them with you. Cheers, Sebastian

On Wed, Apr 22, 2015 at 1:43 PM -0700, "Tao Liu (τν)" < notifications@github.com> wrote:

Hi lopatica and skurscheid,

Do any of you have some small dataset that I can use to reproduce the error?

Best,

Tao

— Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub https://github.com/taoliu/MACS/issues/74#issuecomment-95746629.

Graham D Johnson Graduate Research Assistant Center for Molecular Medicine and Genetics Wayne State University School of Medicine Detroit MI 48201 USA

Phone: 313-577-0765 FAX: 313-577-8554 — Reply to this email directly or view it on GitHub.

skurscheid commented 9 years ago

Tao,

I have sub-sampled both input and ChIP data. Running macs2 with the same parameters re-produces the error. However, this time I also noticed an error message which I did not see before (I was running a bash script for peak calling on several samples and did not inspect the log files as output was produced and no errors were reported in the error file (of my bash script)). The error is pasted below:

# Command line: callpeak -t ../../DT_H2A.Z.sorted.sub.bam -c ../../DT_Input.sorted.sub.bam --name DT_H2AZ_broad --outdir DT_H2AZ_broad --broad --qvalue 0.05
# ARGUMENTS LIST:
# name = DT_H2AZ_broad
# format = AUTO
# ChIP-seq file = ['../../DT_H2A.Z.sorted.sub.bam']
# control file = ['../../DT_Input.sorted.sub.bam']
# effective genome size = 2.70e+09
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 5.00e-02
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is on

INFO  @ Fri, 24 Apr 2015 11:29:43: #1 read tag files... 
INFO  @ Fri, 24 Apr 2015 11:29:43: #1 read treatment tags... 
Exception ZeroDivisionError: 'integer division or modulo by zero' in 'MACS2.IO.Parser.GenericParser.tsize' ignored
Exception ZeroDivisionError: 'integer division or modulo by zero' in 'MACS2.IO.Parser.GenericParser.tsize' ignored
INFO  @ Fri, 24 Apr 2015 11:29:43: Detected format is: SAM 
INFO  @ Fri, 24 Apr 2015 11:29:45: #1.2 read input tags... 
Exception ZeroDivisionError: 'integer division or modulo by zero' in 'MACS2.IO.Parser.GenericParser.tsize' ignored
Exception ZeroDivisionError: 'integer division or modulo by zero' in 'MACS2.IO.Parser.GenericParser.tsize' ignored
INFO  @ Fri, 24 Apr 2015 11:29:45: Detected format is: SAM 
INFO  @ Fri, 24 Apr 2015 11:29:48: #1 tag size is determined as 75 bps 
INFO  @ Fri, 24 Apr 2015 11:29:48: #1 tag size = 75 
INFO  @ Fri, 24 Apr 2015 11:29:48: #1  total tags in treatment: 415580 
INFO  @ Fri, 24 Apr 2015 11:29:48: #1 user defined the maximum tags... 
INFO  @ Fri, 24 Apr 2015 11:29:48: #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s) 
INFO  @ Fri, 24 Apr 2015 11:29:48: #1  tags after filtering in treatment: 408528 
INFO  @ Fri, 24 Apr 2015 11:29:48: #1  Redundant rate of treatment: 0.02 
INFO  @ Fri, 24 Apr 2015 11:29:48: #1  total tags in control: 432322 
INFO  @ Fri, 24 Apr 2015 11:29:48: #1 user defined the maximum tags... 
INFO  @ Fri, 24 Apr 2015 11:29:48: #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s) 
INFO  @ Fri, 24 Apr 2015 11:29:48: #1  tags after filtering in control: 419127 
INFO  @ Fri, 24 Apr 2015 11:29:48: #1  Redundant rate of control: 0.03 
INFO  @ Fri, 24 Apr 2015 11:29:48: #1 finished! 
INFO  @ Fri, 24 Apr 2015 11:29:48: #2 Build Peak Model... 
INFO  @ Fri, 24 Apr 2015 11:29:48: #2 looking for paired plus/minus strand peaks... 
INFO  @ Fri, 24 Apr 2015 11:29:50: #2 number of paired peaks: 9777 
INFO  @ Fri, 24 Apr 2015 11:29:50: start model_add_line... 
INFO  @ Fri, 24 Apr 2015 11:29:50: start X-correlation... 
INFO  @ Fri, 24 Apr 2015 11:29:50: end of X-cor 
INFO  @ Fri, 24 Apr 2015 11:29:50: #2 finished! 
INFO  @ Fri, 24 Apr 2015 11:29:50: #2 predicted fragment length is 299 bps 
INFO  @ Fri, 24 Apr 2015 11:29:50: #2 alternative fragment length(s) may be 299 bps 
INFO  @ Fri, 24 Apr 2015 11:29:50: #2.2 Generate R script for model : DT_H2AZ_broad/DT_H2AZ_broad_model.r 
INFO  @ Fri, 24 Apr 2015 11:29:50: #3 Call peaks... 
INFO  @ Fri, 24 Apr 2015 11:29:50: #3 Call broad peaks with given level1 -log10qvalue cutoff and level2: 1.301030, 1.000000... 
INFO  @ Fri, 24 Apr 2015 11:29:50: #3 Pre-compute pvalue-qvalue table... 
INFO  @ Fri, 24 Apr 2015 11:29:53: #3 Call peaks for each chromosome... 
INFO  @ Fri, 24 Apr 2015 11:29:54: #4 Write output xls file... DT_H2AZ_broad/DT_H2AZ_broad_peaks.xls 
INFO  @ Fri, 24 Apr 2015 11:29:54: #4 Write broad peak in broadPeak format file... DT_H2AZ_broad/DT_H2AZ_broad_peaks.broadPeak 
INFO  @ Fri, 24 Apr 2015 11:29:54: #4 Write broad peak in bed12/gappedPeak format file... DT_H2AZ_broad/DT_H2AZ_broad_peaks.gappedPeak 
INFO  @ Fri, 24 Apr 2015 11:29:54: Done!

The sub-sampled BAM files are available here:

https://cloudstor.aarnet.edu.au/plus/public.php?service=files&t=0626762a3196e967c166bb9160ac1a8d https://cloudstor.aarnet.edu.au/plus/public.php?service=files&t=187bb9ebb771484e3aa142117ebeb371

One more note: the input FASTQ reads were quality trimmed prior to alignment... perhaps that could be a possible cause of the problem?

Thanks in advance!

Sebastian

taoliu commented 9 years ago

Thank you Sebastian! I have downloaded your data and run them through macs2 with or without broad option. Here are the results:

1) with --broad

$head -30 NA_b_peaks.xls
# This file is generated by MACS version 2.1.0.20150419
# Command line: callpeak -t DT_H2A.Z.sorted.sub.sam -c DT_Input.sorted.sub.sam --broad -n NA_b -q 0.05
# ARGUMENTS LIST:
# name = NA_b
# format = AUTO
# ChIP-seq file = ['DT_H2A.Z.sorted.sub.sam']
# control file = ['DT_Input.sorted.sub.sam']
# effective genome size = 2.70e+09
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 5.00e-02
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is on
# tag size is determined as 75 bps
# total tags in treatment: 415580
# tags after filtering in treatment: 408528
# maximum duplicate tags at the same position in treatment = 1
# Redundant rate in treatment: 0.02
# total tags in control: 432322
# tags after filtering in control: 419127
# maximum duplicate tags at the same position in control = 1
# Redundant rate in control: 0.03
# d = 299
# alternative fragment length(s) may be 299 bps
chr start   end length  pileup  -log10(pvalue)  fold_enrichment -log10(qvalue)  name
1   72710700    72710998    299 0.95    2.73697 1.82221 0.38595 NA_b_peak_1
1   130714948   130716316   1369    1.03    3.06301 1.92459 0.46183 NA_b_peak_2
1   156420786   156421270   485 2.36    5.61416 3.21273 1.05878 NA_b_peak_3

2) without --broad

$head -30 NA_peaks.xls
# This file is generated by MACS version 2.1.0.20150419
# Command line: callpeak -t DT_H2A.Z.sorted.sub.sam -c DT_Input.sorted.sub.sam
# ARGUMENTS LIST:
# name = NA
# format = AUTO
# ChIP-seq file = ['DT_H2A.Z.sorted.sub.sam']
# control file = ['DT_Input.sorted.sub.sam']
# effective genome size = 2.70e+09
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 5.00e-02
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is off
# tag size is determined as 75 bps
# total tags in treatment: 415580
# tags after filtering in treatment: 408528
# maximum duplicate tags at the same position in treatment = 1
# Redundant rate in treatment: 0.02
# total tags in control: 432322
# tags after filtering in control: 419127
# maximum duplicate tags at the same position in control = 1
# Redundant rate in control: 0.03
# d = 299
# alternative fragment length(s) may be 299 bps
chr start   end length  abs_summit  pileup  -log10(pvalue)  fold_enrichment -log10(qvalue)  name
1   72710700    72710998    299 72710815    4.00    8.27238 4.72461 2.59025 NA_peak_1
1   195241669   195241990   322 195241743   17.00   16.40707    9.11525 9.79023 NA_peak_2
11  3127086 3127451 366 3127362 6.00    9.06991 5.95814 3.04630 NA_peak_3

I haven't seen strange behaviors of negative p or q values. I have tested on a Mac and a Linux server with the same parameters and the same version of MACS2 '2.1.0.20150419'. Could you check if you get the similar or different results than what I posted here? Or maybe it's due to an installation problem. If so, try to remove your current MACS2 installation entirely from your computer, then re-install.

Best, Tao

lopatica2 commented 9 years ago

Tao,

The problem I am having is that I get negative p-values when running callpeaks with the BAMPE and BROAD settings. If I do not use the BROAD flag everything works fine.

Here are the commands I used after down-sampling my data to include only chr10 :

macs2 callpeak \ -t treatment_chr10.bam \ -c control_chr10.bam \ -n test \ --outdir ~/nucleosomedata/wt/nucleoatac/macs2/ -f BAMPE -g 1.87e9 --broad

The bam files and output files can be found here:

https://www.dropbox.com/s/62t0h7f2llenw15/Subsample.rar?dl=0

Thank you! Graham

On Fri, Apr 24, 2015 at 10:33 AM, Tao Liu (τν) notifications@github.com wrote:

Thank you Sebastian! I have downloaded your data and run them through macs2 with or without broad option. Here are the results:

1) with --broad

$head -30 NA_b_peaks.xls This file is generated by MACS version 2.1.0.20150419 Command line: callpeak -t DT_H2A.Z.sorted.sub.sam -c DT_Input.sorted.sub.sam --broad -n NA_b -q 0.05 ARGUMENTS LIST: name = NA_b format = AUTO ChIP-seq file = ['DT_H2A.Z.sorted.sub.sam'] control file = ['DT_Input.sorted.sub.sam'] effective genome size = 2.70e+09 band width = 300 model fold = [5, 50] qvalue cutoff = 5.00e-02 Larger dataset will be scaled towards smaller dataset. Range for calculating regional lambda is: 1000 bps and 10000 bps Broad region calling is on tag size is determined as 75 bps total tags in treatment: 415580 tags after filtering in treatment: 408528 maximum duplicate tags at the same position in treatment = 1 Redundant rate in treatment: 0.02 total tags in control: 432322 tags after filtering in control: 419127 maximum duplicate tags at the same position in control = 1 Redundant rate in control: 0.03 d = 299 alternative fragment length(s) may be 299 bps

chr start end length pileup -log10(pvalue) fold_enrichment -log10(qvalue) name 1 72710700 72710998 299 0.95 2.73697 1.82221 0.38595 NA_b_peak_1 1 130714948 130716316 1369 1.03 3.06301 1.92459 0.46183 NA_b_peak_2 1 156420786 156421270 485 2.36 5.61416 3.21273 1.05878 NA_b_peak_3

2) without --broad

$head -30 NA_peaks.xls This file is generated by MACS version 2.1.0.20150419 Command line: callpeak -t DT_H2A.Z.sorted.sub.sam -c DT_Input.sorted.sub.sam ARGUMENTS LIST: name = NA format = AUTO ChIP-seq file = ['DT_H2A.Z.sorted.sub.sam'] control file = ['DT_Input.sorted.sub.sam'] effective genome size = 2.70e+09 band width = 300 model fold = [5, 50] qvalue cutoff = 5.00e-02 Larger dataset will be scaled towards smaller dataset. Range for calculating regional lambda is: 1000 bps and 10000 bps Broad region calling is off tag size is determined as 75 bps total tags in treatment: 415580 tags after filtering in treatment: 408528 maximum duplicate tags at the same position in treatment = 1 Redundant rate in treatment: 0.02 total tags in control: 432322 tags after filtering in control: 419127 maximum duplicate tags at the same position in control = 1 Redundant rate in control: 0.03 d = 299 alternative fragment length(s) may be 299 bps

chr start end length abs_summit pileup -log10(pvalue) fold_enrichment -log10(qvalue) name 1 72710700 72710998 299 72710815 4.00 8.27238 4.72461 2.59025 NA_peak_1 1 195241669 195241990 322 195241743 17.00 16.40707 9.11525 9.79023 NA_peak_2 11 3127086 3127451 366 3127362 6.00 9.06991 5.95814 3.04630 NA_peak_3

I haven't seen strange behaviors of negative p or q values. I have tested on a Mac and a Linux server with the same parameters and the same version of MACS2 '2.1.0.20150419'. Could you check if you get the similar or different results than what I posted here? Or maybe it's due to an installation problem. If so, try to remove your current MACS2 installation entirely from your computer, then re-install.

Best, Tao

— Reply to this email directly or view it on GitHub https://github.com/taoliu/MACS/issues/74#issuecomment-95949735.

Graham D Johnson Graduate Research Assistant Center for Molecular Medicine and Genetics Wayne State University School of Medicine Detroit MI 48201 USA

Phone: 313-577-0765 FAX: 313-577-8554

taoliu commented 9 years ago

Thank you Graham! I have tested your files. But I can't see the negative values. Here is the output:

$head -30 test_TL_peaks.xls
# This file is generated by MACS version 2.1.0.20150420
# Command line: callpeak -t treatment_chr10.bam -c control_chr10.bam -n test_TL -f BAMPE --broad -g 1.87e9
# ARGUMENTS LIST:
# name = test_TL
# format = BAMPE
# ChIP-seq file = ['treatment_chr10.bam']
# control file = ['control_chr10.bam']
# effective genome size = 1.87e+09
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 5.00e-02
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is on

# fragment size is determined as 158 bps
# total fragments in treatment: 6398698
# fragments after filtering in treatment: 6029452
# maximum duplicate fragments in treatment = 1
# Redundant rate in treatment: 0.06
# total fragments in control: 3049299
# fragments after filtering in control: 2763758
# maximum duplicate fragments in control = 1
# Redundant rate in control: 0.09
# d = 158
chr start   end length  pileup  -log10(pvalue)  fold_enrichment -log10(qvalue)  name
chr10   3100163 3101638 1476    3.53    0.97813 1.09051 0.56246 test_TL_peak_1
chr10   3104811 3106730 1920    8.39    1.53059 1.29429 0.94507 test_TL_peak_2
chr10   3117445 3121577 4133    10.09   2.34625 1.38336 1.62322 test_TL_peak_3
chr10   3127127 3127503 377 18.78   1.76995 1.10285 1.14222 test_TL_peak_4

I know it looks different with the file you included in your rar package. So I suggest you remove the current MACS2 from your computer and re-install again through PyPI: https://pypi.python.org/pypi/MACS2/2.1.0.20150420

You can do:

$ pip install MACS2

BTW, @skurscheid the results I posted for you contain '2.1.0.20150419' which is different with the formal release '2.1.0.20150420' since the version on my Mac is a bitter (1day) older. After I updated that to the formal release , I got the same results -- no negative values. And the version on my linux server is '2.1.0.20150420' and no negative values there either.

Tao

taoliu commented 9 years ago

Also, @lopatica2 and @skurscheid , sometimes, it may be caused by the differences of C compilers on Linux and Windows. FYI, I only tested MACS2 on Mac (LLVM) and Linux/Unix (gcc). Windows OS is not suitable for doing bioinformatics...

lopatica2 commented 9 years ago

Tao,

It is interesting that only the pile-up and p-value columns differ between the outputs. I tested this on macs2 2.1.0.20150420. But I had it installed by our network team as I couldn't get PyPI to work. Should the different routes to installing the same version has an impact?

Thanks again Graham

On Fri, Apr 24, 2015 at 11:22 AM, Tao Liu (τν) notifications@github.com wrote:

Thank you Graham! I have tested your files. But I can't see the negative values. Here is the output:

$head -30 test_TL_peaks.xls

This file is generated by MACS version 2.1.0.20150420

Command line: callpeak -t treatment_chr10.bam -c control_chr10.bam -n test_TL -f BAMPE --broad -g 1.87e9

ARGUMENTS LIST:

name = test_TL

format = BAMPE

ChIP-seq file = ['treatment_chr10.bam']

control file = ['control_chr10.bam']

effective genome size = 1.87e+09

band width = 300

model fold = [5, 50]

qvalue cutoff = 5.00e-02

Larger dataset will be scaled towards smaller dataset.

Range for calculating regional lambda is: 1000 bps and 10000 bps

Broad region calling is on

fragment size is determined as 158 bps

total fragments in treatment: 6398698

fragments after filtering in treatment: 6029452

maximum duplicate fragments in treatment = 1

Redundant rate in treatment: 0.06

total fragments in control: 3049299

fragments after filtering in control: 2763758

maximum duplicate fragments in control = 1

Redundant rate in control: 0.09

d = 158

chr start end length pileup -log10(pvalue) fold_enrichment -log10(qvalue) name chr10 3100163 3101638 1476 3.53 0.97813 1.09051 0.56246 test_TL_peak_1 chr10 3104811 3106730 1920 8.39 1.53059 1.29429 0.94507 test_TL_peak_2 chr10 3117445 3121577 4133 10.09 2.34625 1.38336 1.62322 test_TL_peak_3 chr10 3127127 3127503 377 18.78 1.76995 1.10285 1.14222 test_TL_peak_4

I know it looks different with the file you included in your rar package. So I suggest you remove the current MACS2 from your computer and re-install again through PyPI: https://pypi.python.org/pypi/MACS2/2.1.0.20150420

You can do:

$ pip install MACS2

BTW, @skurscheid https://github.com/skurscheid the results I posted for you contain '2.1.0.20150419' which is different with the formal release '2.1.0.20150420' since the version on my Mac is a bitter (1day) older. After I updated that to the formal release , I got the same results -- no negative values. And the version on my linux server is '2.1.0.20150420' and no negative values there either.

Tao

— Reply to this email directly or view it on GitHub https://github.com/taoliu/MACS/issues/74#issuecomment-95965152.

Graham D Johnson Graduate Research Assistant Center for Molecular Medicine and Genetics Wayne State University School of Medicine Detroit MI 48201 USA

Phone: 313-577-0765 FAX: 313-577-8554

lopatica2 commented 9 years ago

We use gcc on our campus.

On Fri, Apr 24, 2015 at 11:26 AM, Tao Liu (τν) notifications@github.com wrote:

Also, @lopatica2 https://github.com/lopatica2 and @skurscheid https://github.com/skurscheid , sometimes, it may be caused by the differences of C compilers on Linux and Windows. FYI, I only tested MACS2 on Mac (LLVM) and Linux/Unix (gcc). Windows OS is not suitable for doing bioinformatics...

— Reply to this email directly or view it on GitHub https://github.com/taoliu/MACS/issues/74#issuecomment-95965849.

Graham D Johnson Graduate Research Assistant Center for Molecular Medicine and Genetics Wayne State University School of Medicine Detroit MI 48201 USA

Phone: 313-577-0765 FAX: 313-577-8554

taoliu commented 9 years ago

@lopatica2 As a general recommendation to all people processing their own data on Linux/Unix and wondering how to control bioinformatics tools by themselves, the 'virtualenv' tool will help you. Google 'virtualenv' (https://virtualenv.pypa.io/en/latest/), there is a simple python script you can use, to make a virtual python environment isolated from the one for whole system and fully controlled by yourself. By activating it, your current environmental variables will be changed so that any software, not only python software, will be installed into this isolated environment. No need to ask administrators anymore. So you will have full control of installing, removing, updating tools. You may be able to find some good tutorial on virtualenv on the web.

lopatica2 commented 9 years ago

Great to know! Thank you again.

On Fri, Apr 24, 2015 at 1:51 PM, Tao Liu (τν) notifications@github.com wrote:

@lopatica2 https://github.com/lopatica2 As a general recommendation to all people processing their own data on Linux/Unix and wondering how to control bioinformatics tools by themselves, the 'virtualenv' tool will help you. Google 'virtualenv' (https://virtualenv.pypa.io/en/latest/), there is a simple python script you can use, to make a virtual python environment isolated from the one for whole system and fully controlled by yourself. By activating it, your current environmental variables will be changed so that any software, not only python software, will be installed into this isolated environment. No need to ask administrators anymore. So you will have full control of installing, removing, updating tools. You may be able to find some good tutorial on virtualenv on the web.

— Reply to this email directly or view it on GitHub https://github.com/taoliu/MACS/issues/74#issuecomment-96011981.

Graham D Johnson Graduate Research Assistant Center for Molecular Medicine and Genetics Wayne State University School of Medicine Detroit MI 48201 USA

Phone: 313-577-0765 FAX: 313-577-8554

taoliu commented 9 years ago

@lopatica2 @skurscheid Please do let me know if the issue remains or not after you clean your computational environment and reinstall MACS2. Thanks!

skurscheid commented 9 years ago

@taoliu we use virtualenv on our cluster (centos 6.x I believe). This is really strange and I can only assume that it is an issue with one of the required libraries. Is it possible to do something like "session.info()" for Python to compare the versions of libraries in the environment? (I'm more of an R person...) however I will also do a fresh install in a different virtualenv plus on my Mac and re-run the analysis. Thank you for your support!

skurscheid commented 9 years ago

@taoliu I have installed macs2 on my Mac, and now I am getting results identical to yours (as you posted above). I will try to narrow down what causes the bug on our cluster. Thank you for your support!

taoliu commented 9 years ago

@skurscheid Do you have updates? If the issue was caused by the differences between platform/OS, the best practice might be to use pyx files on github then translate them to C using local Cython which better matches the local C compiler. The MACS2 package on PyPI site only contains C codes.

skurscheid commented 9 years ago

Sorry for the late reply.

It seems to be a platform-specific problem, as the error does not occur when running the same version of MACS2 on Mac OS X 10.10. I will try your suggestion. For now you can close the issue.

On 31 Jul 2015, at 04:11, Tao Liu (τν) notifications@github.com wrote:

@skurscheid https://github.com/skurscheid Do you have updates? If the issue was caused by the differences between platform/OS, the best practice might be to use pyx files on github then translate them to C using local Cython which better matches the local C compiler. The MACS2 package on PyPI site only contains C codes.

— Reply to this email directly or view it on GitHub https://github.com/taoliu/MACS/issues/74#issuecomment-126467715.

taoliu commented 9 years ago

@skurscheid Thanks for the feedback!

macs3-project / MACS

Negative p-values when using --broad peak flag #74

This file is generated by MACS version 2.1.0.20150420

Command line: callpeak -t treatment_chr10.bam -c control_chr10.bam -n test_TL -f BAMPE --broad -g 1.87e9

ARGUMENTS LIST:

name = test_TL

format = BAMPE

ChIP-seq file = ['treatment_chr10.bam']

control file = ['control_chr10.bam']

effective genome size = 1.87e+09

band width = 300

model fold = [5, 50]

qvalue cutoff = 5.00e-02

Larger dataset will be scaled towards smaller dataset.

Range for calculating regional lambda is: 1000 bps and 10000 bps

Broad region calling is on

fragment size is determined as 158 bps

total fragments in treatment: 6398698

fragments after filtering in treatment: 6029452

maximum duplicate fragments in treatment = 1

Redundant rate in treatment: 0.06

total fragments in control: 3049299

fragments after filtering in control: 2763758

maximum duplicate fragments in control = 1

Redundant rate in control: 0.09

d = 158