macs3-project / MACS

MACS -- Model-based Analysis of ChIP-Seq
https://macs3-project.github.io/MACS/
BSD 3-Clause "New" or "Revised" License

Question about calling differential binding events #156

Open toxin08x opened 7 years ago

toxin08x commented 7 years ago

Hi there,

MACS2 is great software. I used macs2 bdgdiff to call differential binding events of RNA polymerase II, following this link: https://github.com/taoliu/MACS/wiki/Call-differential-binding-events

I have two questions. I got three output files: diff_c1_vs_c2_c3.0_common.bed, diff_c1_vs_c2_c3.0_cond1.bed and diff_c1_vs_c2_c3.0_cond2.bed. All the values in the last column of diff_c1_vs_c2_c3.0_common.bed are greater than 0. As the link mentions, the values are log10 likelihood ratios, so some of them should be negative unless every binding region in this file is enriched in condition 1 compared with condition 2. In my data, most binding regions are enriched in condition 2, and most values are less than 1. So I suspect these values are not log10 likelihood ratios. Are they really in log10?

Once I have these values, I would like to do pathway analysis. I will annotate the regions with HOMER to obtain gene names. If the values are not in log10, is it okay to use log10(1/cond1_value), log10(1/common_value) and log10(cond2_value) with their gene names for the pathway analysis?

Thanks,

Yunpeng

toxin08x commented 7 years ago

The command I used for calling differential binding events:

macs2 callpeak -B -t D567_RNApolII_Control.mapped.rgid.sorted.filtered.dups_remove.bam -c D567_Input_Control.mapped.rgid.sorted.filtered.dups_remove.bam -n RNApolII_CT --nomodel --gsize 2.7e9 --tsize 67 --extsize 182 --keep-dup all --qvalue 0.05

macs2 callpeak -B -t D567_RNApolII_RT.mapped.rgid.sorted.filtered.dups_remove.bam -c D567_Input_RT.mapped.rgid.sorted.filtered.dups_remove.bam -n RNApolII_RT --nomodel --gsize 2.7e9 --tsize 67 --extsize 182 --keep-dup all --qvalue 0.05

macs2 bdgdiff --t1 RNApolII_CT_treat_pileup.bdg --c1 RNApolII_CT_control_lambda.bdg --t2 RNApolII_RT_treat_pileup.bdg --c2 RNApolII_RT_control_lambda.bdg --d1 15849561 --d2 16226600 -g 60 -l 182 --o-prefix diff_c1_vs_c2

taoliu commented 7 years ago

Hi @toxin08x, the values reported by bdgdiff are log10 likelihood ratios indicating that one condition has higher signal than the other. Higher values in cond1.bed indicate higher signal in cond1 compared with cond2, and higher values in cond2.bed indicate higher signal in cond2 compared with cond1. They are all in log10 form. In your case, values in the cond1 and cond2 files should be larger than 3 (a likelihood ratio of 1000), and values in the common file should be smaller than 3. But the log10 likelihood ratios we report in common.bed are always non-negative, because they are the maximum of log10LR(cond1 vs cond2) and log10LR(cond2 vs cond1).
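For the pathway-analysis step, the explanation above can be turned into a small sketch. It reads bdgdiff BED rows and attaches a direction to each score. The function names and the sign convention (positive for cond1, negative for cond2, zero for common because the direction is not recorded there) are assumptions made for illustration, not something MACS provides, and the example row is made up.

```python
import io


def read_bdgdiff_bed(handle):
    """Yield (chrom, start, end, name, log10_lr) from bdgdiff BED text.

    Track header lines and blank lines are skipped.
    """
    for line in handle:
        line = line.strip()
        if not line or line.startswith("track"):
            continue
        chrom, start, end, name, score = line.split()[:5]
        yield chrom, int(start), int(end), name, float(score)


def signed_log10_lr(condition, log10_lr):
    """Attach a direction to a bdgdiff score.

    Hypothetical convention (an assumption, not part of MACS):
    cond1 scores stay positive, cond2 scores become negative, and
    common regions get 0.0 because common.bed stores only the larger
    of the two log10 likelihood ratios, without a direction.
    """
    if condition == "cond1":
        return log10_lr
    if condition == "cond2":
        return -log10_lr
    if condition == "common":
        return 0.0
    raise ValueError("unknown condition: %r" % condition)


# Made-up row, only to show the shape of the data.
example = io.StringIO(
    'track name="condition 2 (peaks)"\n'
    "chr1\t100\t400\tdiff_c1_vs_c2_cond2_1\t5.2\n"
)
rows = [
    (name, signed_log10_lr("cond2", score))
    for _, _, _, name, score in read_bdgdiff_bed(example)
]
print(rows)
```

Because the scores are already log10 likelihood ratios, no further log transform is applied here; only the sign is set from the file the region came from.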

catch-minie commented 7 years ago

I have a similar question. The values in the last column of all three BED files are negative, something like this:

chr2 32753628 32753756 diff_S1_vs_S2_cond2_29 -27338322477056.00000
chr2 97264021 97264317 diff_S1_vs_S2_cond2_33 -11821977174016.00000
chr2 75189710 75189870 diff_S1_vs_S2_cond2_36 -71413604024320.00000
chr2 89064223 89064451 diff_S1_vs_S2_cond2_37 -49988969168896.00000
chr2 140454180 140454312 diff_S1_vs_S2_cond2_38 -28919239540736.00000

Is it usual to have such large negative values? What does this indicate? I used the commands described here: https://github.com/taoliu/MACS/wiki/Call-differential-binding-events/
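A quick way to spot this symptom in a bdgdiff output file is to scan the score column for values that cannot be valid log10 likelihood ratios. The sketch below flags negative, non-finite, or very large scores. The helper names and the upper limit of 1e6 are arbitrary assumptions chosen for illustration; the two sample rows are copied from the output above.

```python
import math


def is_suspicious(score, upper_limit=1e6):
    """Return True for scores that look like numeric garbage.

    upper_limit is an arbitrary illustrative threshold, not a MACS setting.
    """
    return (not math.isfinite(score)) or score < 0 or score > upper_limit


def suspicious_rows(lines, upper_limit=1e6):
    """Collect (name, score) pairs whose score fails the sanity check."""
    flagged = []
    for line in lines:
        fields = line.split()
        # Skip track headers and anything that is not a 5-column BED row.
        if len(fields) < 5 or fields[0] == "track":
            continue
        score = float(fields[4])
        if is_suspicious(score, upper_limit):
            flagged.append((fields[3], score))
    return flagged


rows = [
    "chr2 32753628 32753756 diff_S1_vs_S2_cond2_29 -27338322477056.00000",
    "chr2 97264021 97264317 diff_S1_vs_S2_cond2_33 -11821977174016.00000",
]
flagged = suspicious_rows(rows)
print(flagged)
```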

taoliu commented 7 years ago

@kminie This must be an overflow issue... Could you confirm that MACS2 was compiled as 64-bit software on your computer (and likewise Python and NumPy)?
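One way to check the Python and NumPy side of this is sketched below. It reports the pointer width of the running interpreter and, if NumPy is importable, the width of NumPy's native index type and where NumPy was loaded from. The function names are made up for illustration, and the script needs to be run with the same interpreter that macs2 uses for the result to be meaningful.

```python
import platform
import struct
import sys


def pointer_bits():
    """Width of a C pointer in the running interpreter (32 or 64)."""
    return struct.calcsize("P") * 8


def report():
    """Gather basic facts about the interpreter and NumPy, if present."""
    info = {
        "python_version": platform.python_version(),
        "python_bits": pointer_bits(),
        "executable": sys.executable,
    }
    try:
        import numpy
    except ImportError:
        # NumPy is not installed for this interpreter.
        info["numpy_version"] = None
    else:
        info["numpy_version"] = numpy.__version__
        info["numpy_intp_bits"] = numpy.dtype(numpy.intp).itemsize * 8
        info["numpy_path"] = numpy.__file__
    return info


for key, value in sorted(report().items()):
    print("%s: %s" % (key, value))
```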

catch-minie commented 7 years ago

Yes, it was compiled that way.

marins51 commented 7 years ago

I have the same issue, and my installation is 64-bit. Any further comments, please?

Best, Mozart

Faitero commented 6 years ago

Dear TaoLiu,

I'm experiencing the same issue in one of my samples. Peak calling works well for all my ChIPs and identifies differential peaks between the conditions, but for one of them I'm getting negative values:

track name="condition 1 (peaks)" description="unique regions in condition 1" visibility=1

1   877804  878060  diff_Ca_Ser2_36h_vs_si298_Ser2_36h_cond1_1  -13469341466566048342893133824.00000
1   901320  901648  diff_Ca_Ser2_36h_vs_si298_Ser2_36h_cond1_2  -13244853150478253300958363648.00000
1   901763  902258  diff_Ca_Ser2_36h_vs_si298_Ser2_36h_cond1_3  -6650135060577364293770543104.00000
1   929861  930527  diff_Ca_Ser2_36h_vs_si298_Ser2_36h_cond1_4  -5606286644921167871976931328.00000
1   930647  934100  diff_Ca_Ser2_36h_vs_si298_Ser2_36h_cond1_5  -923790954692438483143753728.00000
1   1015532 1015885 diff_Ca_Ser2_36h_vs_si298_Ser2_36h_cond1_6  -15209400108650371536662822912.00000
1   1235763 1236064 diff_Ca_Ser2_36h_vs_si298_Ser2_36h_cond1_7  -15973691512070409266273452032.00000
1   1248735 1248858 diff_Ca_Ser2_36h_vs_si298_Ser2_36h_cond1_8  -66224262210616404352557907968.00000
1   1250686 1251017 diff_Ca_Ser2_36h_vs_si298_Ser2_36h_cond1_9  -9933639567710784796365946880.00000

I'm using macs2 2.1.1.20160309 (as reported by macs2 --version).

Any help will be really appreciated, Igor

yinyeya commented 5 years ago

As taoliu said above, this is indeed an overflow issue, most probably caused by incorrectly compiled dependencies. I used macs2.1.0 and it worked perfectly a year ago, but recently I ran into the same issue. I spent several days troubleshooting until I noticed that I had mismatched NumPy and Python from a miniconda env installed several months earlier (macs2 had been installed before that by root, outside any miniconda env). I fixed the issue by creating a new conda env and installing macs2 there.
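To see whether an installation is mixing environments like this, it can help to find out which interpreter the macs2 launcher actually points to. The sketch below looks up macs2 on the PATH and reads the shebang line of the script. The helper names are hypothetical, and it assumes macs2 is installed as a script with a shebang line, which may not hold on every system. The temporary file is only a stand-in used to show the parsing.

```python
import os
import shutil
import tempfile


def shebang_interpreter(script_path):
    """Return the interpreter named on the script's first line, or None."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", "replace").strip()
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].strip()


def describe_macs2(command="macs2"):
    """Locate the launcher on PATH and report its interpreter, if any."""
    path = shutil.which(command)
    if path is None:
        return None
    return {"script": path, "interpreter": shebang_interpreter(path)}


# Stand-in script, so the parsing can be shown without macs2 installed.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
    tmp.write("#!/usr/bin/env python\nprint('hi')\n")
demo_interpreter = shebang_interpreter(tmp.name)
os.remove(tmp.name)

print(demo_interpreter)
print(describe_macs2())
```

If the interpreter reported for macs2 differs from the one where NumPy is installed, that is a hint of the kind of mismatch described above.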