Compare annotation percentages

IRECG commented 5 years ago

Hello again, I have performed MBD-seq experiments with two groups: cases and controls. After annotating the DMRs, I get the distribution shown in the plot below. I have compared the proportion of each annotation between groups, assuming that each proportion should be more or less equal between groups. I do find statistically significant differences between them in all the categories, which I can explain by the biological differences between the groups. But I also wonder whether it would be possible to compare against an "expected" distribution. I have been reading the manual and the paper, and I don't know if the function drawGenomePool does something similar or if there is any other way to do it. Thanks

jeffbhasin commented 5 years ago

Hi Irene, Yes, it is possible to add an expected/null distribution to an annotation bar plot. The procedure is to use drawGenomePool() to obtain a GenomicRanges object of a background set. Then run this null set through goldmine() using the same settings as for the DMR data. The proportions that come from this annotation form the "genomic background" group that can be added to the plot. This is what we've done in our own studies. It is also possible to do a statistical test, such as binom.test() or a Fisher's exact test.
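For readers who want to see what such a test computes, here is a minimal sketch of a two-sided exact binomial test in plain Python. It is not goldmine code: the function names and all counts are hypothetical, and the background proportion is assumed to come from annotating the null set. In R, binom.test() would be the direct route.

```python
import math

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials, computed in log space."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def binom_test_two_sided(k, n, p):
    """Two-sided exact p-value: total probability of outcomes no more likely than k."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    observed = binom_pmf(k, n, p)
    cutoff = observed * (1 + 1e-7)  # small tolerance for floating-point ties
    total = sum(pmf for pmf in (binom_pmf(i, n, p) for i in range(n + 1))
                if pmf <= cutoff)
    return min(1.0, total)

# Hypothetical numbers, for illustration only.
dmrs_in_promoters = 120        # DMRs annotated to one category
total_dmrs = 1000              # all DMRs in the query set
background_proportion = 0.08   # same category's share in the annotated null set

p_value = binom_test_two_sided(dmrs_in_promoters, total_dmrs, background_proportion)
print(f"observed proportion: {dmrs_in_promoters / total_dmrs:.3f}")
print(f"two-sided p-value vs. background: {p_value:.3g}")
```

The test treats the background proportion as a fixed, known value. If the null set is small enough that its own sampling error matters, a Fisher's exact test on the 2x2 table of counts may be the better fit.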

Jeff

IRECG commented 5 years ago

Hi Jeff, I've tried running drawGenomePool() and I have a question about the query I have to use. Should it be the one with all of my DMRs (including those hypermethylated in controls and in cases)? Or should I use two, each with the length of its own set? I mean, if I do it with my "total" query I am not really comparing similar lengths, because the number of DMRs I have in the two patterns is quite different. Irene

jeffbhasin commented 5 years ago

It would be possible to have two nulls, one for the hypo-DMRs and one for the hyper-DMRs. This could get confusing when plotted on a bar graph because it would need 4 bars. One way to plot it then is to show the enrichment of each query set (hyper- or hypo-DMRs) over its own respective null as a fold change or odds ratio.
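As a rough illustration of the fold change and odds ratio mentioned above, the sketch below computes both from annotation counts. The function name, the category, and every count are made up for the example; real counts would come from the annotation of each query set and its null.

```python
def enrichment(query_hits, query_total, null_hits, null_total):
    """Return (fold_change, odds_ratio) of a query set over its null set."""
    query_prop = query_hits / query_total
    null_prop = null_hits / null_total
    fold_change = query_prop / null_prop
    # Odds ratio from the 2x2 table: (in category / not in category) for query vs. null.
    odds_ratio = (query_hits * (null_total - null_hits)) / (
        (query_total - query_hits) * null_hits)
    return fold_change, odds_ratio

# Hypothetical counts for one annotation category, for illustration only:
# (hits in category, total regions) for each query set and its own null.
sets = {
    "hyper": {"query": (150, 600), "null": (900, 6000)},
    "hypo": {"query": (40, 400), "null": (600, 4000)},
}

for name, counts in sets.items():
    fc, odds = enrichment(*counts["query"], *counts["null"])
    print(f"{name}: fold change = {fc:.2f}, odds ratio = {odds:.2f}")
```

Plotting one value per query set this way needs only two bars per category instead of four. Both measures are undefined when the null has zero hits in a category, so such categories would need separate handling.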

However, I have generally just combined my hyper- and hypo-DMRs and used the combined set as the query to generate a single genomic null set. The reason is that I tried keeping them separate and didn't see a difference. The genomic background was sampled either way, regardless of changes in length distribution. I also found my hyper- and hypo-DMRs generally had the same length distribution, even if the total number of DMRs was different. Thus, it may be valid and simpler to just treat all DMRs together to generate a genomic null that can be compared to both hyper- and hypo-DMRs.
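One way to check whether two DMR sets really share a length distribution, whatever their counts, is to compare their empirical cumulative distributions. The sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two cumulative curves) in plain Python. The widths are invented for illustration, and this check is an assumption about how one might verify the claim, not something the thread or goldmine prescribes.

```python
import bisect

def ks_statistic(widths_a, widths_b):
    """Largest absolute difference between the two empirical CDFs (0 = identical)."""
    a_sorted = sorted(widths_a)
    b_sorted = sorted(widths_b)
    largest_gap = 0.0
    # The gap between step functions can only change at observed values.
    for x in set(a_sorted) | set(b_sorted):
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap

# Hypothetical DMR widths in base pairs; the sets differ in size on purpose.
hyper_widths = [200, 250, 300, 300, 400, 450, 500, 650, 800, 1200]
hypo_widths = [220, 300, 420, 500, 900]

print(f"KS statistic: {ks_statistic(hyper_widths, hypo_widths):.2f}")
```

A value near 0 suggests the two sets have similar width distributions, which would support pooling them into one query; a value near 1 suggests they differ and separate nulls may be worth the extra bars.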

IRECG commented 5 years ago

Thanks for your answer. So when you talk about length, you mean the width of the DMRs, not the number of them? I'm not sure I've understood it properly. I would also like to know if I can present these proportions as the expected ones...

jeffbhasin commented 5 years ago

I am talking about the distribution of the lengths of the DMRs rather than the count of them. The null set does not need to have the same number of regions as the DMR set, but it should have the same length distribution. Ideally, it would have more regions, so that the background space is well sampled. However, we want it to match the length distribution of the query, so that the background shows what random regions with this same length distribution look like across the genome, which is what we compare the DMRs to.
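To make the idea of a length-matched null concrete, here is a toy sketch in plain Python. It is not how drawGenomePool() is implemented, and the function name, chromosome sizes, and widths are all hypothetical. It draws more regions than the query contains, with each width resampled from the query's own widths and each position chosen at random.

```python
import random

def draw_length_matched_pool(query_widths, chrom_sizes, n, seed=0):
    """Draw n random (chrom, start, end) regions whose widths are resampled from the query."""
    if min(query_widths) > max(chrom_sizes.values()):
        raise ValueError("no query width fits on any chromosome")
    rng = random.Random(seed)
    chroms = list(chrom_sizes)
    weights = [chrom_sizes[c] for c in chroms]  # longer chromosomes are picked more often
    pool = []
    while len(pool) < n:
        width = rng.choice(query_widths)  # keeps the query's length distribution
        chrom = rng.choices(chroms, weights=weights, k=1)[0]
        if width > chrom_sizes[chrom]:
            continue  # region does not fit on this chromosome; redraw
        start = rng.randint(0, chrom_sizes[chrom] - width)
        pool.append((chrom, start, start + width))
    return pool

# Toy inputs, for illustration only (not real genome sizes or real DMRs).
query_widths = [200, 300, 300, 450, 800]
chrom_sizes = {"chrA": 100_000, "chrB": 50_000}

# The pool is deliberately larger than the query so the background is well sampled.
pool = draw_length_matched_pool(query_widths, chrom_sizes, n=1000)
print(len(pool), pool[:3])
```

The number of regions in the pool is a free choice, while the set of widths is tied to the query. A real background would also need to account for things this sketch ignores, such as assembly gaps and unmappable regions.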

jeffbhasin / goldmine

Compare annotation percentages #11