about fragment size - Githubissues

MinS1 commented 6 years ago

Hi Jeffrey,

I have MeDIP-seq data(single end, average reads length is 50), but I don't konw the sequencing platform and fragment length in sequencing experiment. I want to know is there any problem if I set the fragsize arguments in 120? And the defaults window size and minimum DMR size are 50 bp and 150 bp, respectivily, I want to know what conditions are used to set these values?

Thank you shishi

jeffbhasin commented 6 years ago

Hello Shishi, Thanks very much for your question. In the case where you don't know the fragment size is not ideal, but I think 120 bp would be a safe guess. I doubt it would be any less than this in a protocol and smaller size would be more of a conservative guess.

The default window size of 50bp and minimum DMR size of 150bp are defaults that came from our lab's experience. If you want to try and tune these values, I recommend trying different values than examining the filtered DMR results in a genome browser to see if the DMRs look like what you would consider as a good DMR or not.

Further justification for those defaults is that our fragment size was just under 150 bp, and we didn't want to consider any DMRs that were less than our fragment size. For the 50bp window size, we found this gave us good resolution of the edges of DMRs, and would validate when we did targeted bisulfite sequencing. However, this was for our MBD-seq protocol so we don't prescribe them in all cases, and some experimentation is a good idea for each protocol.

Jeff

MinS1 commented 6 years ago

Jeff, thank you for your detailed reply!

I have been trying different fragsize and winsize, the results show that different fragsize doesn't seem to make much difference. But the effect of different winsize on results is large. Finally, I choose fragsize 120 and winsize 100. Summary as follows:

maSummary(ma) stat count percent 1 Window Size 100 2 Total Windows 30956775 3 All Zero Windows (filtered) 4336729 14.01 4 All Below FDR Windows (filtered) 19417442 62.72 5 Signal Windows (move on to stage one) 7202604 23.27 6 Windows Tested in Stage One 7202604 7 Sig Pattern in Stage One 7202604 100 8 Non-Sig Pattern in Stage One 0 0 9 Ambig Pattern in Stage One 0 0 10 Regions Formed By Joining Adjacent Patterns 98307 11 Regions Tested in Stage Two 98307 12 Regions That Pass ANODEV 96602 98.27 13 ANODEV Sig with Sig Pattern 96602 100 14 ANODEV Sig with Non-sig Pattern 0 0 15 ANODEV Sig with Ambig Pattern 0 0 16 Total DMRs 96602 table(ma$dmr$pattern,ma$dmr$frequent) FALSE TRUE 01 7037 5617 10 81473 214 11 0 0

When I use MethylAction, I have some other problems with me. Questions as follows: 1, The ajust,var arguments in founction methylaction() seem just allowed to set one name of colum in "samp", so does that means just adjust one covariate? If I want to adjust multiple covariates(such as age, gender, PMI and so on), what should I set? 2,The ouput DMRs results are classified as "frequent" and "other", if the total number of DMRs is large(for example, close to 100,000), are only the "frequent" DMRs significant for the further analysis? 3, What's the meaning of the Bootstrap or permutation testing? From the output results of MethyAction, the significant DMRs are depended on the value of anodev.padj(<0.05) , so what the FDR value of Bootstrap or permutation testing do? as a filter threshold in further ananlysis? 4,When I run the maKaryogram() founction, it always have error "Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale." Do you have any solutions?

Thank you Shishi

jeffbhasin commented 6 years ago

Hi Shishi,

I think adjust.var does only take one covariate. In our use cases we only had single covariates, however, there is no reason why more can't be added. It is just a matter of editing the code to vectorize that option so it can take multiples and then building the formula correctly when the code runs the model. Adding multiple adjust.var settings would have to be something we consider as a new feature.
We recommend using both "frequent" and "other" DMRs together, but using fold changes plus a threshold for cases of very low counts (that could inflate fold changes). The "frequent" is perhaps something we should remove in the future, it was an idea we had to try and filter to better DMRs but in further testing just using fold changes appears to be a better approach. The "frequent" uses a hard threshold, but some regions of the genome have higher background enrichment than others, so it turns out to not be a very robust way of looking for stronger DMRs.
The bootstrap mode does resampling, so in some iterations the same sample could be drawn twice (i.e. sampling with replacement). The permutation mode just does a shuffle of the samples among the groups (i.e. sampling without replacement). Statistically speaking, the bootstrap is more ideal, however I don't recall that we saw much of a difference in the end results. The purpose of either the bootstrap or permutation testing is to establish empirical false discovery rates (FDRs). This is important for establishing statistical significance of the DMRs due to all of the multiple testing involved in a genome-wide analysis of MBD-seq or MeDIP-seq or similar methods.
With maKaryogram() is that an error or a warning? Does it still make the plot? It sounds like something internal to ggbio(), and could possibly be ignored if the karyogram is still being produced. From the sound if it, it's relating to the plotting stage.

Jeff

MinS1 commented 6 years ago

Hi Jeff,

The error in maKaryogram as follows:

maKaryogram(ma=ma, reads=reads) Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale. Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.

the karyogram plot hasn't produced.

Shishi

jeffbhasin commented 6 years ago

Are you outputting to a graphics device, i.e. saving to a PDF using pdf()?

jeffbhasin / methylaction

about fragment size #4