waterlandlab / CluBCpG

Cluster-based analysis of CpG methylation
https://clubcpg.readthedocs.io/
MIT License
11 stars, 6 forks

Comparative analysis for more than two libraries from two groupings? #10

Closed: ben-laufer closed this issue 4 years ago

ben-laufer commented 4 years ago

Hello, this feature request is related to an enhancement that could enable a different type of analysis using your program. I was wondering if you had a recommendation for comparing more than two libraries in a CluBCpG analysis? I'm curious to see if this tool can be used for a pairwise comparison of experimental vs control with many samples in each group. I'm thinking it may be possible to use this output as the input or replacement for a WGCNA analysis.

canthonyscott commented 4 years ago

Hi!

This was actually something we did consider.

Currently, in two-library mode, we require that a given bin have at least 10 reads in BOTH libraries. We found during testing that as you add more libraries, the chance that a bin has at least 10 reads in all libraries becomes very small (unless you are using samples with very high coverage). This results in CluBCpG outputs that do not have great coverage across the full genome.
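As a rough illustration of why coverage drops, here is a minimal sketch. It assumes, purely for illustration, that each library independently reaches 10 reads in a given bin with the same probability. Real libraries are not independent and coverage varies by region, and the value 0.8 below is a made-up example, not a measurement from CluBCpG.

```python
def fraction_of_bins_kept(p_single, n_libraries):
    """Expected fraction of bins that pass the read threshold in every library.

    Assumes (for illustration only) that each library passes independently
    with the same probability p_single.
    """
    return p_single ** n_libraries


# Hypothetical per-library pass rate of 0.8; not taken from real data.
for n in (2, 5, 10, 20):
    print(n, round(fraction_of_bins_kept(0.8, n), 3))
```

Under these assumptions, about 64% of bins survive with two libraries, but only about 1% survive with twenty.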

A workaround for your problem could be to run CluBCpG on each library individually and then merge the final results tables on the bin label column. This would give a view of the full genome for each library, without losing bins because one library did not have 10 reads.
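The merge step could be sketched roughly as below, using only the Python standard library. The column names `bin` and `clusters` and the library names are hypothetical placeholders; check the headers of your actual CluBCpG output files and adjust `key` accordingly. Bins missing from a library are kept and filled with `None` rather than dropped.

```python
import csv
import io


def merge_on_bin(tables, key="bin"):
    """Outer-join per-library tables on the bin label column.

    tables: dict mapping a library name to a list of row dicts
            (for example, rows read with csv.DictReader).
    Returns a list of row dicts, one per bin seen in any library.
    """
    merged = {}
    columns = [key]
    for lib, rows in tables.items():
        for row in rows:
            out = merged.setdefault(row[key], {key: row[key]})
            for col, value in row.items():
                if col == key:
                    continue
                name = f"{lib}_{col}"  # prefix columns with the library name
                if name not in columns:
                    columns.append(name)
                out[name] = value
    # Bins that a library did not report get None in that library's columns.
    return [{c: merged[b].get(c) for c in columns} for b in sorted(merged)]


# Hypothetical two-library example; real CluBCpG column names may differ.
control = list(csv.DictReader(io.StringIO("bin,clusters\nchr1_100,3\nchr1_200,2\n")))
treated = list(csv.DictReader(io.StringIO("bin,clusters\nchr1_200,4\nchr1_300,1\n")))
result = merge_on_bin({"control": control, "treated": treated})
for row in result:
    print(row)
```

If you already use pandas, an outer merge (`how="outer"`) on the bin column should give the same kind of table.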