waterlandlab / CluBCpG

Cluster-based analysis of CpG methylation
https://clubcpg.readthedocs.io/
MIT License

CluBCpG for RRBS data? #15

Open AndriesDeKoker opened 3 years ago

AndriesDeKoker commented 3 years ago

Is your feature request related to a problem? Please describe.

Correct me if I am mistaken, but CluBCpG will probably not work optimally on RRBS data because of the clubcpg-coverage --bin_size parameter. For reference: _nreads = number of reads which fully cover all CpGs within the bin.
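To make the concern concrete, here is a minimal sketch of the _nreads definition quoted above. It is not CluBCpG's code: the function name, the half-open 0-based coordinates, and the toy CpG and read positions are all assumptions made for illustration. It shows how a fixed-size bin that spans two RRBS fragments can end up with zero fully covering reads, while a bin matched to one fragment does not.

```python
def count_fully_covering_reads(reads, cpg_positions, bin_start, bin_end):
    """Count reads that span every CpG inside the bin [bin_start, bin_end).

    Hypothetical helper: reads are (start, end) tuples in half-open,
    0-based coordinates; cpg_positions is a list of CpG coordinates.
    """
    cpgs_in_bin = [p for p in cpg_positions if bin_start <= p < bin_end]
    if not cpgs_in_bin:
        return 0
    first, last = min(cpgs_in_bin), max(cpgs_in_bin)
    # A read fully covers the bin's CpGs if it spans the first and the last one.
    return sum(1 for start, end in reads if start <= first and end > last)


# Toy data (invented): two RRBS fragments, each sequenced by two reads.
cpgs = [20, 25, 95, 98]
reads = [(18, 68), (18, 68), (90, 140), (90, 140)]

# Fixed 100 bp bin: it contains CpGs from both fragments, so no read covers all.
fixed_bin = count_fully_covering_reads(reads, cpgs, 0, 100)

# Bin matched to the first fragment: both of its reads cover all CpGs.
fragment_bin = count_fully_covering_reads(reads, cpgs, 18, 68)
```

With this toy data the fixed bin yields 0 reads and the fragment-matched bin yields 2. Whether real RRBS libraries behave this way under CluBCpG's actual binning would still need to be tested.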

Describe the solution you'd like

I am not sure, but one option would be to accept a bin file (per chromosome) listing the expected RRBS bins in place of the --bin_size parameter. The expected bins depend on the restriction enzyme (MspI in most cases) and on the Illumina read length that was ordered.
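As a rough sketch of how such expected RRBS bins could be derived, the code below performs an in-silico MspI digest (MspI recognizes CCGG and cuts after the first C) and keeps windows of one read length at each end of the size-selected fragments. Everything here is hypothetical: the function names, the 40 to 220 bp size selection, the 50 bp read length, and the (chrom, start, end) output are illustrative assumptions, and the output is not in any format that clubcpg is known to accept.

```python
import re

CUT_OFFSET = 1  # MspI cuts C^CGG, i.e. one base into the CCGG site


def mspi_cut_positions(sequence):
    """Return 0-based cut coordinates for every CCGG site in the sequence."""
    # The lookahead lets the regex report overlapping or adjacent sites.
    return [m.start() + CUT_OFFSET
            for m in re.finditer(r"(?=CCGG)", sequence.upper())]


def rrbs_bins(chrom, sequence, min_len=40, max_len=220, read_length=50):
    """Sketch: expected read-covered windows for size-selected MspI fragments.

    min_len, max_len and read_length are assumed defaults, not values
    taken from CluBCpG or from any particular RRBS protocol.
    """
    cuts = mspi_cut_positions(sequence)
    bins = []
    for left, right in zip(cuts, cuts[1:]):
        frag_len = right - left
        if not (min_len <= frag_len <= max_len):
            continue  # fragment would be lost during size selection
        if frag_len <= read_length:
            # A single read spans the whole fragment.
            bins.append((chrom, left, right))
        else:
            # Reads start at the fragment ends; the two windows may overlap
            # when the fragment is shorter than two read lengths.
            bins.append((chrom, left, left + read_length))
            bins.append((chrom, right - read_length, right))
    return bins


# Toy sequence (invented): two MspI sites 50 bp apart.
toy = "AAAA" + "CCGG" + "A" * 46 + "CCGG" + "A" * 10
toy_bins = rrbs_bins("chr1", toy)
```

For the toy sequence this gives a single window, ("chr1", 5, 55). A real implementation would read the reference genome per chromosome and would need to match whatever bin format the tool expects.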

Describe alternatives you've considered /

Additional context /

AndriesDeKoker commented 3 years ago

Or would this be possible using clubcpg-cluster?

usage: clubcpg-cluster [-h] [-a INPUT_BAM_A] [-b INPUT_BAM_B] [--bins BINS]
                       [-o OUTPUT_DIR] [--bin_size BIN_SIZE]
                       [-m CLUSTER_MEMBER_MINIMUM] [-r READ_DEPTH]
                       [-n NUM_PROCESSORS] [--read1_5 READ1_5]
                       [--read1_3 READ1_3] [--read2_5 READ2_5]
                       [--read2_3 READ2_3] [--no_overlap [NO_OVERLAP]]
                       [--remove_noise [REMOVE_NOISE]] [--suffix SUFFIX]
                       [--permute [PERMUTE]]

Could the desired RRBS bins be specified with --bins? Or would --bin_size still be a problem?

canthonyscott commented 2 years ago

To follow up on this: clubcpg has not been tested on RRBS data, and this is not something I will be able to do. This repo is open source, so if anybody wants to test it and add documentation, or adapt it to work with RRBS, I am happy to accept a pull request.