waterlandlab / PReLIM

Python package for imputing missing CpG methylation data at the read level
MIT License

Share code to generate methylation matrices for PReLIM #2

Status: Open. Opened by pelutz 2 years ago

pelutz commented 2 years ago

Hi,

Together with @mohamadysn (who recently got in touch about the small bin-size bug), we're currently using CluBCpG to compare EM-seq data for two groups of six replicates each (using the single-library mode and no filtering on bin coverage, similar to what @canthonyscott and @ben-laufer discussed in an old issue).

On another note, we'd be very interested in testing PReLIM. If I'm not mistaken, the released version of PReLIM does not yet generate the methylation matrices for each bin (where CpGs are columns and individual reads are rows)? Would it be possible for you to share the scripts or the strategy that you used? NB: We're using Bismark for alignment.
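To make the matrix layout concrete, here is a minimal sketch of the kind of per-bin matrix meant above (reads as rows, CpGs as columns). It is not the PReLIM or CluBCpG implementation. The function names `iter_calls` and `build_bin_matrices`, the fixed 100 bp bin convention, and the encoding (1 = methylated, 0 = unmethylated, -1 = missing) are assumptions made for illustration. The input is assumed to be the tab-separated CpG-context output of Bismark's methylation extractor (read ID, strand state, chromosome, position, call).

```python
from collections import defaultdict

import numpy as np

MISSING = -1  # assumed placeholder for CpGs a read does not cover


def iter_calls(lines):
    """Yield (read_id, chrom, pos, methylated) from extractor-style lines."""
    for line in lines:
        # Skip the version header line and any blank lines.
        if not line.strip() or line.startswith("Bismark"):
            continue
        read_id, _state, chrom, pos, call = line.rstrip("\n").split("\t")[:5]
        # In CpG context, 'Z' is a methylated call and 'z' an unmethylated one.
        yield read_id, chrom, int(pos), call == "Z"


def build_bin_matrices(calls, bin_size=100):
    """Group calls into fixed-size bins and build one matrix per bin.

    Returns {(chrom, bin_start): (read_ids, positions, matrix)}.
    """
    per_bin = defaultdict(dict)
    for read_id, chrom, pos, methylated in calls:
        bin_start = (pos // bin_size) * bin_size  # hypothetical bin convention
        per_bin[(chrom, bin_start)][(read_id, pos)] = int(methylated)

    matrices = {}
    for key, observed in per_bin.items():
        read_ids = sorted({r for r, _ in observed})
        positions = sorted({p for _, p in observed})
        row = {r: i for i, r in enumerate(read_ids)}
        col = {p: j for j, p in enumerate(positions)}
        # Start with every cell missing, then fill in the observed calls.
        matrix = np.full((len(read_ids), len(positions)), MISSING, dtype=int)
        for (r, p), state in observed.items():
            matrix[row[r], col[p]] = state
        matrices[key] = (read_ids, positions, matrix)
    return matrices
```

This sketch does not merge calls from the two strands of the same CpG, filter by coverage, or match CluBCpG's bin naming, so any of those would need to be handled before comparing results with the published tools.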

Thanks in advance, PE

canthonyscott commented 2 years ago

Tagging @robwaterland and @cjgunase to make sure they see this

canthonyscott commented 2 years ago

@pelutz I am not sure if this is exactly what you are asking for, but it might be worth checking out the clubcpg Imputation class: https://clubcpg.readthedocs.io/en/v0.2.5/API.html#clubcpg.Imputation.Imputation

It contains methods that interact with PReLIM and may already do something close to what you describe.

Edit: Links to the source code added below:

https://clubcpg.readthedocs.io/en/v0.2.5/_modules/clubcpg/Imputation.html#Imputation

https://github.com/waterlandlab/CluBCpG/blob/master/clubcpg/Imputation.py

pelutz commented 2 years ago

Absolutely. I had not seen this, sorry. We'll give it a try and will keep you posted, thanks!