nloyfer / wgbs_tools

tools for working with Bisulfite Sequencing data while preserving reads intrinsic dependencies

Question about the calculation used to convert bam file methylation values into beta values #13

Closed AzlanNI closed 1 year ago

AzlanNI commented 1 year ago

Hello everyone,

I am currently thinking about using Bam2pat to convert some .bam files into .beta files.

I am using ONT .bam files and was wondering what the exact calculation looks like that converts the read-based information in the .bam files into beta values.

It would be great to understand the procedure before I start using it.

Thanks a lot!

kind regards, Azlan

yonniejon commented 1 year ago

Hi,

according to the docs directory of this package:

For each of the NR_SITES CpG sites in the genome, the beta file holds 2 values: #meth and #coverage. That is, the i'th row in the matrix corresponds to the i'th CpG site:

#meth: the number of times the i'th site (CpGi) is observed in a methylated state.
#coverage: the total number of times the i'th site (CpGi) is observed. #coverage==0 is equivalent to a missing value (NaN).

CpGi's beta value is obtained by dividing #meth by #coverage.
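
The division described above can be sketched in a few lines of Python. This is only an illustration with made-up counts, not code from the package: the function name `beta_values` and the example numbers are hypothetical, and NumPy is assumed to be installed.

```python
import numpy as np

def beta_values(meth, coverage):
    """Per-site beta = #meth / #coverage, with NaN where #coverage is 0.

    Hypothetical helper for illustration; not part of wgbs_tools.
    """
    meth = np.asarray(meth, dtype=float)
    coverage = np.asarray(coverage, dtype=float)
    # Start from NaN so that uncovered sites stay marked as missing.
    beta = np.full(meth.shape, np.nan)
    # Divide only where at least one read covers the site.
    np.divide(meth, coverage, out=beta, where=coverage > 0)
    return beta

# Made-up counts for three CpG sites: 3/4 methylated, uncovered, 5/10 methylated.
example = beta_values([3, 0, 5], [4, 0, 10])
print(example)  # values are 0.75, NaN, 0.5
```

Keeping uncovered sites as NaN rather than 0 avoids mistaking "no data" for "fully unmethylated".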

Does that answer your question?

AzlanNI commented 1 year ago

Hello @yonniejon,

Thanks a lot for your reply.

Alright, so the beta values in the .beta file are calculated as #methylated reads / total reads for each CpG.

I got it confused with the beta value calculation used in array-based methylation analysis.

Thank you again!

Kind regards,

Azlan

yonniejon commented 1 year ago

Just to be clear: the beta file stores counts, not ratios. For each row i it contains two values: the number of reads in which CpG i is methylated, and the number of reads covering CpG i.

The calculation of "#Methylated reads / total reads for each CpG" is something you can do yourself: open the beta file and, for each row, divide the first value by the second.
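
A rough sketch of that "open the file and divide" step in Python follows. It rests on an assumption that is not stated in this thread: that the .beta file is a flat binary file of unsigned 8-bit integers, stored as one (#meth, #coverage) pair per CpG site. Please check the docs directory of the package before relying on this layout. The function name `load_beta` and the demo file are hypothetical.

```python
import os
import tempfile

import numpy as np

def load_beta(path):
    """Read (#meth, #coverage) pairs and derive per-site beta values.

    ASSUMPTION: the file is raw uint8 data, two values per CpG site.
    Hypothetical helper for illustration; not part of wgbs_tools.
    """
    counts = np.fromfile(path, dtype=np.uint8).reshape((-1, 2))
    meth = counts[:, 0].astype(float)      # first value of each row
    coverage = counts[:, 1].astype(float)  # second value of each row
    beta = np.full(meth.shape, np.nan)     # NaN marks uncovered sites
    np.divide(meth, coverage, out=beta, where=coverage > 0)
    return counts, beta

# Demo on a tiny synthetic file written in the assumed layout (made-up counts).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.beta")
    np.array([[3, 4], [0, 0], [5, 10]], dtype=np.uint8).tofile(demo_path)
    demo_counts, demo_beta = load_beta(demo_path)

print(demo_counts)
print(demo_beta)
```

The result is one beta value per CpG site, which is the array-style vector of values the question was expecting.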

AzlanNI commented 1 year ago

Oh alright. I thought it was like the beta files from arrays, where you have a beta value for each CpG in matrix form.

But if the beta file contains the methylation count and the read count, then I can just calculate it myself.

Thanks a lot!