nloyfer / wgbs_tools

tools for working with Bisulfite Sequencing data while preserving reads intrinsic dependencies

convert 450k data to beta or pat #28

Closed avilella closed 1 month ago

avilella commented 1 year ago

Hi, is there a way to convert 450k data to beta or pat format?

I am interested in comparing the data from the Nature atlas paper and the data from TCGA 450k array cancer samples for a specific region (MYC gene and upstream+downstream intergenic region).

Is there a way to convert the ".level3betas.txt" files from TCGA into a format that can be analysed side by side with the .hg38.beta files from the Nature atlas paper?

For example, here are the first few lines of a ".level3betas.txt" file from TCGA:

cg07549526      NA
cg16670573      NA
cg09969830      0.0165025989557348
cg00179196      0.959862895312011
cg03948744      0.0517703039642285
cg02729269      0.770865197097467
cg10009236      0.0196736740017586
cg10143220      0.963637601928311
cg05791870      0.981329901446056
cg01527023      0.0293196490201621
cg00928894      0.0563106576186536
cg02369618      0.0227020285011311
cg09580244      0.0798872678158596
cg02783232      0.0178798741323557
cg00389577      NA
cg08400316      NA
cg07893512      0.0536192270323858
cg05057452      0.0532595029760251
cg04141813      0.167112256692955
cg15597257      NA
cg00697413      NA

Thanks in advance.

yonniejon commented 1 month ago

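There is no way to convert 450k data to pat format: pat files store read-level methylation patterns, while array data provides only a single beta value per CpG. You could probably convert it to beta format, but the better option is to go the other way and use the beta_to_450k command, which converts your beta file to methylation levels at the CpGs covered by the 450k array. If this does not solve the issue, please update and re-open.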
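
Once both datasets are expressed per 450k probe, the side-by-side comparison could look roughly like the sketch below. It only assumes the two-column layout of the ".level3betas.txt" sample above (probe ID, then a beta value or NA). The `atlas` dictionary is a hypothetical stand-in with made-up values; the real output layout of beta_to_450k is not shown in this thread and would need its own parser. The helper names `read_level3_betas` and `shared_probes` are also invented for illustration.

```python
import math


def read_level3_betas(lines):
    """Parse 'probe_id <whitespace> beta' lines; 'NA' is stored as NaN."""
    betas = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        probe, value = parts
        betas[probe] = math.nan if value == "NA" else float(value)
    return betas


def shared_probes(array_betas, atlas_betas):
    """Yield (probe, array_beta, atlas_beta) for probes measured in both."""
    for probe in sorted(array_betas.keys() & atlas_betas.keys()):
        a, b = array_betas[probe], atlas_betas[probe]
        if not (math.isnan(a) or math.isnan(b)):
            yield probe, a, b


# A few lines copied from the TCGA sample in the question.
sample = """cg07549526      NA
cg09969830      0.0165025989557348
cg00179196      0.959862895312011
""".splitlines()

tcga = read_level3_betas(sample)

# Hypothetical per-probe atlas values, made up purely for illustration.
atlas = {"cg07549526": 0.5, "cg09969830": 0.02, "cg00179196": 0.95}

for probe, array_beta, atlas_beta in shared_probes(tcga, atlas):
    print(probe, array_beta, atlas_beta)
```

Probes reported as NA on the array are dropped from the comparison rather than treated as zero, so missing measurements do not look like unmethylated sites.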