shorvath / MammalianMethylationConsortium

DNA methylation studies of mammalian species
MIT License

Technical doubts about the use of Minfi and SeSAMe #3

Open EnriqueRT opened 9 months ago

EnriqueRT commented 9 months ago

Hello, first of all I would like to congratulate you for the wonderful research you are doing with epigenetic clocks!

I am starting my research in the field of aging and I am going to work with human methylation data obtained with Illumina Infinium MethylationEPIC BeadChip. I have noticed that Minfi and SeSAMe are some of the most popular bioinformatics tools for analysing this kind of data, which are usually in IDAT format. That is why, before starting to process the data, I wanted to do a small test to reproduce some of the analyses you carried out in your publication "Universal DNA methylation age across mammalian tissues". However, I had some technical doubts about the pre-processing and normalisation of the data:

a) First of all, I wanted to ask which criteria led you to choose the SeSAMe package over Minfi for normalising the data. As far as I can see, both packages implement the same normal-exponential convolution on out-of-band probes (NOOB) method for background subtraction, as well as dye-bias correction. However, when I use both tools on the test sample from your article (GSM6979529_202897220093_R01C01), the resulting beta values differ slightly, especially in the 0.3-0.5 range, where the Minfi values are lower. One possible reason for these differences is that Minfi applies stricter criteria to avoid false positives, so that a CpG site is reported as methylated only when it really is, whereas SeSAMe seems to be more lax in this regard. These are all assumptions on my part, and I guess you have worked on this more comprehensively to be able to discriminate between these tools.

b) In addition to this, I have tried to get the same normalised beta values as you have in your matrix "GSE223748_datBetaNormalized.csv.gz" hosted in GEO. To do this I followed the normalisation method of the SeSAMe package, as detailed in your publication, using either the "openSesame" command or going step by step using the noob and dyeBiasNL functions. However, the beta values still differ, despite having used the same versions of the packages as you have on GitHub. In fact, there are some CpG positions that in the GEO matrix indicate that the site is almost completely unmethylated and the results I get with SeSAMe indicate that they are almost 100% methylated:

Betas_GEO, sample GSM6979529: cg12767263 = 0.0111; cg24012221 = 0.0089

Betas_Sesame, sample GSM6979529: cg12767263 = 0.9854; cg24012221 = 0.9826
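One way to locate discordant probes like these systematically is to align the two beta vectors by probe ID and flag large absolute differences. The sketch below is a minimal illustration in base R, not the consortium's code: the function name `flag_discordant` and the 0.5 cutoff are assumptions, and only the two probe values quoted above come from this thread (the third probe is a made-up placeholder).

```r
# Hypothetical helper (not part of SeSAMe or Minfi): align two named beta
# vectors by probe ID and flag probes whose values differ by more than `cutoff`.
flag_discordant <- function(betas_a, betas_b, cutoff = 0.5) {
  shared <- intersect(names(betas_a), names(betas_b))
  diff <- betas_b[shared] - betas_a[shared]
  data.frame(
    CGid = shared,
    beta_a = unname(betas_a[shared]),
    beta_b = unname(betas_b[shared]),
    abs_diff = unname(abs(diff)),
    discordant = unname(abs(diff) > cutoff),
    stringsAsFactors = FALSE
  )
}

# Values for the two probes quoted above; "placeholder_probe" is invented
# purely to show a concordant row.
betas_geo <- c(cg12767263 = 0.0111, cg24012221 = 0.0089, placeholder_probe = 0.50)
betas_sesame <- c(cg12767263 = 0.9854, cg24012221 = 0.9826, placeholder_probe = 0.51)

comparison <- flag_discordant(betas_geo, betas_sesame)
print(comparison[comparison$discordant, ])
```

In practice the two vectors would come from the GEO matrix and from one's own SeSAMe output for the same sample.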

Finally, I would be grateful if you could clear up these doubts and perhaps clarify the steps you took to obtain the normalised beta values in GEO. I am a little concerned about trying to reproduce these results with different tools and getting such different values.

Thank you very much in advance for any help that could be provided!

Regards, Enrique.

ahaghani commented 8 months ago

Hi Enrique,

Here is a brief answer to your questions.

1- We used Sesame because of its out-of-band p-value calculation (pOOBAH), which helps to identify the probes that map to different mammalian species.

2- It does not matter which version of the Sesame package you use to reproduce our results, as long as you load the manifest manually. Here is a simple guide using the latest Sesame package (v 1.18.4):

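library(sesame)
library(tidyverse)  # provides %>%, select(), as_vector() and enframe()

# Load the manifest file HorvathMammal40.CanonicalManifest.3.2019.sesame.csv
manifest_sesame <- read.csv("HorvathMammal40.CanonicalManifest.3.2019.sesame.csv")

# Read sample_sheet first; Basename is the path to the IDAT files
ssets <- sample_sheet %>% select(Basename) %>% as_vector %>% unname

# readIDATpair reads one IDAT pair per call, so apply it to each path
ssets <- lapply(ssets, readIDATpair, platform = 'custom', manifest = manifest_sesame)

# Normalised beta values for the first sample, with masking turned off
betas <- ssets[[1]] %>% pOOBAH %>% noob %>% dyeBiasCorrTypeINorm %>% getBetas(mask = FALSE) %>% enframe(name = "CGid", value = "sesame_bval")

# pOOBAH detection p-values for the same sample
sesame_p_vals <- ssets[[1]] %>% pOOBAH(return.pval = TRUE) %>% enframe(name = "CGid", value = "sesame_pval")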
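The detection p-values can then be used to decide which probes to trust in a given sample. The following is a minimal sketch under stated assumptions, not the consortium's filtering procedure: the helper name `mask_by_pval`, the toy probe IDs and values, and the 0.05 threshold are illustrative choices. The column names follow the `enframe()` calls in the code above.

```r
# Hypothetical helper: join betas and detection p-values by probe ID and set
# betas to NA where the p-value exceeds `threshold`.
mask_by_pval <- function(betas, pvals, threshold = 0.05) {
  merged <- merge(betas, pvals, by = "CGid")
  merged$detected <- merged$sesame_pval <= threshold
  merged$sesame_bval[!merged$detected] <- NA
  merged
}

# Toy data with made-up probe IDs and values
betas <- data.frame(CGid = c("probe_1", "probe_2", "probe_3"),
                    sesame_bval = c(0.12, 0.87, 0.45))
pvals <- data.frame(CGid = c("probe_1", "probe_2", "probe_3"),
                    sesame_pval = c(0.001, 0.20, 0.03))

masked <- mask_by_pval(betas, pvals)
print(masked)
```

Keeping the p-values alongside the masked betas makes it easy to revisit the threshold later without rerunning the preprocessing.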

EnriqueRT commented 8 months ago

Thank you very much for your answer and sharing part of the code Amin!

Thanks to your answer I have been able to reproduce the results without any problem. However, the official SeSAMe documentation (https://www.bioconductor.org/packages/devel/bioc/vignettes/sesame/inst/doc/sesame.html) specifies that the workflow for normalising this type of data should apply the dye-bias correction first, followed by pOOBAH and the noob correction. In the code you provided, the dye-bias correction is applied at the end. Is there a particular reason for this?

Thanks in advance.

Best, Enrique.

ahaghani commented 8 months ago

Hi Enrique,

I am glad to hear that. Regarding the Sesame documentation: even if we use the wrapper function from Sesame, it will generate data identical to my code. I am not using the wrapper so that I can turn off masking.

I think Wanding Zhou, the author of Sesame, will be a better person to answer why the order is different in the documentation. It will probably not affect the output, but I am not sure.

Best, Amin


EnriqueRT commented 8 months ago

Hello Amin,

I have done some testing, and it seems that the beta values differ slightly when the dye-bias correction is applied before pOOBAH and noob, although the difference is minimal, at most on the order of 0.05 units. I will take your answer into account and ask the author of Sesame why the order differs. Even so, it is still not very clear to me from your answer why you performed the dye-bias correction at the end. Did you use a particular criterion?

Thank you.

Best, Enrique.
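A comparison like the one described above can be organised by running the same preprocessing steps in two different orders and summarising the per-probe differences. The sketch below is an illustration under stated assumptions rather than either author's code: `run_in_order`, `compare_orders` and `sesame_steps` are hypothetical helpers, and the SeSAMe functions are only referenced inside `sesame_steps()`, which assumes the package is installed and that `sdf` is a signal object as returned by `readIDATpair`.

```r
# Hypothetical helper: apply named step functions to `x` in the given order.
run_in_order <- function(x, steps, order) {
  for (step_name in order) {
    x <- steps[[step_name]](x)
  }
  x
}

# Hypothetical helper: run two orders, extract values with `extract`, and
# summarise the absolute per-element differences.
compare_orders <- function(x, steps, order_a, order_b, extract = identity) {
  a <- extract(run_in_order(x, steps, order_a))
  b <- extract(run_in_order(x, steps, order_b))
  abs_diff <- abs(a - b)
  c(max = max(abs_diff, na.rm = TRUE), mean = mean(abs_diff, na.rm = TRUE))
}

# SeSAMe steps; only evaluated when called, so sesame must be installed then.
sesame_steps <- function() {
  list(dye = sesame::dyeBiasNL, poobah = sesame::pOOBAH, noob = sesame::noob)
}

# Intended use on real data (not run here):
# compare_orders(sdf, sesame_steps(),
#                order_a = c("dye", "poobah", "noob"),
#                order_b = c("poobah", "noob", "dye"),
#                extract = function(s) sesame::getBetas(s, mask = FALSE))

# Toy demonstration with two steps that do not commute
toy_steps <- list(add_one = function(v) v + 1, double = function(v) v * 2)
toy_summary <- compare_orders(c(1, 2, 3), toy_steps,
                              order_a = c("add_one", "double"),
                              order_b = c("double", "add_one"))
print(toy_summary)
```

Reporting both the maximum and the mean difference helps distinguish a few outlying probes from a small shift across the whole array.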

ahaghani commented 8 months ago

Hi Enrique,

Thanks for the information. As I mentioned, we cared about the out-of-band p-value to refine the CpGs that map to each species. That is why we used Sesame.

Best, Amin
