samschaf / team_Methylhomies

Feedback on initial group info #1

Open santina opened 7 years ago

santina commented 7 years ago

Initial group info you provided:

| Name | Department/Program | Expertise/Interests | GitHub ID |
| --- | --- | --- | --- |
| Samantha Schaffner | Medical Genetics | Neuroepigenetics of human development and Parkinson's Disease | @samschaf |
| Cassia Warren | Interdisciplinary Oncology | Biomarker development for treatments of Pancreatic cancer | @cwarren5124 |
| Hilary Brewis | Medical Genetics | Histone variants and chromatin structure | @hbrewis |
| Randip Gill | Educational Psychology | Social epigenetics | @rg7486 |
| Lisa Wei | Bioinformatics | Cancer genomics, sequence analysis | @suminwei2772 |

Team name: Methylhomies

One paragraph on the basic idea of the project:

DNA methylation (DNAm), or covalent attachment of a methyl group to the 5 position of cytosine bases located in CpG dinucleotides, plays a crucial role in maintaining patterns of gene expression during human development and aging. Along with other epigenetic modifications, DNAm is sensitive to environmental influences and can change over an individual’s lifespan. Thus, understanding the human methylome is important for determining both biomarkers for - and direct pathways implicating - health and disease [1]. Aberrant DNAm patterns have been correlated with common neurodegenerative disorders, including Alzheimer’s Disease and Parkinson’s Disease [2], as well as with mental disorders such as schizophrenia [3,4]. However, much of the literature on either the diseased or healthy brain methylome fails to separate DNAm data by cell type composition - a major driver of DNAm variability - or by brain region [5, 6]. We aim to provide a baseline of variation in the normal human methylome across various brain regions and according to proportions of neurons versus glia. We will use a publicly available dataset of n=122 healthy individuals aged 40-105 [7], consisting of reads obtained using the Illumina HumanMethylation450 BeadChip array platform. We will process the reads and quantify methylation levels, determine the proportions of neurons in the prefrontal cortex, entorhinal cortex, superior temporal gyrus and the cerebellum with a cell type prediction algorithm, and correlate the neuronal proportion with the amount of methylation in these regions [8]. This analysis will inform on (1) the difference in methylation levels between cortex and cerebellum and within cortical subregions and (2) the relationship between neuronal proportion and methylation levels in all regions analyzed. Our findings will be validated using an independent dataset: the Allen Brain Atlas’ Human Developing Brain cohort of n=16 healthy individuals aged 4 months - 37 years (http://brain-map.org/). Ultimately, the project will shed light on the importance of variability in methylation between brain regions, providing a guideline to analyze existing brain epigenetic literature and to plan robust experimental design moving forward.

References:

[1] Bernstein, B. E. et al. (2007). The Mammalian Epigenome. Cell. 128(4): 669-681.

[2] Sanchez-Mut, J. V., et al. (2016). Human DNA methylomes of neurodegenerative diseases show common epigenomic patterns. Transl Psychiatry. 6: e817.

[3] Huang, H. S., & Akbarian, S. (2007). GAD1 mRNA expression and DNA methylation in prefrontal cortex of subjects with schizophrenia. PloS one, 2(8), e809.

[4] Huang, H. S., et al. (2007). Prefrontal dysfunction in schizophrenia involves mixed-lineage leukemia 1-regulated histone methylation at GABAergic gene promoters. Journal of Neuroscience, 27(42), 11254-11262.

[5] Shin, J., et al. (2014). Decoding neural transcriptomes and epigenomes via high-throughput sequencing. Nat Neurosci 17(11): 1463-1475.

[6] Jaffe, A. E. & Irizarry, R. A. (2014). Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biology. 15: R31.

[7] Hannon, E. et al. (2015). Interindividual methylomic variation across blood, cortex, and cerebellum: implications for epigenetic studies of neurological and neuropsychiatric phenotypes. Epigenetics. 10(11): 1024-1032.

[8] Morris, T. J., & Beck, S. (2015). Analysis pipelines and packages for Infinium HumanMethylation450 BeadChip (450k) data. Methods. 72: 3-8.

santina commented 7 years ago

Hi @STAT540-UBC/team-methylhomies,

@ppavlidis and I will be your contacts for the project. You can ask us questions by opening issues and tagging us.

Here are some thoughts on your initial project info:

> DNAm is sensitive to environmental influences and can change over an individual’s lifespan. ... We will use a publicly available dataset of n=122 healthy individuals aged 40-105 [7] ... Our findings will be validated using an independent dataset: the Allen Brain Atlas’ Human Developing Brain cohort of n=16 healthy individuals aged 4 months - 37 years ...

You mention that epigenetic modifications change over one's lifetime, but you plan to develop a model using a dataset from one age group and validate it with data from another age group. Is there a reason why you cannot divide the first set of data into two sets, one for training and one for testing (assuming that's what you meant by validation)? That way you at least control for the variability among different age groups.

> We will process the reads and quantify methylation levels, determine the proportions of neurons in the prefrontal cortex, entorhinal cortex, superior temporal gyrus and the cerebellum with a cell type prediction algorithm.

What is the cell type prediction algorithm you are using and why are you interested in the cell type composition beyond just the differential methylation in different brain regions? Are you interested in knowing the differences in DNAm in different cell types in different brain regions?

Your sixth reference (Jaffe, 2014) is on the DNAm profile of blood, not brain. It would be good to look for literature that discusses variation in DNAm across brain regions and why it's important to address it. That way you can say more about the knowledge gap you're trying to address.

Be sure to include more details on the statistical methods and tools you will be using and the division of labour (e.g. literature research, preprocessing data, cleaning data, data QC, exploratory data analysis, statistical analyses, writing etc).

ppavlidis commented 7 years ago

@santina, I don't think they mean "validation" the way you might be thinking. If I can read between the lines, they want to see if they see the same differences in the Brainspan methylation data. Yes, age differences between the data sets might be something to consider. If it's "see if the differences we find in adult are present during early development" that's not validation, it's an experiment.

Team, about "the difference in methylation levels between cortex and cerebellum and within cortical subregions". I see you are aware of the issue here: that the main differences will be due to cell type composition differences - especially between cerebellum and cortex! Ultimately correcting for just "neurons versus glia" won't be enough because the regions have quite different types of neurons - but maybe okay for course project purposes. My lab has done a lot of this kind of work (for RNA) so if you want more input on cell type issues let me know. For example we have developed sets of marker transcripts that can resolve about 30 brain cell types.

I'd like you to discuss more about how you will relate your study to the Hannon et al. results. Again, reading between the lines, I believe you are implying that some effects Hannon et al. see might be due to cellular makeup variation among individuals, not really interesting gene regulation signals? I agree, though whether this really affects the bottom line of that paper isn't so clear. What do you think?

Minor: "reads obtained using the Illumina HumanMethylation450 BeadChip array platform. We will process the reads". BeadChip arrays don't produce "reads"; at least, that's not a term I've seen used for them.

santina commented 7 years ago

@STAT540-UBC/team-methylhomies I didn't get to talk with you last week, and won't be there today. Make sure you go to the seminars today and ask questions so you can finish your final proposal by next week. I have asked Farnush to check in with you guys. Bring questions, and respond to this thread to address the comments Paul and I made and to tell us what you got out of the seminar session today.

cwarren5124 commented 7 years ago

@santina @ppavlidis To address your questions:

  1. Thank you for pointing out the age difference between the 2 groups. We will not be exploring methylation differences between age groups, as this is a research question for a PhD student in our lab. Our goal is not to develop a model; we are trying to use an established model of cell type correction to assess methylation differences within brain regions. We do not feel that we will need a separate validation step and therefore do not plan on splitting our data or using the Allen dataset as validation. However, if we do need to split the original data set, when would we need to do this in our outlined analysis steps below?
  2. Our group has chosen to correct for cell type and then just compare DNAm levels between different brain regions (as opposed to looking at DNAm differences with respect to cell type differences).
  3. Paul, thank you for pointing out the variation between types of neurons. It would be ideal if we could account for that; however, we have not found an established package in R that can do so specifically for methylation, which is why we decided CETS would be our best option here, as it is widely used in the field. If we have overlooked something you are aware of, let us know.
  4. With regard to the Hannon et al. paper, we have tried to address your thoughts with our analysis questions (which we have briefly summarised below).

Here is a brief outline of what we plan to do, with questions posted below:

Project Steps: Data steps:

Analysis:

Github, writing and poster:

Our specific questions for you:

  1. When should we do PCA - before or after ComBat?
  2. Our data file is too large to upload to GitHub; how do you suggest we share it?
  3. If we need to process long/heavy computations on large data, can we use external servers?
  4. Are laptops generally sufficient for this? Should we seek alternative options, e.g. work or UBC computers with higher processing power?

ppavlidis commented 7 years ago

This is great, thanks. I'll try to answer your specific questions. I may have more comments about your research plan but at first glance it seems sound, keep working on it!

> When should we do PCA - before or after ComBat?

Possibly both. PCA as applied here is basically an exploratory method. Beforehand, you can use it to see if batch effects are prominent. Afterwards, you can use it to explore the "cleaned" data in the absence of batch effects.
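
A minimal sketch of that before/after check, in Python with NumPy on simulated data. The per-batch mean centering is only a crude stand-in for ComBat, not ComBat itself, and every name and number in the block is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical): 40 samples x 200 probes, two batches of 20.
n_samples, n_probes = 40, 200
batch = np.repeat([0, 1], n_samples // 2)
data = rng.normal(0.0, 1.0, size=(n_samples, n_probes))
data[batch == 1] += 2.0  # additive batch shift on every probe


def first_pc_scores(x):
    """Return each sample's score on the first principal component."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]


def center_within_batch(x, labels):
    """Crude stand-in for batch correction: remove each batch's probe means."""
    out = x.copy()
    for b in np.unique(labels):
        out[labels == b] -= out[labels == b].mean(axis=0)
    return out


def abs_corr(a, b):
    """Absolute Pearson correlation between two vectors."""
    return abs(np.corrcoef(a, b)[0, 1])


corr_before = abs_corr(first_pc_scores(data), batch)
corr_after = abs_corr(first_pc_scores(center_within_batch(data, batch)), batch)
print(f"|corr(PC1, batch)| before: {corr_before:.2f}, after: {corr_after:.2f}")
```

If PC1 tracks the batch label before correction and no longer does afterwards, the batch effect was prominent and the correction addressed it, at least along that component.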

I don't recommend using PCA as a way to actually remove factors that you know about like 'age'. Instead of removing components you'd probably treat age as a covariate in your linear model (if anything).
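
Treating age as a covariate could look roughly like the per-probe sketch below, written in Python with NumPy. The data are simulated, the effect sizes are made up, and the two-level region indicator is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated values for one probe (all numbers hypothetical).
n = 200
age = rng.uniform(40, 105, size=n)        # age range taken from the proposal
region = rng.integers(0, 2, size=n)       # 0 = cortex, 1 = cerebellum
true_region_effect, true_age_effect = 0.10, 0.002
methylation = (0.40 + true_region_effect * region
               + true_age_effect * age
               + rng.normal(0, 0.02, size=n))

# Design matrix: intercept, region indicator, age covariate.
design = np.column_stack([np.ones(n), region, age])
coef, *_ = np.linalg.lstsq(design, methylation, rcond=None)
intercept, region_effect, age_effect = coef
print(f"region effect: {region_effect:.3f}, age effect per year: {age_effect:.4f}")
```

The region coefficient is then the region difference at a fixed age, which is the quantity of interest here; age stays in the model rather than being stripped out with a principal component.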

> Our data file is too large to upload to github how do you suggest we share it?

There's no single solution. You could pass it around on a flash drive; upload it to a file sharing site like Dropbox (possible security/privacy implications); or use an FTP site, which some departments provide. You might have to ask.

> If we need to process long/heavy computations on large data, can we use external servers?

Of course.

> Are laptops generally sufficient for this?

Depends. The main problem is the large number of probes; the issue will be RAM.

I can look into how we could provide more resources if you run into difficulties.
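
As a rough way to judge the RAM question, the memory for one dense copy of the methylation matrix can be estimated as below. The probe count is the approximate size of the 450K array; the sample count is a hypothetical placeholder, not the real size of the dataset.

```python
def matrix_gib(n_probes: int, n_samples: int, bytes_per_value: int = 8) -> float:
    """Memory for a dense probes x samples matrix of floats, in GiB."""
    return n_probes * n_samples * bytes_per_value / 2**30


# Roughly 485,000 probes on the 450K array; 500 samples is a made-up figure.
N_PROBES = 485_000
N_SAMPLES = 500
print(f"{matrix_gib(N_PROBES, N_SAMPLES):.2f} GiB per copy of the matrix")
```

Analysis steps often hold several copies of the matrix at once (raw, normalized, corrected), so the working requirement is likely a multiple of this figure.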

suminwei2772 commented 7 years ago

@ppavlidis When you say including "age" as a covariate in our linear model - are you referring to the linear model when we do analysis to find differentially methylated regions?

ppavlidis commented 7 years ago

Yes

santina commented 7 years ago

> However if we do need to split the original data set, when would we need to do this in our outlined analysis steps below?

It doesn't look like you need to split it. Just use all the data!

> Our data file is too large to upload to GitHub how do you suggest we share it?

Since you're using a publicly available dataset, the one thing you should put on GitHub is the download script for the data, or at least a description of where and how you got the data. Ideally, members of your group or other people looking at your repo can just follow the instructions and get a copy themselves.
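
A download script of that kind could be as small as the Python sketch below. The URL is a hypothetical placeholder, not the dataset's real location, and `fetch` is an illustrative helper name.

```python
import pathlib
import urllib.request


def fetch(url: str, dest: str) -> pathlib.Path:
    """Download url to dest unless dest already exists; return the path."""
    target = pathlib.Path(dest)
    if target.exists():
        return target  # skip the download on repeat runs
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(target))
    return target


# Hypothetical placeholder; replace with the dataset's real public URL.
DATA_URL = "https://example.org/path/to/methylation_data.txt.gz"
```

Calling `fetch(DATA_URL, "data/methylation_data.txt.gz")` from the repo root would then give every collaborator the same local copy without the file itself ever being committed.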

Sounds like you're reproducing the Hannon et al. study (with the same data set they used), except that you're adding a cell type correction to the pipeline, comparing your results with theirs, and adding a functional enrichment analysis on top, so that you can say whether cell type correction is important in this type of analysis. Is that right?

Honestly I'm still confused how the cell type correction works and why it's important, haha. Maybe you can explain to me tomorrow :)

samschaf commented 7 years ago

@santina Thanks for the tips! We have already shared the download script on GitHub and are working through it.

Yes, investigating whether cell type correction is important will be one of our major aims, in addition to investigating the differences between brain regions. Each cell type has a characteristic DNA methylation profile; this is part of what regulates gene expression and makes differentiated cells phenotypically distinct from each other. So if all the cells in the brain are lumped together for analysis, we can't be sure whether methylation differences are due to the type of cell or to a factor we are more interested in, such as environmental stressors or, in this case, brain region. The CETS package we plan to use for cell type correction includes reference methylation profiles for neurons and glia based on FACS-sorted healthy human cell populations. It uses these reference profiles to predict the proportions of neurons and glia in our data based on their methylation scores, then transforms the methylation data to remove the variation that was due to cell type alone. So it's essentially another method of normalizing the data, allowing us to compare methylation between brain regions without differences in neuron-to-glia proportion getting in the way. Hope this answers your question!
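
To make the idea concrete, here is a toy Python/NumPy sketch of reference-based correction on simulated data. It is not the CETS implementation: the reference profiles, the two-component mixing model, and the regression-based adjustment are all simplifying assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical reference profiles at 50 marker CpGs (not real CETS values).
n_markers = 50
neuron_ref = rng.uniform(0.7, 0.9, size=n_markers)
glia_ref = rng.uniform(0.1, 0.3, size=n_markers)


def estimate_neuron_fraction(sample, neuron, glia):
    """Least-squares p in the model: sample = p * neuron + (1 - p) * glia."""
    diff = neuron - glia
    p = np.dot(sample - glia, diff) / np.dot(diff, diff)
    return float(np.clip(p, 0.0, 1.0))


def remove_cell_type_effect(values, fractions):
    """Regress one probe on neuronal fraction; keep residuals plus the mean."""
    design = np.column_stack([np.ones_like(fractions), fractions])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coef + values.mean()


# Simulate 30 bulk samples with known neuronal fractions.
true_frac = rng.uniform(0.2, 0.8, size=30)
bulk = (true_frac[:, None] * neuron_ref + (1 - true_frac[:, None]) * glia_ref
        + rng.normal(0, 0.01, size=(30, n_markers)))
est_frac = np.array([estimate_neuron_fraction(s, neuron_ref, glia_ref)
                     for s in bulk])

# A probe whose methylation tracks neuronal fraction, before and after adjustment.
probe = 0.3 + 0.4 * true_frac + rng.normal(0, 0.01, size=30)
adjusted = remove_cell_type_effect(probe, est_frac)
corr_before = np.corrcoef(probe, est_frac)[0, 1]
corr_after = np.corrcoef(adjusted, est_frac)[0, 1]
print(f"max fraction error: {np.abs(est_frac - true_frac).max():.3f}")
print(f"corr with fraction before: {corr_before:.2f}, after: {corr_after:.2f}")
```

After adjustment the probe no longer correlates with the estimated neuronal fraction, so any remaining difference between groups of samples cannot be explained by that fraction alone.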