SISG 2022: Module 10, MCMC for Genetics
TS Eliot: "We shall not cease from exploration And the end of all our exploring Will be to arrive where we started And know the place for the first time."
Key information:
Instructors: Eric C. Anderson and Matthew Stephens.
TAs: Sue Parkinson and Karl Tayeb
Zoom meeting link:
https://uchicago.zoom.us/j/96210188590?pwd=VTNPME9LaE1SWGZmOTlickkxQUFCZz09
Additional details
Matthew Stephens is inviting you to a scheduled Zoom meeting.
Meeting ID: 962 1018 8590
Passcode: 089309
One tap mobile
+13126266799,,96210188590#,,,,089309# US (Chicago)
+13462487799,,96210188590#,,,,089309# US (Houston)
Dial by your location
+1 312 626 6799 US (Chicago)
+1 346 248 7799 US (Houston)
+1 646 558 8656 US (New York)
+1 646 931 3860 US
+1 669 444 9171 US
+1 669 900 9128 US (San Jose)
+1 253 215 8782 US (Tacoma)
+1 301 715 8592 US (Washington DC)
Find your local number: https://uchicago.zoom.us/u/actUqRQMtG
Join by Skype for Business
https://uchicago.zoom.us/skype/96210188590
Slack: you should have access to the Slack channel mod10_mcmc_genetics_2022
Session Times (Seattle time, Pacific Time)
Monday 8am-2:30pm
Tuesday 8am-2:30pm
Wednesday 8am-11am
Material will be delivered via zoom by live lectures and live practical sessions, with additional reading materials and/or slides also provided. Each session builds on previous sessions so you will get maximum benefit by attending every session live and in sequence.
Zoom guidelines
The Zoom link is https://uchicago.zoom.us/j/96210188590?pwd=VTNPME9LaE1SWGZmOTlickkxQUFCZz09 and further dial-in details are given above under "Key information".
We will record each session and make the recordings available to participants as soon as practical. The recordings should be available for 90 days.
Please have your camera on where possible - it helps give a closer approximation to an "in person" experience. Especially try to have your camera on in break-out sessions.
Please mute yourself during lectures (unless you need to speak) but please unmute yourself during break-out sessions.
To get help during breakout sessions you may want to share your screen. You can only do that if you sign into zoom on your computer (not a phone or other mobile device).
Pre-module Preparation:
Please make sure you have working versions of R, RStudio and the latest version of Zoom installed on your computer.
https://www.r-project.org/
https://rstudio.com/products/rstudio/download/
https://zoom.us/
Please be sure to install some necessary R packages with
install.packages(c("tidyverse", "plotly", "workflowr", "expm", "viridis"))
Install the binary versions. Please do not install later versions from source code that require compilation.
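After running the install command above, a quick check like the following can confirm that each package is available. This is only a minimal sketch: the variable names `pkgs`, `installed` and `missing_pkgs` are illustrative and not part of the course materials.

```r
# Packages listed in the install.packages() call above
pkgs <- c("tidyverse", "plotly", "workflowr", "expm", "viridis")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs installing
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) == 0) {
  message("All required packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

If anything is reported as missing, re-running install.packages() for just those packages is usually enough.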
Please download the materials from fiveMinuteStats
https://github.com/stephens999/fiveMinuteStats
If you know how to use git, then do it that way. Otherwise the easiest way is to click on the green "Code" button and download the zip file.
Once you have downloaded the files, open the file r_simplemix.Rmd in the analysis/ subdirectory and try to knit it using the RStudio "Knit" button.
In a similar manner to downloading the materials from fiveMinuteStats, also download the materials from sisg-mcmc-exercises-eca
https://github.com/eriqande/sisg-mcmc-exercises-eca
Day 1 (Times are approximate)
8:00 am Introductions (15 mins)
Instructors and TAs introduce themselves
Overview of course and materials
CHECK: Have you completed the preliminary preparation?
8:15am Session 0, Lecture: genetic mixture and breaking the ice! @ms
Complete Exercise 1 in https://stephens999.github.io/fiveMinuteStats/r_simplemix.html
Compare/discuss/troubleshoot the answer to the Exercise in your break-out rooms
Since this is the first time you are using break-out rooms:
Introduce yourselves! Give your name, academic background, research interests, and a hobby. Go in alphabetical order of family name. There will be approximately four students per breakout room. From now on we will call the first student A, the second B, then C, D, etc.
The first student (A) should take the lead in this session. In later sessions B, C and D will take turns leading.
For example, student A can share their screen as you work through the exercises together. Of course, if it helps to have another student share their screen instead, go ahead.
Other students: make suggestions and ask questions. Try to help one another out!
If you would like help from a TA/instructor, you should be able to ask for help from within Zoom. (Alternatively, use the Slack channel and tell us which breakout room would like assistance.) We will be there as soon as we can!
9:00am Session 1: Bayesian inference - the assignment problem @ms
Topic: sisg 2022
Time: Jul 18, 2022 08:00 AM Pacific Time (US and Canada)
Every day, 3 occurrence(s)
Jul 18, 2022 08:00 AM
Jul 19, 2022 08:00 AM
Jul 20, 2022 08:00 AM
Please download and import the following iCalendar (.ics) files to your calendar system.
Daily: https://uchicago.zoom.us/meeting/tJIvdumppjMvE9QNtB5fGIg8dgIlJAGUwdCG/ics?icsToken=98tyKuCurDoqG9ydtRCHRowAAIj4c-vxiFxYj_pssgvHViZ0SwSuMuVrPpheN-3H
Join by SIP 96210188590@zoomcrc.com
Join by H.323
162.255.37.11 (US West)
162.255.36.11 (US East)
115.114.131.7 (India Mumbai)
115.114.115.7 (India Hyderabad)
213.19.144.110 (Amsterdam Netherlands)
213.244.140.110 (Germany)
103.122.166.55 (Australia Sydney)
103.122.167.55 (Australia Melbourne)
149.137.40.110 (Singapore)
64.211.144.160 (Brazil)
149.137.68.253 (Mexico)
69.174.57.160 (Canada Toronto)
65.39.152.160 (Canada Vancouver)
207.226.132.110 (Japan Tokyo)
149.137.24.110 (Japan Osaka)
Meeting ID: 962 1018 8590
Passcode: 089309
reading indicates vignette/reading/slides/materials
exercise indicates exercises
prep indicates material for instructors' reference; you may ignore it
reading
https://stephens999.github.io/fiveMinuteStats/r_simplemix.html
exercise
1a. Find and run ("knit") the Rmd file that created https://stephens999.github.io/fiveMinuteStats/r_simplemix.html
HINT: the Rmd files are in the analysis subdirectory.
1b. Also run the file in the console (e.g. select "Run All" from the Run menu).
reading
https://stephens999.github.io/fiveMinuteStats/likelihood_ratio_simple_models.html
https://stephens999.github.io/fiveMinuteStats/LR_and_BF.html
https://stephens999.github.io/fiveMinuteStats/bayes_multiclass.html
exercise
Use the ideas from this session to complete Exercise 2 in https://stephens999.github.io/fiveMinuteStats/r_simplemix.html
Note the answer template in that file.
Breakout rooms: student B in each room leads this session.
10 am Session 2: Bayesian inference - Estimating allele frequencies/binomial (50 mins)
reading
https://stephens999.github.io/fiveMinuteStats/likelihood_function.html
https://stephens999.github.io/fiveMinuteStats/bayes_beta_binomial.html
https://stephens999.github.io/fiveMinuteStats/beta.html
https://stephens999.github.io/fiveMinuteStats/bayes_conjugate.html
exercise
Complete Exercise 3 in https://stephens999.github.io/fiveMinuteStats/r_simplemix.html
Breakout rooms: student C in each room leads this session.
11am (Lunch/self-study 1.5 hours)
12:30pm Session 3: Monte Carlo @ea (50 mins)
reading
Monte Carlo lecture slides in PDF:
https://eriqande.github.io/sisg_mcmc_course/2021-monte-carlo-lecture-slides.pdf
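As a warm-up for this session's exercises, here is a minimal sketch of Monte Carlo estimation from a beta posterior, annotated with the three questions the exercises ask you to keep in mind. The data (7 copies of an allele among 20 sampled gene copies), the Beta(1, 1) prior and all variable names are hypothetical choices for illustration; they are not taken from the course notebooks.

```r
set.seed(1)  # make the sketch reproducible

# Hypothetical data: 7 copies of an allele seen in 20 sampled gene copies,
# with a Beta(1, 1) (uniform) prior on the allele frequency p.
n_allele <- 7
n_total  <- 20
a_post <- 1 + n_allele             # posterior is Beta(a_post, b_post)
b_post <- 1 + n_total - n_allele

# 1) Random variable being simulated: p ~ Beta(a_post, b_post)
p_draws <- rbeta(100000, a_post, b_post)

# 2) Function g(p) being evaluated: indicator that p exceeds 0.5
g <- function(p) as.numeric(p > 0.5)

# 3) Expectation being approximated: E[g(p)] = P(p > 0.5 | data)
mc_estimate <- mean(g(p_draws))

# Exact posterior probability, available here for comparison
exact <- pbeta(0.5, a_post, b_post, lower.tail = FALSE)

c(monte_carlo = mc_estimate, exact = exact)
```

Because the beta distribution function is available in closed form through pbeta(), the Monte Carlo estimate can be checked directly; in most problems of interest no such exact answer exists, which is why the simulation approach matters.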
exercise
When doing all of the exercises, always ask yourself these three questions: 1) What is the random variable being simulated? 2) What is the function g(x) that is being evaluated? 3) What is the expectation that I am approximating?
Sampling from a beta posterior distribution
https://eriqande.github.io/sisg-mcmc-exercises-eca/monte-carlo-sampling-from-a-beta-posterior.nb.html (You can download the Rmd from the "Code" button in the upper right of this notebook, or work from the Rmd in the sisg-mcmc-exercises-eca repository)
BONUS READING/EXERCISES: Monte Carlo integration of a deterministic function. (You are not expected to get to it during class time, but it is there if you want to play with it in the evening)
https://eriqande.github.io/sisg-mcmc-exercises-eca/003-monte-carlo-to-evaluate-an-integral.nb.html (or the Rmd in the repo)
1:30pm Session 4: Markov Chains @ea (50 mins)
reading
Markov Chains lecture slides in PDF:
https://eriqande.github.io/sisg_mcmc_course/2021-markov-chains-lecture-slides.pdf
exercise
Playing with the bouncing blob.
https://eriqande.github.io/sisg-mcmc-exercises-eca/markov-chain-bouncing-blob-exercise.nb.html (or the Rmd in the repo)
BONUS READING/EXERCISES: Biasing a random walk. You might not get to this during the class period, but it is a useful preamble to Session 5 if you can find the time.
https://eriqande.github.io/sisg-mcmc-exercises-eca/006-markov-chain-biased-random-walk.nb.html (or the Rmd in the repo)
2:30pm Formal period over. Instructors will be available to help troubleshoot issues arising during the day.
Day 2
8am Session 5: Metropolis--Hastings - Intro @ms
reading
https://stephens999.github.io/fiveMinuteStats/MH_intro.html
prep
Eric's sampling from the beta-density via M-H slides/animation.
https://github.com/eriqande/sisg-mcmc-opengl-computer-demos
Overview instructions at https://www.youtube.com/watch?v=a8gjem86Uf4
Run using: sisg-mcmc-opengl-computer-demos stephens$ ./beta_sim
Open windows using keys 1 and 2; start/stop using the spacebar.
9am Session 6: Practical session (MH Simple Examples) @ms
reading
https://stephens999.github.io/fiveMinuteStats/MH-examples1.html
exercise
Find and run the code that produced the html above (analysis/MH-examples1.Rmd)
Run through the exercises under Examples 1 and 2 in that Rmd file
(Look at Example 3 if you finish 1+2)
10:30 Lunch/Self-study. 1.5 hours.
12 noon Session 7: Metropolis--Hastings in 2d @ea
reading
MCMC in two dimensions lecture slides in PDF:
https://eriqande.github.io/sisg_mcmc_course/2021-two-dimension-MCMC.pdf
exercise
reading
Gibbs sampling lecture slides in PDF:
https://eriqande.github.io/sisg_mcmc_course/2021-Gibbs-sampling-inbreeding-model.pdf
Additional readings from fiveMinuteStats about Gibbs sampling and the simple genetic mixture model:
https://stephens999.github.io/fiveMinuteStats/gibbs1.html
https://stephens999.github.io/fiveMinuteStats/gibbs_structure_simple.html
exercise
We will use the ideas from this session to add to the r_simplemix.Rmd analysis and create a Gibbs sampler.
The exercises and answer templates are here:
https://stephens999.github.io/fiveMinuteStats/r_simplemix_gibbs_1.html
Day 3
8am Session 9: Gibbs sampling for genetic mixture @ms
In this session we discuss some possible extensions to the MCMC scheme from Session 8, as outlined here:
https://stephens999.github.io/fiveMinuteStats/r_simplemix_gibbs_2.html
exercise
The exercises and answer templates are here:
https://stephens999.github.io/fiveMinuteStats/r_simplemix_gibbs_2.html
Note: these exercises, especially working out the details of the update for m for the correlated allele frequencies model, could take some time, and implementing them all will take you beyond today, I think.
9:30am Session 10: Importance sampling and Metropolis-Coupled MCMC @ea
Note that the code for the graphical simulations done in this session (and other sessions) is in: https://github.com/eriqande/sisg-mcmc-opengl-computer-demos
reading
Importance sampling and simulated tempering lecture slides in PDF:
https://eriqande.github.io/sisg_mcmc_course/2021-imp-samp-mcmcmc.pdf
exercise
Final discussions and course evaluations
11am: finish