uclhal / Data-Science-for-Docs

Data Science for Doctors - a practical introduction
http://datascibc.org/Data-Science-for-Docs/

Add ideas from existing courses #18

Closed docsteveharris closed 8 years ago

docsteveharris commented 8 years ago

Please add your notes at https://github.com/datascibc/course/wiki/stolen-ideas

tompollard commented 8 years ago

We have been putting together a repository for running workshops around a subset of the MIMIC-III data at: https://github.com/MIT-LCP/mimic-workshop. The repository is a bit of a mess at the moment because it contains content for a couple of different workshops (on different topics, run in different languages), but we'll be cleaning and improving it over time.

An example of an (unfinished!) IPython notebook that we have used is at: https://github.com/MIT-LCP/mimic-workshop/blob/master/01-example-patient-heart-failure.ipynb. The workshop takes participants through the process of:

tompollard commented 8 years ago

Harvard University ran some well-organised workshops on R and Python this week. The materials are online at: http://tutorials.iq.harvard.edu/.

docsteveharris commented 8 years ago

Awesome link. Thanks

finncatling commented 8 years ago

This is a pretty nice, free intro to the basics of R that I enjoyed working through. The R console runs in the browser along with the tutorial so the barrier to entry is low. It might be useful as preparatory work for the workshop?

docsteveharris commented 8 years ago

Thanks @finncatling and @tompollard - others please add. Doesn't need to be a specific course - but just a theme or some material or a particular approach

mn936148 commented 8 years ago

Not sure if this is on the list, but this is an interactive set of R tutorials that Steve introduced me to:

http://swirlstats.com/

docsteveharris commented 8 years ago

Good tip - how about this … https://github.com/swirldev/swirlify/wiki where we write our own swirl module?

Just an idea


ahmed-alhindawi commented 8 years ago

I've subscribed to an edx.org course called Statistical Thinking for Data Science and Analytics. I've highlighted the parts that I found particularly enjoyable and discussed why below.

It's a 5-week course; the weekly topics are:

Week 1

*What is Data Science?
What questions can Data Science answer?
Why is there an explosion of data?
What role does data visualization play in Data Science?
How did you become interested in Data Science?
What do you predict will happen in Data Science in 5 years?
What are the most important skills for a Data Scientist?
*What should a non-Data Scientist know about Data Science?

Week 2

*Statistical Thinking for Data Science *Numerical Data 1 Simple Visualization and Summaries* Numerical Data 2 Simple Visualization and Summaries* Numerical Data 3 Association Data Collection - Sampling Introduction to Probability Statistical Inference - Confidence Intervals Statistical Inference - Significance tests Status of Current Observational Health Studies Statistical Terms Explained Unknown Characteristics of Observational Health Studies Lessons Learnt from OMOP Experiments P-value Calibration Concluding Remarks

Week 3

*Conditional Probability Bayes' Formula Studying Association: Two-way Table Studying Association: Chi-square Test of Independence Studying Association: One-way Analysis of Variance Regression Analysis 1 and 2 Regression Analysis 3 and 4 Regression Analysis 4 and Concluding Remarks Types of Data Analytics Clustering Text Topic Modeling Metrics for Label Description Concluding Remarks

Week 4

*Graphs Are Comparisons* Use Data To Answer Questions A Case Example Decision Making Process of Data Visualization 1 Decision Making Process of Data Visualization 2 Decision Making Process Main Worked Example Why Visualize Data Worked Example 1 Why Visualize Data Worked Example 2**

Week 5

Introduction Probability Calibration Probability As Measurement of Uncertainty Bayesian Inference How To Use Prior Information Bayesian Modeling in Practice Business Applications in Bayesian Statistics Introduction Data Collection and Model Building 1 Data Collection and Model Building 2 Model Building Review Model Insights 1 Model Insights 2 Example Modeling Museum Membership Renewal Example Modeling User Behavior on a Deals Website

Conclusions

I understand that the above was a copy/paste job, but the parts I really enjoyed were the bits of background on why this is important. The visualisation material was really excellent, and the most important part of it was the "decision making process of data visualisation": it explained how different graphs convey different information, and how the type of data you have largely determines the type of graph to use. That was really nice.
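To make the "data type suggests the graph type" idea concrete, here is a minimal Python sketch. The mapping is only a set of common defaults, not something taken from the course, and the function name `suggest_chart` and the kind labels are made up for illustration.

```python
def suggest_chart(x_kind, y_kind=None):
    """Return a common default chart for the given variable kinds.

    Kinds are "numeric" or "categorical"; y_kind=None means a single variable.
    The mapping below is an illustrative assumption, not a rule from the course.
    """
    defaults = {
        ("numeric", None): "histogram",
        ("categorical", None): "bar chart",
        ("numeric", "numeric"): "scatterplot",
        ("categorical", "numeric"): "boxplot",
        ("numeric", "categorical"): "boxplot",
        ("categorical", "categorical"): "grouped bar chart",
    }
    try:
        return defaults[(x_kind, y_kind)]
    except KeyError:
        raise ValueError(f"unknown kinds: {x_kind!r}, {y_kind!r}")
```

For example, `suggest_chart("categorical", "numeric")` suggests a boxplot, which is the usual way to compare a numeric measurement across groups.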

The Bayesian probability stuff was also excellent, but I don't think we'll have enough time in one day to go through that.

In conclusion, I think the best bits of this course, and roughly the topics that I enjoyed, are "Why Data Science", "Why is visualisation important" and "How to pick the visual that you need".

Hope that helps,

Ahmed

dannyjnwong commented 8 years ago

This is an awesome set of videos for R: http://curleylab.psych.columbia.edu/rvideos.html

This is his course outline:

  1. Introduction to Data Analysis and Visualization: This is the video that accompanies the lecture I gave in the Sports Management Masters program at Columbia on how to use R to analyze large sports datasets. The example is data from the PGA tour. The aim of this class/video was to show how to take a dataset of players and to perform some exploratory analyses and visualizations of what may be the most important variables to look at and how to start evaluating individual differences in these variables. The aim of this video is to give some ideas as to how to start performing these analyses. Up to around 50 minutes is stuff mainly for beginners and then after that I do a few more in-depth concepts.

What's covered?

00.00.00 Getting Data into R
00.09.30 Simple Histograms
00.10.45 Intro to dplyr - install, select & filter
00.17.00 Simple boxplots
00.19.00 dplyr II - getting Summary data - n(), group_by, summarize & summarize_each
00.29.15 Simple scatterplots & ggplot intro
00.44.45 Correlation tests & intro to linear and multiple regression
00.55.45 purrr - map
01.02.00 Factor Analysis
01.20.00 dplyr III - mutate, arrange, joins
01.29.15 Non-metric multidimensional scaling
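For anyone coming from Python rather than R, the filter, group_by and summarize steps listed above follow a split-apply-combine pattern. Here is a rough standard-library sketch of that pattern. It is not the code from the video, and the rows, column names and the helper `summarize_by` are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows standing in for a players dataset (not the PGA data).
rows = [
    {"player": "A", "year": 2014, "score": 70},
    {"player": "A", "year": 2015, "score": 68},
    {"player": "B", "year": 2014, "score": 72},
    {"player": "B", "year": 2015, "score": 74},
    {"player": "C", "year": 2015, "score": 69},
]


def summarize_by(rows, key, value):
    """Group rows by `key`, then return the count and mean of `value` per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {k: {"n": len(v), "mean": mean(v)} for k, v in groups.items()}


# Rough analogue of filter(): keep only the 2015 rows.
recent = [r for r in rows if r["year"] == 2015]

# Rough analogue of group_by(player) followed by summarize(n = n(), mean = mean(score)).
summary = summarize_by(rows, "player", "score")
```

In practice you would reach for dplyr in R or a data-frame library in Python, but the plain version shows what the grouping step is actually doing.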