hackseq / hackseq_projects_2017


Project 4: Develop an interactive application to help understand alpha and beta diversity metrics choices #4

Open abaghela opened 6 years ago

abaghela commented 6 years ago

Develop an interactive application to facilitate informed sequencing quality control decisions for downstream analysis on many samples

There's a saying in computer science, "garbage in, garbage out": the quality of your input determines the quality of your downstream analyses. Genome sequencing has decreased in cost, so experiments can include many more samples. Manually checking each sample is time consuming and less precise. So I propose the development of a web application or tool where you can drop in your samples and interactively explore their quality.

This tool could be built by various means. One option would be to develop a Shiny R application, which would require knowledge of R, the Shiny package, and possibly HTML/CSS/JavaScript. Another would be to rely on web development standards (HTML/CSS/JS) to build something like an Electron application that is cross-platform and user friendly.

This idea stems from my experience dealing with 16S rRNA sequencing samples. I had a single experiment with about 200 samples, which came to about 400 read files with paired-end sequencing. Manually viewing all 400 is time consuming. Additionally, further analysis of sequencing reads typically requires some trimming, because quality diminishes toward the end of longer reads. This tool could also be designed to recommend an ideal trim length based on your specifications: either a hard threshold that trims all samples to the same length, or a dynamic threshold set on a per-sample basis. Which trimming approach is appropriate will depend on whether the downstream tools can handle varying read lengths.
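To make the hard versus dynamic trimming idea concrete, here is a minimal base R sketch. It assumes you already have the mean Phred quality at each read position for each sample. The quality values, the cutoff of 25, and the helper name `trim_length` are all made up for illustration and do not come from any existing tool.

```r
# Hypothetical input: mean Phred quality per read position (columns)
# for each sample (rows). Values are invented for illustration.
qual <- rbind(
  sampleA = c(38, 37, 36, 34, 30, 24, 18),
  sampleB = c(38, 38, 37, 36, 35, 31, 22)
)
min_quality <- 25  # assumed cutoff, not a recommendation

# Number of leading positions whose mean quality stays at or above the cutoff
trim_length <- function(q, cutoff) {
  below <- which(q < cutoff)
  if (length(below) == 0) length(q) else below[1] - 1
}

# Dynamic threshold: one trim length per sample
dynamic_trim <- apply(qual, 1, trim_length, cutoff = min_quality)

# Hard threshold: a single length that is safe for every sample
hard_trim <- min(dynamic_trim)

print(dynamic_trim)
print(hard_trim)
```

With these invented numbers, sampleA would be trimmed to 5 bases and sampleB to 6 under the dynamic scheme, while the hard threshold would trim both to 5.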

Team Lead: Eric Leung | leunge@ohsu.edu | @erictleung | Grad Student | Oregon Health & Science University, USA |

erictleung commented 6 years ago

Sooo I may have found someone's solution to my proposed project called MultiQC (GitHub link). It was published just over a year ago and is even more robust and has more functionality than just for my 16S rRNA use case. A quick Biostars/Google search could have saved me time :sweat_smile:

@abaghela if you allow me, I have another proposition for a project I could lead that is specific to microbiome analysis. Let me know if you have any concerns with this new proposed project. Thanks.


Title: Develop an interactive application to help understand alpha and beta diversity metrics choices

Problem: There are many alpha and beta diversity metrics for analyzing microbial ecology or microbiome data. Alpha diversity describes the diversity within a single sample, for example an estimate of the total number of species in it. Beta diversity describes the differences between samples. Below are some examples of the number of metrics you can use.

Plot from "Alpha diversity graphics" page for phyloseq showing various alpha diversity metrics to choose from http://joey711.github.io/phyloseq/plot_richness-examples

Below are just a few of the beta diversity metrics to choose from:

> library(phyloseq)
> unlist(distanceMethodList)
    UniFrac1     UniFrac2        DPCoA          JSD     vegdist1     vegdist2
   "unifrac"   "wunifrac"      "dpcoa"        "jsd"  "manhattan"  "euclidean"
    vegdist3     vegdist4     vegdist5     vegdist6     vegdist7     vegdist8
  "canberra"       "bray" "kulczynski"    "jaccard"      "gower"   "altGower"
    vegdist9    vegdist10    vegdist11    vegdist12    vegdist13    vegdist14
  "morisita"       "horn"  "mountford"       "raup"   "binomial"       "chao"
   vegdist15   betadiver1   betadiver2   betadiver3   betadiver4   betadiver5
       "cao"          "w"         "-1"          "c"         "wb"          "r"
  betadiver6   betadiver7   betadiver8   betadiver9  betadiver10  betadiver11
         "I"          "e"          "t"         "me"          "j"        "sor"
 betadiver12  betadiver13  betadiver14  betadiver15  betadiver16  betadiver17
         "m"         "-2"         "co"         "cc"          "g"         "-3"
 betadiver18  betadiver19  betadiver20  betadiver21  betadiver22  betadiver23
         "l"         "19"         "hk"        "rlb"        "sim"         "gl"
 betadiver24        dist1        dist2        dist3   designdist
         "z"    "maximum"     "binary"  "minkowski"        "ANY"

With so many metrics to choose from, how do you know which is the "best" and how will your data affect the calculation of these metrics?
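One way to see why the choice matters is to compute two beta diversity metrics on the same pairs of samples. The sketch below uses base R with made-up counts. The helpers `bray_curtis` and `jaccard_dist` are hand-written for illustration and are not the phyloseq or vegan implementations; `jaccard_dist` here is the presence/absence form.

```r
# Made-up counts for four species (illustration only)
a        <- c(10, 0, 5, 0)
b        <- c(2, 8, 5, 0)
b_skewed <- c(2, 80, 5, 0)  # same species present as b, one far more abundant

# Bray-Curtis dissimilarity: driven by differences in abundance
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Jaccard distance on presence/absence: ignores abundance entirely
jaccard_dist <- function(x, y) {
  shared <- sum(x > 0 & y > 0)
  total  <- sum(x > 0 | y > 0)
  1 - shared / total
}

print(bray_curtis(a, b))
print(bray_curtis(a, b_skewed))   # larger: the abundance shift matters
print(jaccard_dist(a, b))
print(jaccard_dist(a, b_skewed))  # unchanged: same species are present
```

Here the Bray-Curtis value grows when one species becomes much more abundant, while the presence/absence Jaccard distance stays the same, so the two metrics can tell different stories about the same data.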

Proposed Project: Create an interactive Shiny application that shows how your chosen alpha or beta diversity metrics change with simulated or real data. Some of these metrics are sensitive to species observed only once or twice (singletons and doubletons), so it will be useful to see how different distributions of counts change these metrics and your interpretation of them. The application should be designed to give an intuitive understanding of how these metrics work.
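As a rough sketch of the kind of comparison the application could show, the base R snippet below computes three alpha diversity metrics on a made-up sample before and after adding a handful of singletons. The function names are illustrative and the counts are invented. `chao1` uses the bias-corrected form of the estimator, which may differ from what a given package reports.

```r
# Observed richness: number of species with at least one count
observed <- function(counts) sum(counts > 0)

# Shannon index: -sum(p * log(p)) over species with non-zero counts
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Chao1, bias-corrected form: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
# where F1 and F2 are the numbers of singletons and doubletons
chao1 <- function(counts) {
  f1 <- sum(counts == 1)
  f2 <- sum(counts == 2)
  observed(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))
}

base_sample     <- c(50, 30, 20)              # made-up counts
with_singletons <- c(base_sample, rep(1, 5))  # add five species seen once

print(observed(base_sample))      # 3
print(observed(with_singletons))  # 8
print(chao1(base_sample))         # 3
print(chao1(with_singletons))     # 18
print(shannon(base_sample))
print(shannon(with_singletons))
```

Five extra reads out of 105 move observed richness from 3 to 8 and Chao1 from 3 to 18, while the Shannon index shifts far less in relative terms. Seeing this kind of contrast interactively is what the proposed application is meant to make intuitive.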

Possible Requirements:

abaghela commented 6 years ago

@erictleung Hi Eric, we approve your change in project. We are looking forward to this new one!

ampatzia commented 6 years ago

Assignments are out, really looking forward to collaborating on this 👍 @erictleung Need any help with preparation?

erictleung commented 6 years ago

@ampatzia thanks for your interest! I've created a bare repository to host this project. I plan on getting a base Shiny application up later this week so people can get up and running, along with some ideas of what could be in the application itself. If I come up with anything else, I'll let you know! 😄

erictleung commented 6 years ago

Some good articles to use while working on this project will be http://shiny.rstudio.com/articles/. It has lots of content on getting started, building the structure, frontend and backend sides of the application, and improving it.

jakelever commented 6 years ago

Hey team lead, we've been gathering Github IDs for your team members. We see that you've already started a repo for this project. So could you please add the following people as collaborators to that project?

aimirza amanji rnoronha00 ampatzia vnsriniv

Once the people are added, it'd be a great idea to start a discussion on that repo with information to get your team members started (e.g. some small suggested reading, things to look up, etc). We will also be adding everyone to Slack and creating a specific channel for each project. This may be an easier way to communicate.

We'll forward on any remaining Github IDs through this issue.

Thanks, Jake obo the Hackseq organising committee

erictleung commented 6 years ago

@jakelever thanks!

jakelever commented 6 years ago

Hi, one more Github ID for you:

cabrerad

Thanks, Jake

jakelever commented 6 years ago

And one last one: scatcher125

Cheers, Jake

erictleung commented 6 years ago

@jakelever both added! Thanks for the update.

jakelever commented 6 years ago

And actually one more Github ID: szhan