grunwaldlab / Reproducible-science-in-R

This site provides information on conducting reproducible science using the R language.
http://grunwaldlab.github.io/Reproducible-science-in-R/

Example data set we can use? #5

Closed zkamvar closed 8 years ago

zkamvar commented 8 years ago

We can use the ezec package to analyze a fungicide resistance data set and build an example workflow around it. I looked through the case studies from APS but could not find a data set. One potential data set I found was in a PLoS paper on fruit and seed toxicity, which used fungal growth to measure seed toxicity.

grunwald commented 8 years ago

See stripe rust data for two years at http://www.apsnet.org/edcenter/advanced/topics/EcologyAndEpidemiologyInR/DiseaseProgress/Pages/StripeRust.aspx. I suggest graphing the data over time, then calculating AUDPC and running an ANOVA on the AUDPC values.
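
For anyone unfamiliar with AUDPC (area under the disease progress curve), here is a minimal sketch of the trapezoid-rule calculation in base R. The function name `audpc_trapezoid` and the numbers below are made up for illustration and are not taken from the stripe rust data; the result should agree with `agricolae::audpc()` when its `type` is left at `"absolute"`.

```r
# Hypothetical helper (not part of any package): AUDPC by the trapezoid rule.
# severity: disease measurements; time: matching assessment dates.
audpc_trapezoid <- function(severity, time) {
  stopifnot(length(severity) == length(time), length(time) >= 2)
  # Mean severity of each pair of consecutive assessments
  mean_severity <- (head(severity, -1) + tail(severity, -1)) / 2
  # Time elapsed between consecutive assessments
  interval <- diff(time)
  sum(mean_severity * interval)
}

# Made-up illustrative values, NOT from the APS data set
days     <- c(0, 7, 14, 21)
severity <- c(0, 10, 30, 50)
audpc_trapezoid(severity, days)
```

With one AUDPC value per plot, the values can then be compared across treatments, which is where the ANOVA comes in.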

grunwald commented 8 years ago

As another example, download a tree data set from TreeBASE. See: http://www.carlboettiger.info/2011/05/16/treebase-in-r-a-first-tutorial.html. However, use a plant pathogen group like Phytophthora.

zachary-foster commented 8 years ago

Here is how to parse the APS data into long format:

# Read the APS fungicide example (wide format: one column per cultivar/treatment)
data <- read.csv("http://www.apsnet.org/edcenter/advanced/topics/EcologyAndEpidemiologyInR/DiseaseProgress/Documents/FungicideExample.csv")
long_data <- reshape2::melt(data, id.vars = "Julian.Date")
# Note: despite its name, `control` is TRUE for the fungicide-treated (".trt"/".Trt") columns
long_data$control <- grepl(pattern = "\\.[Tt]rt$", long_data$variable)
# Strip the suffix so treated and untreated plots share a cultivar name
long_data$variable <- gsub(long_data$variable, pattern = "\\.[Tt]rt$", replacement = "")
library(ggplot2)
ggplot(long_data) +
  geom_line(aes(x = Julian.Date, y = value, color = control)) +
  facet_wrap(~variable)

zkamvar commented 8 years ago

For regex, we can link sites for people to practice:

https://regexper.com https://regex101.com
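
As a small sketch of what there is to practice, this is how the pattern `\\.[Tt]rt$` from the parsing code above behaves on a few column names. The first three names come from the data set; `Trt.Wheat` is a made-up name added only to show that the `$` anchor requires the suffix to be at the end.

```r
pattern <- "\\.[Tt]rt$"  # a literal dot, then "Trt" or "trt", at the end of the string

columns <- c("TwentyOneThirtySevenWheat.trt",  # matches (lowercase t)
             "CutterWheat.Trt",                # matches (uppercase T)
             "CutterWheat",                    # no suffix, no match
             "Trt.Wheat")                      # hypothetical name: "Trt" is not at the end

is_treated <- grepl(pattern, columns)
cultivar   <- sub(pattern, "", columns)  # drop the suffix where it matched

data.frame(columns, is_treated, cultivar)
```

Pasting the pattern into either site (with a single backslash, since the doubled one is R string escaping) shows the same matches.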

zkamvar commented 8 years ago

Here's how to get the AUDPC data using agricolae (and optionally dplyr):

The dplyr way:

library('dplyr')
library('agricolae')
audpc_data <- long_data %>% 
  group_by(variable, control) %>% 
  summarize(AUDPC = audpc(value, Julian.Date))
audpc_data
## Source: local data frame [6 x 3]
## Groups: variable [?]
## 
##                    variable control   AUDPC
##                       (chr)   (lgl)   (dbl)
## 1               CutterWheat   FALSE 106.760
## 2               CutterWheat    TRUE  55.370
## 3               JaggerWheat   FALSE 254.410
## 4               JaggerWheat    TRUE 134.070
## 5 TwentyOneThirtySevenWheat   FALSE 188.200
## 6 TwentyOneThirtySevenWheat    TRUE 132.005

# Plotting!
ggplot(audpc_data, aes(x = variable, y = AUDPC, fill = control)) +
  geom_bar(stat = "identity", position = "dodge")

OR the base R way:

library('agricolae')
audpc_data <- audpc(t(data[-1]), data[[1]])
audpc_data
##    TwentyOneThirtySevenWheat TwentyOneThirtySevenWheat.trt
##                      188.200                       132.005
##                  CutterWheat               CutterWheat.Trt
##                      106.760                        55.370
##                  JaggerWheat               JaggerWheat.Trt
##                      254.410                       134.070

zkamvar commented 8 years ago

A different version of Zach's original with names:

library('dplyr')
data <- read.csv("http://www.apsnet.org/edcenter/advanced/topics/EcologyAndEpidemiologyInR/DiseaseProgress/Documents/FungicideExample.csv")
long_data <- data %>% 
  reshape2::melt(id.vars = "Julian.Date", value.name = "disease") %>%  # long form
  mutate(fungicide = grepl(pattern = "\\.[Tt]rt$", variable)) %>%      # add fungicide column
  transform(fungicide = ifelse(fungicide, "treated", "untreated")) %>% # make it sensible
  transform(variable = gsub("\\.[Tt]rt$", "", variable))               # clean up the variable

library('ggplot2')
g <- ggplot(long_data)
g <- g + geom_line(aes(x = Julian.Date, y = disease, color = fungicide))
g <- g + facet_wrap(~variable)
g  # print the plot

zkamvar commented 8 years ago

This is being addressed by #19 and #20.