sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis
Other
180 stars 80 forks source link

Identification of outlier samples in plotAdjustedRtime plot #551

Open gmhhope opened 3 years ago

gmhhope commented 3 years ago

Hi Johannes,

Thanks very much for the continuous supports in this github.

I would like to ask if there is a easy way to figure out which is the outlier showing here in the plot attached?

Thanks, Minghao Gong alignment_res_bpis_adj_AdjustedRtime.pdf

jorainer commented 3 years ago

Actually, there is no easy way to do that. I would suggest to use a different color for one of the samples at a time to see which one it is. actually, you used already a different color for the "outlier" - so I guess you know already what it is... and in order to see if this is really a problematic sample I would suggest you plot EICs of some compounds you know are in the sample before and after alignment and check how the data looks for this sample.

gmhhope commented 3 years ago

Thanks!

jorainer commented 3 years ago

I'm closing this issue now - feel free to re-open if needed.

stanstrup commented 1 year ago

We always have this issue and today I finally made a ggplot2 version that can be passed to plotly so you get an interactive plot. The interactive plot hover info gives all data in pData by default.

rt_pos <- gplotAdjustedRtime(xcms_p_POS_g_r, Plate) + scale_color_brewer(type = "qual", palette = "Set1")
plotly::ggplotly(rt_pos) # slow!

retcor_neg_gg

gplotAdjustedRtime <- function(object, color_by, include_columns = NULL, adjustedRtime = TRUE){

  require(tibble)
  require(tidyr)
  require(dplyr)
  require(purrr)
  require(xcms)

  if (!is(object, "XCMSnExp"))
    stop("'object' has to be an 'XCMSnExp' object.")
  if (!hasAdjustedRtime(object))
    warning("No alignment/retention time correction results present.")

  # get raw and adjusted RTs
  rt_noadj <- rtime(object, adjusted = FALSE, bySample = FALSE) %>% 
    {tibble(names = names(.), rt = .)} %>% 
    separate(col = "names", sep = "\\.", into = c("fromFile", "spectrum"))

  rt_adj <- rtime(object, adjusted = adjustedRtime, bySample = FALSE) %>% 
    {tibble(names = names(.), rt = .)} %>% 
    separate(col = "names", sep = "\\.", into = c("fromFile", "spectrum"))

  from_files <- fromFile(object) %>% names %>% unique %>% gsub("F(.*)\\..*","\\1",.) %>% unique %>% as.numeric()

  # put together and make long format
  rts <- bind_rows(raw = rt_noadj, adjusted = rt_adj, .id = "adjusted") %>% 
    pivot_wider(id_cols = c("fromFile", "spectrum"), names_from = adjusted, values_from = rt) %>% 
    mutate(fromFile = as.integer(gsub("^F(.*)", "\\1", fromFile)))# %>% 
    #mutate(fromFile = fromFile - min(fromFile)+1) # needed because features might not start from 1 if it is a split object

  # add metadata
  rts <- pData(object) %>% mutate(fromFile = from_files) %>% right_join(rts, by = "fromFile", multiple = "all")

  # figure where to find the samples that were used for RT correction
  which_is_groups <- object %>% processHistory %>% map(processParam) %>%  map_lgl(is, "PeakGroupsParam") %>% which

  # extract table of good peaks with with raw and adjusted rt
  subset_selected <- object %>% 
    processHistory %>% 
    map(processParam) %>% 
    pluck(which_is_groups)

  subset_selected <- subset_selected@subset

  files <- from_files %>% {.[subset_selected]}

  pkGroup <- object %>% 
    processHistory %>% 
    map(processParam) %>% 
    pluck(which_is_groups) %>% 
    peakGroupsMatrix %>% 
    as.data.frame %>% 
    setNames(., files) %>% 
    rownames_to_column("feature") %>% 
    as_tibble %>% 
    pivot_longer(-feature, names_to = "fromFile", values_to = "rtime") %>% 
    mutate(fromFile = as.integer(fromFile)) %>% 
    group_by(fromFile) %>%
    group_nest(.key = "feature")

  good_peaks  <- rts %>% 
    select(fromFile, raw, adjusted) %>% 
    group_by(fromFile) %>% 
    group_nest(.key = "correction") %>% 
    right_join(pkGroup, by = "fromFile") %>% 
    mutate(feature_correct = map2(feature, correction, ~  tibble(adjusted = xcms:::.applyRtAdjustment( ..1$rtime, ..2$raw, ..2$adjusted)                 ))) %>% 
    select(-correction) %>% 
    unnest(cols = c(feature, feature_correct)) %>% 
    dplyr::rename(raw = rtime) %>% 
    filter(!is.na(raw))

  # we add the text property that plotly can use to add more info to each point.  
  if(is.null(include_columns)){
    include_columns <- colnames(pData(object))
  }

  rts <- rts %>% 
    select(all_of(include_columns)) %>% 
    imap_dfr(~ paste(.y, .x, sep = ": ")) %>%
    unite(text, sep = "<br>") %>% 
    bind_cols(rts,.)

  # the plotting
  p <- ggplot(data = rts, aes(x = adjusted, y = adjusted-raw, group = fromFile, color = {{color_by}}, text = text)) +
    geom_line() +
    theme_bw() +
    geom_point(data = good_peaks, aes(x = adjusted, y = adjusted-raw), inherit.aes = FALSE, color = "grey", shape = 1) +
    geom_line(data = good_peaks, aes(x = adjusted, y = adjusted-raw, group = feature), inherit.aes = FALSE, color = "grey", linetype = 2) #+
  # xlab(ifelse(adjustedRtime, yes = expression(rt[adj]), no = expression(rt[raw]))) + # <--- plotmath breaks plotly
  # ylab(expression(rt[adj]-rt[raw]))

  return(p)

}
stanstrup commented 1 year ago

Perhaps this should be re-opened so we can discuss if the above function should be implmeented? @jorainer

jorainer commented 1 year ago

Yes @stanstrup , I'm OK re-opening the issue and also to have ggplot-versions of the visualizations. What I would however avoid is to have dependencies on (too many) tidyverse packages within xcms. Maybe have such visualizations in a separate package? ggxcms or xcmsVis (along the lines with the SpectraVis package that contains ggplot visualizations for Spectra).

@sneumann , any opinion?

sneumann commented 1 year ago

Hm, if there was a, say, xcms 4.0 that (also) has classes depending on Spectra and friends, could the desired function be added to SpectraVis and handle the xcms output ? Then we'd neither need more dependencies in xcms, nor a new package. If that is not an option, I'd happily support the idea of xcmsVis, which would probably be that set of nice visualisation functions, ideally also such that inclusion into shiny apps becomes simpler. Yours, Steffen

jorainer commented 1 year ago

yes, xcms 4 has then classes that base on Spectra, only, these visualizations we have in xcms are based on the either chrom peak matrix, adjusted retention times or feature definition data frame, so all results that are not in Spectra. Of course, any visualization using Spectra should go into SpectraVis - but here we have mostly xcms-specific plots. So IMHO it would make most sense to have a xcmsVis package (keep dependencies in xcms low and still provide advanced visualizations and eventually shiny things for xcms).