wkumler / RaMS

R-based access to Mass-Spectrometry data

Implement DBSCAN/OPTICS as an mz_group option? #34

Open wkumler opened 4 months ago

wkumler commented 4 months ago

Realized today that m/z group construction could be done with a 1D density-based clustering algorithm like DBSCAN or OPTICS. The perk would be that the "hard" m/z window currently used by mz_group would be relaxed and could instead be determined in a more data-driven way.
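One hedged sketch of what "data-driven" could mean here: a common DBSCAN heuristic is to compute each point's distance to its k-th nearest neighbor and pick eps near the knee of the sorted curve. The m/z values below are made up for illustration (not RaMS output), and using k = minPts - 1 is a convention, not something mz_group does.

```r
library(dbscan)

set.seed(1)
# Hypothetical toy data: two tight m/z bands plus 20 isolated "noise" points
mz <- c(rnorm(200, mean = 118.0865, sd = 2e-5),
        rnorm(200, mean = 120.0657, sd = 2e-5),
        seq(110, 117, length.out = 20))
mz_mat <- matrix(mz, ncol = 1)

# Distance from every point to its 99th nearest neighbor (minPts = 100)
knn_d <- kNNdist(mz_mat, k = 99)

# Points inside a dense band have tiny k-NN distances, isolated points have
# huge ones; an eps somewhere in the gap between the two separates them
plot(sort(knn_d), type = "l", log = "y",
     xlab = "Points sorted by k-NN distance", ylab = "99-NN distance (m/z)")
abline(h = 0.0001, lty = 2)  # the eps used in the proof-of-concept
```

On real data the knee will be much less obvious than in this toy case, so treat the chosen eps as a starting point rather than an answer.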

There's a paper about this exact idea: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3982975/. They talk about reducing the computational cost through some clever preprocessing, which is necessary because the current implementation takes a long while for just 6 files.

Quick proof-of-concept:

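library(RaMS)
library(data.table)  # needed for %between%; RaMS imports data.table but does not attach it
ms_filedir <- system.file("extdata", package="RaMS")
ms_files <- list.files(ms_filedir, pattern="LB.*mzML", full.names=TRUE)
msdata <- grabMSdata(ms_files)

library(dbscan)
# 1D clustering on m/z alone; eps is in absolute m/z units
mz_groups <- dbscan(msdata$MS1[,"mz"], eps = 0.0001, minPts = 100)
# Cluster 0 is dbscan's label for noise points
msdata$MS1$mz_group <- mz_groups$cluster

library(ggplot2)
# Pass the subset straight to ggplot() so no pipe operator has to be loaded
ggplot(msdata$MS1[mz%between%c(110, 130)]) +
  geom_point(aes(x=rt, y=mz, color=factor(mz_group)))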
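Since the issue title also mentions OPTICS, here is a rough sketch of the same idea with dbscan::optics(), on made-up m/z values rather than RaMS output. optics() is run once with a generous eps as an upper bound, then extractDBSCAN() cuts the resulting ordering at a tighter eps_cl, so several thresholds can be tried without re-clustering. All numeric values are assumptions chosen for the toy data.

```r
library(dbscan)

set.seed(1)
# Hypothetical toy data: two tight m/z bands plus 20 isolated "noise" points
mz <- c(rnorm(200, mean = 118.0865, sd = 2e-5),
        rnorm(200, mean = 120.0657, sd = 2e-5),
        seq(110, 117, length.out = 20))
mz_mat <- matrix(mz, ncol = 1)

# Build the OPTICS ordering once; eps here only caps the neighborhood search
opt <- optics(mz_mat, eps = 0.001, minPts = 100)

# Extract DBSCAN-like clusterings at two different thresholds
cl_tight <- extractDBSCAN(opt, eps_cl = 0.0001)
cl_loose <- extractDBSCAN(opt, eps_cl = 0.0005)

# Cluster 0 holds the points treated as noise
table(cl_tight$cluster)
table(cl_loose$cluster)
```

Whether this ends up faster than calling dbscan() repeatedly on real files would need to be benchmarked.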

wkumler commented 4 months ago

One big perk of this method is that it would identify and remove a bunch of the "noise" data points (isolated single points) instead of having to assign each of them to an m/z group. min_group_size already kinda does this, but not very well(?)
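A minimal sketch of that noise filtering, assuming the dbscan package and made-up m/z values (not RaMS output). dbscan() labels points that belong to no cluster as cluster 0, so dropping them is a one-line filter.

```r
library(dbscan)

set.seed(1)
# Hypothetical toy data: two tight m/z bands plus 20 isolated "noise" points
mz <- c(rnorm(200, mean = 118.0865, sd = 2e-5),
        rnorm(200, mean = 120.0657, sd = 2e-5),
        seq(110, 117, length.out = 20))

clust <- dbscan(matrix(mz, ncol = 1), eps = 0.0001, minPts = 100)

# Keep only points assigned to a real cluster (cluster 0 is noise)
is_noise <- clust$cluster == 0
mz_clean <- mz[!is_noise]
group_clean <- clust$cluster[!is_noise]

# Per-group summary: point count and mean m/z
data.frame(
  mz_group = sort(unique(group_clean)),
  n_points = as.vector(table(group_clean)),
  mean_mz = as.vector(tapply(mz_clean, group_clean, mean))
)
```

With real data the same filter would be applied to the MS1 table after adding the cluster column, though minPts then directly controls how small a real signal can be before it gets thrown out as noise.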