hansenlab / yamss


Calling peaks on generic 2D data #12

Open jamesdalg opened 6 years ago

jamesdalg commented 6 years ago

Following this post (https://support.bioconductor.org/p/106565/), is it possible to call peaks on a regular 2D matrix in R using yamss? If so, how would one go about that?

lmyint commented 6 years ago

I'm not quite clear on what the peaks are in your application. What numbers are in the cells of your 2D matrix?

It may be that you are trying to perform 2D smoothing, which is not what yamss does. In other words, is your data a matrix of x and y locations with an associated z dimension for height? If so, yamss isn't going to do what you need.


-- Leslie Myint PhD candidate - Biostatistics Johns Hopkins Bloomberg School of Public Health

lmyint commented 6 years ago

It is possible to phrase your question in terms of density estimation though. I've tried to illustrate with the following toy example:

library(yamss)
library(data.table)

set.seed(4)
num_rows <- 1000 # "m/z"
num_cols <- 2000 # "scan"
mat <- matrix(runif(num_rows*num_cols, 3, 5), nrow = num_rows, ncol = num_cols) # "intensities"

dt <- data.table(
    mz = rep(seq_len(num_rows), num_cols),
    scan = rep(seq_len(num_cols), each = num_rows),
    intensity = as.numeric(mat),
    sample = 1
)
cms_raw <- new("CMSraw")
yamss:::.mzParams(cms_raw) <- yamss:::.setMZParams(dt)
yamss:::.rawDT(cms_raw) <- dt
colData(cms_raw) <- DataFrame(sample = 1)
cms_proc <- bakedpi(cms_raw, dbandwidth = c(1e-5, 1), dgridstep = c(1e-5, 1))
cms_slice <- slicepi(cms_proc, cutoff = NULL, verbose = TRUE)
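
The index columns above rely on R storing matrices in column-major order. A minimal sketch, in base R on a tiny matrix, of how one might check that each long-format row points back to the right cell. The names `long` and `recovered` are illustrative and not from the thread, and a plain data.frame stands in for the data.table:

```r
# Small stand-in for the 1000 x 2000 example so the result can be inspected by eye.
num_rows <- 3 # "m/z"
num_cols <- 4 # "scan"
mat <- matrix(seq_len(num_rows * num_cols), nrow = num_rows, ncol = num_cols)

# Same index construction as in the example above, using a base data.frame.
long <- data.frame(
    mz = rep(seq_len(num_rows), num_cols),
    scan = rep(seq_len(num_cols), each = num_rows),
    intensity = as.numeric(mat)
)

# Look up every (mz, scan) pair in the matrix; the values should match
# the intensity column row for row.
recovered <- mat[cbind(long$mz, long$scan)]
stopifnot(all(recovered == long$intensity))
```

If the `stopifnot()` call passes silently, the `mz` and `scan` columns are aligned with `as.numeric(mat)`.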

jamesdalg commented 6 years ago

That's awesome! Thank you!

jamesdalg commented 6 years ago

I have a reasonably fast laptop (Core i7, 7th gen), and I think that example is a little too large for it. I tried scaling it down for testing, but I keep hitting errors.

num_rows <- 200 # "m/z"
num_cols <- 400 # "scan"
mat <- matrix(runif(num_rows*num_cols, 3, 5), nrow = num_rows, ncol = num_cols) # "intensities"

dt <- data.table(
    mz = rep(seq_len(num_rows), num_cols),
    scan = rep(seq_len(num_cols), each = num_rows),
    intensity = as.numeric(mat),
    sample = 1
)
cms_raw <- new("CMSraw")
yamss:::.mzParams(cms_raw) <- yamss:::.setMZParams(dt)
yamss:::.rawDT(cms_raw) <- dt
colData(cms_raw) <- DataFrame(sample = 1)

cms_proc <- bakedpi(cms_raw, dbandwidth = c(1e-5, 1), dgridstep = c(1e-5, 1))
[bakedpi] Background correction
Error in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : span is too small

cms_proc <- bakedpi(cms_raw, dbandwidth = c(1e-2, 1), dgridstep = c(1e-2, 1))
[bakedpi] Background correction
Error in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : span is too small

cms_proc <- bakedpi(cms_raw, dbandwidth = c(1e-1, 1), dgridstep = c(1e-1, 1))
[bakedpi] Background correction
Error in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : span is too small

cms_proc <- bakedpi(cms_raw, dbandwidth = c(-1, 1), dgridstep = c(-1, 1))
Error: vec[1] > 0 is not TRUE

Is there a way to make this work on something about 10x smaller? I've tried every set of values I can think of to solve this.

lmyint commented 6 years ago

Unfortunately, when I tried smaller matrices around what you tried, I ran into the same errors. This is due to other aspects of processing in yamss; in particular, there is a background correction step that requires a sufficient number of rows.

You need to keep dgridstep = c(1e-5, 1). This ensures that the grid on which the kernel density is computed actually lines up with your matrix. You could try increasing your bandwidth to something like dbandwidth = c(3e-5, 2).

Perhaps also modify the code that creates the matrix to make it sparser. Currently it is a dense matrix of uniformly distributed numbers. You could start with a matrix of zeros and fill in square blocks with random nonzero numbers. Then, after the data.table creation step, subset the data.table to drop the rows whose "intensity" is zero.
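
A rough sketch of that sparse construction in base R. The block count, block size, and the 3 to 5 intensity range are illustrative assumptions, as are the names `long` and `long_sparse`. A plain data.frame stands in for the data.table, and whether this is enough to get past the background correction error has not been checked here:

```r
set.seed(4)
num_rows <- 200 # "m/z"
num_cols <- 400 # "scan"

# Start from an all-zero matrix.
mat <- matrix(0, nrow = num_rows, ncol = num_cols)

# Fill a few square blocks with random nonzero "intensities".
# num_blocks and block_size are arbitrary choices for illustration.
num_blocks <- 5
block_size <- 10
for (i in seq_len(num_blocks)) {
    r0 <- sample.int(num_rows - block_size + 1, 1) # top row of the block
    c0 <- sample.int(num_cols - block_size + 1, 1) # left column of the block
    rows <- r0:(r0 + block_size - 1)
    cols <- c0:(c0 + block_size - 1)
    mat[rows, cols] <- runif(block_size^2, 3, 5)
}

# Long format, built the same way as in the dense example.
long <- data.frame(
    mz = rep(seq_len(num_rows), num_cols),
    scan = rep(seq_len(num_cols), each = num_rows),
    intensity = as.numeric(mat),
    sample = 1
)

# Drop the rows for cells that are still zero.
long_sparse <- long[long$intensity != 0, ]

# For the yamss steps this would need to be a data.table,
# e.g. data.table::as.data.table(long_sparse).
```

Blocks may overlap, so the number of rows kept is at most `num_blocks * block_size^2`.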

Lastly, perhaps the code runs better if you subset your actual matrix?
