lgatto / Spectrum

Spectrum Infrastructure for Mass Spectrometry Data

Cardinalities #3

Open meowcat opened 5 years ago

meowcat commented 5 years ago

We have talked a bit about cardinalities of regular attributes and peak annotations before. I want to bring some attention to details so we don't forget them.

(As a vague idea of how to implement this: we would have a PeakVector class that is used for mz, intensity and user-defined fields of this nature. I wonder if this could help to solve the mapping issue from #1.)

lgatto commented 5 years ago

As for your relative intensity example, this should definitely be stored in its own field with sp$relativeIntensity <- normalise(sp). An alternative would be to calculate this information on the fly when the calculation is as trivial as in that example.

In general, there is no problem with using any cardinality, or with relaxing anything that used to be 1 to something larger.

As for peaksCount (and all other mandatory fields), we can (and do) enforce the return type (integer in this case). The cardinality is still a (trivial) TODO.
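
For illustration, checking both the type and the cardinality of a mandatory field such as peaksCount could look roughly like the sketch below. The helper name checkScalarInteger is hypothetical and not part of the package; it only shows the idea.

```r
# Hypothetical helper (not part of the package): enforce both the type
# (integer) and the cardinality (exactly one value) of a field.
checkScalarInteger <- function(x, name) {
  if (!is.integer(x))
    stop(name, " must be an integer")
  if (length(x) != 1L)
    stop(name, " must have cardinality 1, got ", length(x))
  x
}

# A valid value passes through unchanged.
valid <- checkScalarInteger(5L, "peaksCount")

# A value with the wrong cardinality raises an error; capture its message.
msg <- tryCatch(checkScalarInteger(c(1L, 2L), "peaksCount"),
                error = function(e) conditionMessage(e))
```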

meowcat commented 5 years ago

Yes-ish. Again, we have to be careful about what happens when the peaks change, as in #1.

sp <- Spectrum()
sp[c("mz", "intensity")] <- list(c(100, 136.1, 136.3, 136.9, 200), c(20, 5, 50, 5, 20))
sp$relativeIntensity <- normalise(sp)
sp <- filterPeaks(sp, mz > 100) 
# this is an operation I would like to see, which does something like:
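filterPeaks <- function(sp, expr)
{
 # capture the unevaluated filter expression (e.g. mz > 100) and evaluate it against the fields
 filter <- eval(substitute(expr), envir=sp@listData)
 # subset only the mz and intensity fields; every other field is left untouched
 sp[c("mz", "intensity")] <- lapply(sp[c("mz", "intensity")], function(x) x[filter])
 return(sp)
}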
# ideally more powerful such that it could also work with annotations.
# But now, sp$relativeIntensity is broken.

This would be simplified for additional fields such as relativeIntensity if there was something like a class PeakVector you could use to assign fields which have cardinality peaksCount:


# this filters all fields which contain "peaktable" data and leaves other stuff alone
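filterPeaks <- function(sp, expr)
{
 # capture the unevaluated filter expression and evaluate it against the fields
 filter <- eval(substitute(expr), envir=sp@listData)
 sp@listData <- lapply(sp@listData, function(field)
 {
  # only fields with cardinality peaksCount are subset
  if(is(field, "PeakVector"))
    field <- field[filter]
  return(field)
 })
 return(sp)
}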

Possibly this should be implemented very differently with Spectra in mind, and yes, perhaps this is a solitary use case. Just bringing up possibilities.
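
To make the PeakVector idea concrete, here is a minimal self-contained sketch. It assumes a hypothetical S4 class PeakVector that simply extends numeric, and it uses a plain named list as a stand-in for sp@listData, since the real Spectrum class is not pinned down here. The function name filterPeaksSketch is likewise made up for this example.

```r
library(methods)

# Hypothetical marker class: a numeric vector with one entry per peak.
setClass("PeakVector", contains = "numeric")
PeakVector <- function(x) new("PeakVector", as.numeric(x))

# Stand-in for a spectrum: a plain named list instead of sp@listData.
sp <- list(
  mz = PeakVector(c(100, 136.1, 136.3, 136.9, 200)),
  intensity = PeakVector(c(20, 5, 50, 5, 20)),
  relativeIntensity = PeakVector(c(0.4, 0.1, 1, 0.1, 0.4)),
  msLevel = 2L  # cardinality 1, must survive filtering unchanged
)

filterPeaksSketch <- function(fields, expr) {
  # Evaluate the unevaluated expression (e.g. mz > 100) against the fields.
  keep <- as.logical(eval(substitute(expr), envir = fields,
                          enclos = parent.frame()))
  lapply(fields, function(field) {
    # Subset every PeakVector, re-wrapping so the class is preserved;
    # leave all other fields alone.
    if (is(field, "PeakVector")) PeakVector(field@.Data[keep]) else field
  })
}

filtered <- filterPeaksSketch(sp, mz > 100)
```

With this, relativeIntensity stays aligned with mz and intensity after filtering, while msLevel is untouched. Whether a marker class is the right mechanism for the real package is an open design question.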

jorainer commented 5 years ago

Do we really need relativeIntensity as a field? I would rather define an accessor method for that and calculate the relative intensities on the fly. The same goes for peaks count or TIC: IMHO it is better to calculate these values on the fly, because then there is no need to update these fields whenever the data is manipulated.
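
A rough sketch of what on-the-fly accessors could look like, assuming (for illustration only) that a spectrum is a plain list with mz and intensity. The function names mirror the fields discussed above but are not the package's actual API.

```r
# Hypothetical accessors: derived values are computed from the peak data
# each time, so they cannot get out of sync after filtering.
peaksCount <- function(sp) length(sp$mz)
tic <- function(sp) sum(sp$intensity)
relativeIntensity <- function(sp) sp$intensity / max(sp$intensity)

sp <- list(mz = c(100, 136.1, 136.3, 136.9, 200),
           intensity = c(20, 5, 50, 5, 20))

# Filter the peaks; nothing else needs to be updated.
keep <- sp$mz > 100
sp <- lapply(sp, function(x) x[keep])

n <- peaksCount(sp)          # number of remaining peaks
total <- tic(sp)             # sum of the remaining intensities
rel <- relativeIntensity(sp) # relative to the remaining maximum
```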

meowcat commented 5 years ago

I was not arguing that we need relativeIntensity as a built-in field, but rather that there could be the need for peaksCount-cardinality user-defined fields.

Regarding relativeIntensity specifically: it can be convenient because there are different ways to normalize. Some normalize the maximum intensity to 1, some normalize the maximum intensity to 999, others normalize to the sum of intensities.
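
The three conventions could be sketched as follows. The function name normaliseIntensity and the method labels are made up for this example and are not the signature of the normalise() used earlier in the thread.

```r
# Hypothetical helper covering the three normalisation conventions
# mentioned above; the method names are illustrative only.
normaliseIntensity <- function(intensity, method = c("max", "max999", "sum")) {
  method <- match.arg(method)
  switch(method,
    max = intensity / max(intensity),          # maximum scaled to 1
    max999 = 999 * intensity / max(intensity), # maximum scaled to 999
    sum = intensity / sum(intensity)           # intensities sum to 1
  )
}

intensity <- c(20, 5, 50, 5, 20)
byMax <- normaliseIntensity(intensity, "max")
by999 <- normaliseIntensity(intensity, "max999")
bySum <- normaliseIntensity(intensity, "sum")
```

Because the result depends on the chosen convention, a user may prefer to store it in a peaksCount-cardinality field rather than rely on a single built-in accessor.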