Cardinalities

meowcat commented 5 years ago

We have talked a bit about cardinalities of regular attributes and peak annotations before. I want to draw attention to a few details so we don't forget them.

We have talked about many things as being cardinality 1. This is strictly true for e.g. polarity and ionization mode. It no longer holds for some physical properties of a spectrum, such as collision energy (stepped collision energy, for example). It is certainly not true for many metadata items, such as chemical name.
We have talked about peaks [mz, intensity] being of cardinality n (or, more precisely, peaksCount), and about peak annotations being mapped to one or multiple peaks. This holds for the actual annotations we have already discussed (like peptide fragments, chemical formulas, etc.). On the other hand, some attributes genuinely have cardinality peaksCount, for example relative intensity. It is often quite useful to carry both absolute and relative intensity in a spectrum. Imagine
```
sp <- Spectrum()
# five peaks: m/z values and their absolute intensities
sp[c("mz", "intensity")] <- list(c(100, 136.1, 136.3, 136.9, 200), c(20, 5, 50, 5, 20))
```
A typical sp <- normalize(sp) would just overwrite sp$intensity, but even MassBank spectra usually carry both (see PK$PEAK at the bottom of a MassBank record), and in processing it is useful to apply relative and absolute intensity cutoffs independently. So imagine sp$relativeIntensity <- normalize(sp), or something like that.
We could represent relativeIntensity as a PeakAnnotation. But do we want this, or do we want to explicitly allow, and possibly mark, peaksCount-cardinality attributes?
If we allow this, how do we deter people from abusing peaksCount-cardinality attributes for what PeakAnnotation is meant for (e.g. a formula attribute that is mostly NA)?
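A minimal sketch of what applying both cutoffs independently could look like, using plain R vectors instead of a Spectrum object. The helper passCutoffs() and its minAbs/minRel parameters are hypothetical, and relative intensity is assumed here to be a percentage of the base peak.

```r
# Hypothetical helper: keep peaks that pass BOTH an absolute cutoff (minAbs)
# and a relative cutoff (minRel, in percent of the base peak).
passCutoffs <- function(intensity, minAbs, minRel) {
  relativeIntensity <- intensity / max(intensity) * 100
  intensity >= minAbs & relativeIntensity >= minRel
}

intensity <- c(20, 5, 50, 5, 20)
# Relative cutoff only: the peaks with intensity 5 are 10% of the base peak (50)
passCutoffs(intensity, minAbs = 0, minRel = 20)
# → TRUE FALSE TRUE FALSE TRUE
# Here the absolute cutoff is the stricter of the two
passCutoffs(intensity, minAbs = 30, minRel = 20)
# → FALSE FALSE TRUE FALSE FALSE
```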

(As a vague idea of how to implement this, we would have a PeakVector class that is used for mz, intensity and user-defined fields of this nature. I wonder if this could help solve the mapping issue from #1.)
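One way the PeakVector idea could look, as a minimal sketch: an S4 class extending numeric, with a [ method that preserves the class so that filtered fields are still recognised as per-peak data. The class, its constructor and the list-based spectrum are assumptions for illustration, not the Spectrum package's API.

```r
library(methods)

# Hypothetical class marking a field as having cardinality peaksCount
setClass("PeakVector", contains = "numeric")
PeakVector <- function(x) new("PeakVector", as.numeric(x))

# Subsetting returns a PeakVector again (instead of a plain numeric)
setMethod("[", "PeakVector", function(x, i, j, ..., drop = TRUE) {
  if (missing(i)) return(x)
  PeakVector(x@.Data[i])
})

# A spectrum modelled as a plain named list (assumption, not the Spectrum class)
sp <- list(
  mz = PeakVector(c(100, 136.1, 136.3, 136.9, 200)),
  intensity = PeakVector(c(20, 5, 50, 5, 20)),
  polarity = 1L
)
sp$relativeIntensity <- PeakVector(sp$intensity@.Data / max(sp$intensity@.Data))

# Subset every PeakVector field with the same logical index; leave metadata alone
keep <- sp$mz@.Data > 100
sp <- lapply(sp, function(field) if (is(field, "PeakVector")) field[keep] else field)
```

After filtering, mz, intensity and relativeIntensity all have four values, while polarity is untouched.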

lgatto commented 5 years ago

As for your relative intensity example, it should definitely be stored in its own field, with sp$relativeIntensity <- normalise(sp). An alternative would be to calculate this information on the fly, when it is as trivial as in that example.

In general, there is no problem with using any cardinality, or with relaxing anything that used to be 1 to something larger.

As for peaksCount (and all other mandatory fields), we can (and do) enforce the return type (integer in this case). The cardinality is still a (trivial) TODO.
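A minimal sketch of what the cardinality check could look like, written in the style of an S4 validity function (returning TRUE or a character message). The function name checkCardinality(), its perPeak argument and the list-based spectrum are hypothetical.

```r
# Hypothetical check: peaksCount must be a single integer (cardinality 1),
# and every per-peak field must have exactly peaksCount values.
checkCardinality <- function(sp, perPeak = c("mz", "intensity")) {
  n <- sp$peaksCount
  if (!is.integer(n) || length(n) != 1L)
    return("peaksCount must be a single integer")
  bad <- perPeak[vapply(sp[perPeak], length, integer(1)) != n]
  if (length(bad))
    return(paste("wrong cardinality:", paste(bad, collapse = ", ")))
  TRUE
}

sp <- list(mz = c(100, 136.1, 136.3, 136.9, 200),
           intensity = c(20, 5, 50, 5, 20),
           peaksCount = 5L)
checkCardinality(sp)
# → TRUE
sp$mz <- sp$mz[-1]  # drop one m/z value without touching intensity
checkCardinality(sp)
# → "wrong cardinality: mz"
```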

meowcat commented 5 years ago

Yes-ish. Again, we have to take care what happens when the peaks change, as in #1.

```
sp <- Spectrum()
sp[c("mz", "intensity")] <- list(c(100, 136.1, 136.3, 136.9, 200), c(20, 5, 50, 5, 20))
sp$relativeIntensity <- normalise(sp)

# this is an operation I would like to see, which does something like:
filterPeaks <- function(sp, expr)
{
  # evaluate the unevaluated condition (e.g. mz > 100) against the spectrum's fields
  filter <- eval(substitute(expr), envir = sp@listData)
  # subset mz and intensity with the same logical index
  sp[c("mz", "intensity")] <- lapply(sp[c("mz", "intensity")], function(x) x[filter])
  return(sp)
}
# ideally more powerful such that it could also work with annotations.

sp <- filterPeaks(sp, mz > 100)
# But now, sp$relativeIntensity is broken: it still has 5 values, while mz and intensity have 4.
```

This would be simpler for additional fields such as relativeIntensity if there were something like a PeakVector class that you could use for fields with cardinality peaksCount:


```
# this filters all fields which contain "peaktable" data and leaves other stuff alone
filterPeaks <- function(sp, expr)
{
  filter <- eval(substitute(expr), envir = sp@listData)
  sp@listData <- lapply(sp@listData, function(field)
  {
    # only per-peak (PeakVector) fields are subset; metadata is kept as-is
    if (is(field, "PeakVector"))
      field <- field[filter]
    return(field)
  })
  return(sp)
}
```

Possibly this should be implemented very differently with Spectra in mind, and yes, perhaps this is a solitary use case. Just bringing up possibilities.

jorainer commented 5 years ago

Do we really need relativeIntensity as a field? I would rather define an accessor method for it and calculate the relative intensities on the fly. The same goes for peaks count or TIC: IMHO it is better to calculate these values on the fly, because then there is no need to update these fields whenever you manipulate the data.
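A minimal sketch of the on-the-fly approach, assuming a spectrum stored as a named list with mz and intensity vectors; the accessor names peaksCount() and tic() are illustrative and may not match the actual package.

```r
# Hypothetical accessors: derived values are computed from the peak data on demand
peaksCount <- function(sp) length(sp$mz)
tic <- function(sp) sum(sp$intensity)  # total ion current = sum of intensities

sp <- list(mz = c(100, 136.1, 136.3, 136.9, 200),
           intensity = c(20, 5, 50, 5, 20))
peaksCount(sp)  # → 5
tic(sp)         # → 100

# After filtering the peaks, the derived values are still correct: nothing to update
keep <- sp$mz > 100
sp$mz <- sp$mz[keep]
sp$intensity <- sp$intensity[keep]
peaksCount(sp)  # → 4
tic(sp)         # → 80
```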

meowcat commented 5 years ago

I was not arguing that we need relativeIntensity as a built-in field, but rather that there could be the need for peaksCount-cardinality user-defined fields.

Regarding relativeIntensity specifically: it can be convenient because there are different ways to normalize. Some normalize the maximum intensity to 1, some normalize the maximum intensity to 999, others normalize to the sum of intensities.
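A minimal sketch of the three conventions mentioned above as one function; relativeIntensity() and the method labels "max1", "max999" and "sum" are made up for this illustration.

```r
# Hypothetical normalisation with a selectable convention
relativeIntensity <- function(intensity, method = c("max1", "max999", "sum")) {
  method <- match.arg(method)
  switch(method,
         max1   = intensity / max(intensity),        # base peak scaled to 1
         max999 = intensity / max(intensity) * 999,  # base peak scaled to 999
         sum    = intensity / sum(intensity))        # intensities sum to 1
}

intensity <- c(20, 5, 50, 5, 20)
relativeIntensity(intensity, "max1")    # → 0.4 0.1 1.0 0.1 0.4
relativeIntensity(intensity, "max999")  # → 399.6 99.9 999.0 99.9 399.6
relativeIntensity(intensity, "sum")     # → 0.20 0.05 0.50 0.05 0.20
```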

lgatto / Spectrum

Cardinalities #3