Open meowcat opened 5 years ago
As for your relative intensity examples, this should definitely be stored in its own field with sp$relativeIntensity <- normalise(sp). An alternative would be to calculate this information on the fly, when it is as trivial as in that example.
In general, there is no problem with using any cardinality, or with relaxing anything that used to be 1 to something larger.
As for peaksCount (and all other mandatory fields), we can (and do) enforce the return type (integer in this case). The cardinality is still a (trivial) TODO.
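To make the type and cardinality checks concrete, here is a minimal sketch. The function checkField and its arguments are hypothetical names chosen for illustration, not something that exists in the package.

```r
# Hypothetical validator; `checkField`, `type` and `cardinality` are
# illustrative names, not part of any existing API.
checkField <- function(value, type, cardinality = 1L) {
  # Enforce the return type, e.g. "integer" for peaksCount.
  if (!inherits(value, type))
    stop("expected type ", type, ", got ", class(value)[1])
  # Enforce the cardinality, i.e. the number of values in the field.
  if (length(value) != cardinality)
    stop("expected ", cardinality, " value(s), got ", length(value))
  invisible(TRUE)
}

checkField(5L, "integer")               # passes
try(checkField(5, "integer"))           # fails: a double, not an integer
try(checkField(c(1L, 2L), "integer"))   # fails: cardinality 2 instead of 1
```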
Yes-ish. Again, we have to take care what happens when the peaks change, as in #1.
sp <- Spectrum()
sp[c("mz", "intensity")] <- list(c(100, 136.1, 136.3, 136.9, 200), c(20, 5, 50, 5, 20))
sp$relativeIntensity <- normalise(sp)
sp <- filterPeaks(sp, mz > 100)
# this is an operation I would like to see, which does something like:
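filterPeaks <- function(sp, expr)
{
  # capture the unevaluated condition and evaluate it against the spectrum's fields
  filter <- eval(substitute(expr), envir=sp@listData)
  # subset both peak fields with the same logical filter
  sp[c("mz", "intensity")] <- lapply(sp[c("mz", "intensity")], function(x) x[filter])
  return(sp)
}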
# ideally more powerful such that it could also work with annotations.
# But now, sp$relativeIntensity is broken.
This would be simplified for additional fields such as relativeIntensity if there was something like a class PeakVector you could use to assign fields which have cardinality peaksCount:
# this filters all fields which contain "peaktable" data and leaves other stuff alone
filterPeaks <- function(sp, expr)
{
  # capture the unevaluated condition and evaluate it against the spectrum's fields
  filter <- eval(substitute(expr), envir=sp@listData)
  sp@listData <- lapply(sp@listData, function(field)
  {
    if(is(field, "PeakVector"))
      field <- field[filter]
    return(field)
  })
  return(sp)
}
Possibly this should be implemented very differently with Spectra in mind, and yes, perhaps this is a solitary use case. Just bringing up possibilities.
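For reference, the PeakVector idea can be tried out in plain R. This is only a sketch under stated assumptions: the spectrum is a plain list instead of the real Spectrum class, and PeakVector is a hypothetical S3 tag, not an existing class.

```r
# Hypothetical constructor: tags a vector as having one entry per peak.
PeakVector <- function(x) structure(x, class = "PeakVector")

# Assumption: the spectrum is a plain named list standing in for Spectrum.
filterPeaks <- function(sp, expr) {
  # Evaluate the unevaluated condition with the untagged fields in scope.
  keep <- eval(substitute(expr), lapply(sp, unclass), parent.frame())
  # Subset every PeakVector field and leave all other fields alone.
  lapply(sp, function(field) {
    if (inherits(field, "PeakVector"))
      PeakVector(unclass(field)[keep])
    else
      field
  })
}

sp <- list(
  mz = PeakVector(c(100, 136.1, 136.3, 136.9, 200)),
  intensity = PeakVector(c(20, 5, 50, 5, 20)),
  polarity = 1L
)
sp$relativeIntensity <- PeakVector(unclass(sp$intensity) / max(sp$intensity))

sp <- filterPeaks(sp, mz > 100)
unclass(sp$relativeIntensity)  # still aligned with the four remaining peaks
```

The user-defined relativeIntensity field stays in sync with mz and intensity because it carries the same tag, while polarity is untouched.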
Do we really need relativeIntensity as a field? I would rather define an accessor method for that and calculate the relative intensities on the fly. The same goes for peaks count or TIC: IMHO it is better to calculate these values on the fly, because then there is no longer any need to update these fields whenever you do some sort of data manipulation.
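A rough sketch of the on-the-fly approach, assuming a plain list as a stand-in for the spectrum object. The accessor names here are illustrative and not an existing API.

```r
# Assumption: a plain list holding only the primary peak data.
sp <- list(mz = c(100, 136.1, 136.3, 136.9, 200),
           intensity = c(20, 5, 50, 5, 20))

# Hypothetical accessors; each derives its value from the current peaks.
peaksCount <- function(sp) length(sp$mz)
tic <- function(sp) sum(sp$intensity)
relativeIntensity <- function(sp) sp$intensity / max(sp$intensity)

# Drop the peak at m/z 100; no derived field has to be updated afterwards.
keep <- sp$mz > 100
sp$mz <- sp$mz[keep]
sp$intensity <- sp$intensity[keep]

peaksCount(sp)         # 4
tic(sp)                # 80
relativeIntensity(sp)  # 0.1 1.0 0.1 0.4
```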
I was not arguing that we need relativeIntensity as a built-in field, but rather that there could be the need for peaksCount-cardinality user-defined fields.
Regarding relativeIntensity specifically: it can be convenient because there are different ways to normalize. Some normalize the maximum intensity to 1, some normalize the maximum intensity to 999, others normalize to the sum of intensities.
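The three conventions can be written down in a few lines. The helper name normaliseIntensity and its method labels are made up for this illustration.

```r
# Hypothetical helper; the name and the method labels are assumptions.
normaliseIntensity <- function(intensity, method = c("max1", "max999", "sum")) {
  method <- match.arg(method)
  switch(method,
    max1   = intensity / max(intensity),        # base peak scaled to 1
    max999 = 999 * intensity / max(intensity),  # base peak scaled to 999
    sum    = intensity / sum(intensity)         # intensities sum to 1
  )
}

intensity <- c(20, 5, 50, 5, 20)
normaliseIntensity(intensity, "max1")    # 0.4 0.1 1.0 0.1 0.4
normaliseIntensity(intensity, "max999")  # 399.6 99.9 999.0 99.9 399.6
normaliseIntensity(intensity, "sum")     # 0.20 0.05 0.50 0.05 0.20
```

Since the three results differ, a stored relativeIntensity field only makes sense together with the knowledge of which convention produced it.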
We have talked a bit about cardinalities of regular attributes and peak annotations before. I want to bring some attention to details so we don't forget them.
We have talked about many things as being cardinality 1. This is strictly true for e.g. polarity, ionization mode. It is already not true anymore for some physical properties of a spectrum such as collision energy (stepped collision energy, for example). It is certainly not true for many metadata items, such as chemical name.
We have talked about peaks [mz, intensity] being of cardinality n (or more precisely, peaksCount), and peak annotations being mapped to one or multiple peaks. This is true for the real annotations we have already talked about (like peptide fragments, chemical formula, etc). On the other hand, there are attributes that have true cardinality peaksCount. For example, relative intensity. It is often quite useful to carry both absolute and relative intensity in a spectrum. A typical sp <- normalize(sp) would just overwrite sp$intensity, but even MassBank spectra usually have both (see PK$PEAK at the bottom here), and in processing it is nice to filter for relative and absolute intensity (cutoffs) independently. So imagine sp$relativeIntensity <- normalize(sp) or something like that.
We could represent relativeIntensity as a PeakAnnotation. But do we want this, or do we want to explicitly allow and possibly mark peaksCount-cardinality attributes?
If we allow this, how do we deter people from abusing peaksCount-cardinality attributes for what we would want to use PeakAnnotation for (e.g. a formula attribute that is mostly NA)?
(As a vague idea how to implement this, we would have a PeakVector class that is used for mz, intensity and user-defined fields of this nature. I wonder if this could help to solve the mapping issue from #1.)