Closed lgatto closed 8 years ago
Mh, I am not convinced.
E.g. in data.frame, [ and $ are similar (the latter "supports" pmatch), both access a column:
d <- data.frame(filename=letters[1:3], value=1:3)
d["filename"]
d$filename
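A minimal base R sketch of the difference, using the same toy data.frame: [ with a column name returns a one-column data.frame, while $ returns the column itself and also accepts an unambiguous prefix of the name (partial matching). The prefix "file" is chosen here only for illustration.

```r
d <- data.frame(filename = letters[1:3], value = 1:3)

by_bracket <- d["filename"]  # one-column data.frame
by_dollar  <- d$filename     # the column itself (a vector)
by_prefix  <- d$file         # partial matching: resolves to d$filename

is.data.frame(by_bracket)        # the bracket form keeps the data.frame class
identical(by_dollar, by_prefix)  # both $ forms return the same column
```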
In a Protein object [ creates a subset and $ would access/modify metadata. Wouldn't this be counterintuitive?
For the mentioned question I would suggest to add an addIdentificationData method for mzID objects or a data.frame (as we have done in MSnbase).
> In a Protein object [ creates a subset and $ would access/modify metadata. Wouldn't this be counterintuitive?
Yes, I suppose so. The idea stems from pcols(x) being a CompressedSplitDataFrameList.
> For the mentioned question I would suggest to add an addIdentificationData method for mzID objects or a data.frame (as we have done in MSnbase).
I can't remember if the MSnSet output of MSnID (let's call it x) has all the metadata that was originally in the mzID files. We could add identification data from fData(x), but I am not sure that is ideal.
Now, by adding all the mzID files, OP has a Proteins object with all the peptides. My idea was to subset the pranges(p) with something like:
p$delta <- p$experimentalMassToCharge - p$calculatedMassToCharge
sel <- abs(p$delta) < 0.35
pranges(p) <- pranges(p)[sel]
Well, the last line is just an illustration, of course, just to give you an idea. Maybe we need a dedicated function for that, possibly with non-standard evaluation:
p2 <- subset(p, abs(delta) < 0.35)
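To illustrate what non-standard evaluation would mean here, the sketch below uses a plain data.frame as a stand-in for the peptide metadata. filter_rows is a hypothetical helper, not a Pbase function, and the column values are made up. It captures the unevaluated condition with substitute() and evaluates it with the columns in scope, which is the same mechanism base R's subset() uses for data.frames.

```r
# Hypothetical helper (not part of Pbase): filter rows of a data.frame
# with an unquoted condition that refers to its columns.
filter_rows <- function(x, condition) {
  expr <- substitute(condition)          # capture the expression unevaluated
  keep <- eval(expr, x, parent.frame())  # evaluate it with the columns of x in scope
  x[!is.na(keep) & keep, , drop = FALSE]
}

# Made-up values standing in for the per-peptide metadata columns.
meta <- data.frame(
  experimentalMassToCharge = c(500.10, 600.50, 700.20),
  calculatedMassToCharge   = c(500.00, 600.00, 700.15)
)
meta$delta <- meta$experimentalMassToCharge - meta$calculatedMassToCharge

small <- filter_rows(meta, abs(delta) < 0.35)
nrow(small)  # number of rows that pass the filter
```

A dedicated subset method for Proteins objects could follow the same pattern, evaluating the condition against the peptide metadata columns instead of a data.frame.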
Ok, now I understand what you (or the OP) want to achieve. I don't think we need any new subsetting here. The subset operator [ of DataFrameList supports a special matrix-like subsetting syntax ([i, j]; see ?DataFrameList for details): by setting i to missing you can loop over the list (which is not possible with classic R lists):
library("Pbase")
data(p)
## stupid example to determine the length of a sequence
delta <- pcols(p)[, "end"] - pcols(p)[, "start"]
# IntegerList of length 9
# [["A4UGR9"]] 17 11 12 9 15 12 12 9 18 15 12 8 ... 12 14 15 22 13 12 11 17 13 12 13
# [["A6H8Y1"]] 17 7 8 17 7 17 16 11 16 17 10 20 12 20 25 9 9 16 16 19 9 10 26
# [["O43707"]] 14 17 12 12 15 11
# [["O75369"]] 14 11 15 13 8 10 10 13 23 20 10 12 14
# [["P00558"]] 13 7 17 15 10
# [["P02545"]] 10 17 10 11 15 16 19 16 13 17 16 17
# [["P04075"]] 22 22 19 17 26 12 26 10 27 8 29 7 14 13 21 6 19 26 11 26 21
# [["P04075-2"]] 22 22 19 17 26 12 26 10 27 8 29 7 14 13 21 6 19 26 26 21
# [["P60709"]] 12
## subset by length
pranges(p)[delta > 20]
# IRangesList of length 9
# $A4UGR9
# IRanges of length 1
# start end width names
# [1] 2710 2732 23 A4UGR9
#
# $A6H8Y1
# IRanges of length 2
# start end width names
# [1] 21 46 26 A6H8Y1
# [2] 5 31 27 A6H8Y1
#
# $O43707
# IRanges of length 0
#
# ...
# <6 more elements>
The OP's example should be possible with the following code snippet:
delta <- pcols(p)[, "experimentalMassToCharge"] -
  pcols(p)[, "calculatedMassToCharge"]
sel <- abs(delta) < 0.35
pranges(p)[sel]
Using pcols(x)[, "foo"] is not as easy to type as x$foo but would not be confused with classical data.frame subsetting and would be similar to MSnbase's fData(x)["foo"].
But to get what the OP wants to do we need a replacement method pranges<- (and we could add one for pcols, too).
> Using pcols(x)[, "foo"] is not as easy to type as x$foo but would not be confused with classical data.frame subsetting and would be similar to MSnbase's fData(x)["foo"].
I always use fData(x)$foo, but let's forget about the $ for now.
Yes, we only need a pranges<- replacement method for this. Not sure about pcols - it is perhaps confusing to filter on the elements' metadata in order to filter the actual peptide ranges.
Re OP, my idea is that if we have general sub-setting capabilities, that question and many others can be resolved easily.
> Yes, we only need a pranges<- replacement method for this. Not sure about pcols - it is perhaps confusing to filter on the elements' metadata in order to filter the actual peptide ranges.
Maybe pfeatures<-?
@sgibb - could you have a look at commit 7583f240e90642092937ef72a6a8bfebd235914c.
I think it would be nice to have pcols(x)[, "FOO"] <- x, to record delta for example. And I guess something like acols(x)[, "BAR"] <- ... would also be nice for consistency. What do you think?
https://github.com/ComputationalProteomicsUnit/Pbase/commit/7583f240e90642092937ef72a6a8bfebd235914c looks fine but we have to ensure that names(pranges) == names(aa).
acols<- would be a good idea, too.
I have added acols<-. Still need to write tests, though.
Closed in bf42895e918328c3c11708852d8a47a9a00fb6ba and 7a2375d2037704535e4c4d435cd66a0fbd250a0d.
Thanks!
@sgibb - what do you think of the following?
Usage:
I am unsure if this is best practice for CompressedSplitDataFrameList, though. This is in relation to this question.