Closed lgatto closed 9 years ago
You are right, it is confusing and contains at least one bug.
The first lines check the arguments and ensure that the order of the pattern
(peptides) is the same as in the subject
(proteins; mabye we can skip the ordering step? or implement a general function?): ll. 19-41
The crucial step is in ll. 43-46. Here I check whether the peptides is within a protein (I can't remember why I have implemented this check. It should ensure that the peptides sequence is not longer than the protein sequence. Could this really happen? Maybe this is completley superfluous.)
While this seems useless it reveals a bug in .splitIRanges
in combination with the argument unshift = TRUE
. unshift = TRUE
assumes an ordered IRangesList
(which is not the case in our example). I also can't remember why I implemented the unshift = TRUE
option. By using unshift = TRUE
we get some IRanges
that start at one, e.g.:
ir <- IRanges(start = c(1, 6), end = c(5, 10))
ir
# IRanges of length 2
# start end width
# [1] 1 5 5
# [2] 6 10 5
Pbase:::.splitIRanges(ir, f = c("a", "b"), unshift = FALSE)
# IRangesList of length 2
# $a
# IRanges of length 1
# start end width
# [1] 1 5 5
#
# $b
# IRanges of length 1
# start end width
# [1] 6 10 5
Pbase:::.splitIRanges(ir, f = c("a", "b"), unshift = TRUE)
# IRangesList of length 2
# $a
# IRanges of length 1
# start end width
# [1] 1 5 5
#
# $b
# IRanges of length 1
# start end width
# [1] 1 5 5
The last lines ll. 48-51 are exactly the same as the lines you suggested.
AFAIK we don't use the unshift
option anywhere. If we avoid the check whether a peptide is in its corresponding protein we can remove the unshift
option completely and make the code more readable. Otherwise I need to fix .splitIRanges
. What do you think? I tend to remove the unshift
argument.
Unless I miss something:
Proteins
class: pfeature()[i]
come from aa()[i]
; Proteins
object should not be valid.Hence, my suggestion would be
proteinCoverage <- function(x) {
stopifnot(is(x, "Proteins"))
prtl <- width(aa(x))
pepl <- sapply(pranges(x), function(xx) sum(width(reduce(xx))))
cvg <- pepl/prtl
addacol(x, "Coverage", cvg)
}
(which is by the way very slow - probably the sapply
over the pranges(x)
)
Unless I miss something:
- we should have any issues with the order - this is guaranteed be the Proteins class: pfeature()[i] come from aa()[i];
- peptide are expected to be in their respective protein. If there were not, then the Proteins object should not be valid.
Ok I will fix this.
Just saw a typo - I meant
I have some issues with
proteinCoverage
. I find the implementation a bit confusion and can't figure out why I end up withNA
.but
O75369
looks fine to meA naive implementation gives the following, which matches
proteinCoverage
ommittingNA
s@sgibb, could you clarify what happens and why?