lgatto / Pbase

Manipulating and exploring protein and proteomics data

Parallelise pmapToGenome #20

Open lgatto opened 9 years ago

lgatto commented 9 years ago

Using BiocParallel. New signature should be

setMethod("mapToGenome", c("Proteins", "GRangesList"),
          function(x, genome, drop.empty.ranges = TRUE, ..., BPPARAM))

jorainer commented 8 years ago

Suggestion:

setGeneric("pmapToGenome2",
           function(x, genome, ...) standardGeneric("pmapToGenome2"))
setMethod("pmapToGenome2", c("Proteins", "GRangesList"),
          function(x, genome, drop.empty.ranges = TRUE, ...) {
              if (length(x) != length(genome))
                  stop("'x' and 'genome' must have the same length")

              ## Map each protein to its genomic ranges in parallel.
              ## SIMPLIFY = FALSE makes sure a plain list is returned.
              l <- bpmapply(split(x, seq_along(x)), genome,
                            FUN = tryCatchMapToGenome, SIMPLIFY = FALSE, ...)
              ans <- GRangesList(l)
              if (drop.empty.ranges)
                  ans <- ans[elementNROWS(ans) > 0]
              if (validObject(ans))
                  return(ans)
          })

I haven't tried this one yet, but it should work. Its performance also needs to be evaluated.
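
For illustration, here is a minimal sketch of how a BPPARAM argument, as in the signature proposed at the top of the thread, could be forwarded to bpmapply. pairwiseApply and its toy inputs are hypothetical stand-ins and not Pbase code. Only bpmapply, bpparam and SerialParam come from BiocParallel.

```r
library(BiocParallel)

## Hypothetical stand-in for the pairwise mapping: applies FUN to x[[i]] and
## y[[i]] and lets the caller choose the back-end through BPPARAM.
pairwiseApply <- function(x, y, FUN, BPPARAM = bpparam()) {
    if (length(x) != length(y))
        stop("'x' and 'y' must have the same length")
    ## SIMPLIFY = FALSE always returns a list with one element per pair.
    bpmapply(FUN, x, y, SIMPLIFY = FALSE, BPPARAM = BPPARAM)
}

## Toy data (assumption): the "mapping" just adds an offset to each
## vector of positions.
pos <- list(a = 1:3, b = 4:5)
off <- list(10L, 20L)

## SerialParam() runs everything in the current process, which is handy for
## debugging; swapping in MulticoreParam(workers = 2) would parallelise it.
res <- pairwiseApply(pos, off, FUN = function(p, o) p + o,
                     BPPARAM = SerialParam())
```

With the back-end passed explicitly, callers would not have to call register() beforehand, and a serial fallback stays available for small inputs.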

jorainer commented 8 years ago

Actually, performance doesn't look that promising (at least for a small number of proteins, 36 in the example below):

> library(BiocParallel)
> register(MulticoreParam(workers = 2))
> microbenchmark(pmapToGenome(prts, cdss),
+                Pbase:::pmapToGenome2(prts, cdss),
+                times = 20)
Unit: seconds
                              expr      min       lq     mean   median       uq      max neval cld
          pmapToGenome(prts, cdss) 1.348400 1.370049 1.520549 1.453795 1.502679 2.218503    20  a 
 Pbase:::pmapToGenome2(prts, cdss) 4.108021 4.148563 4.252193 4.163558 4.213703 5.573078    20   b

lgatto commented 8 years ago

It would be good to test this on larger, real-life sized data, and see how these scale.
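
A rough sketch of what such a scaling test could look like is given below. It assumes a toy workload, since no larger data set is referenced in this thread: slowTask, timeBoth and the chosen sizes are hypothetical placeholders for the real mapping and real inputs. It only uses bplapply and MulticoreParam from BiocParallel plus base R timing.

```r
library(BiocParallel)

## Hypothetical per-element workload standing in for mapping one protein;
## it only sleeps briefly so that the run time per element is predictable.
slowTask <- function(i) {
    Sys.sleep(0.01)
    i
}

## Elapsed time (in seconds) of serial and parallel evaluation for n elements.
timeBoth <- function(n, BPPARAM) {
    serial <- system.time(lapply(seq_len(n), slowTask))[["elapsed"]]
    parallel <- system.time(
        bplapply(seq_len(n), slowTask, BPPARAM = BPPARAM))[["elapsed"]]
    c(n = n, serial = serial, parallel = parallel)
}

## Input sizes are arbitrary; real tests would use realistic protein counts.
sizes <- c(10, 50, 200)
timings <- t(sapply(sizes, timeBoth, BPPARAM = MulticoreParam(workers = 2)))
timings
```

Each row of timings holds one input size with its serial and parallel elapsed times. Replacing slowTask with the actual per-protein mapping would show whether, and from which input size, the parallel version starts to pay off.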