Open leonjessen opened 3 years ago
Hi @leonjessen,
There is no option within the function to vectorise or multithread this operation, but if you have many sequences to score you can set up a cluster using the parallel
package. For example if you are only interested in the log odds scores of the sequences given the model would be:
score <- function(s, model) aphid::Viterbi(model, s, odds = TRUE)$score
cl <- parallel::makeCluster(2)
scores <- parallel::parSapply(cl, dat_train_bin, score, model = phmm_trained)
parallel::stopCluster(cl)
If you are lookinging for the full output of Viterbi.PHMM
for each sequence you can remove the $score
from the first line and substitute parSapply
for parLapply
in which case the result will be a list of lists..
Hope that helps!
Cheers, Shaun
Hi @shaunpwilkinson,
I'm wondering if there is a vectorised version of
Viterbi.PHMM()
?Reproducible example below illustrates how
Viterbi.PHMM()
can only score one sequence at a time. This leads to substantial computation time, when scoring e.g. 500,000 sequences. Sinceaphid
isc++
based, I was wondering if a vectorised version ofViterbi.PHMM()
is available to alleviate this?