Would it be possible to parallelize lsh_compare? On large corpora the number of comparisons can quickly become very big.
I managed to do this using %dopar% from foreach and bind_rows from dplyr, but I assume there are other ways to do it as well:
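library(foreach)  # foreach() and %dopar%; a parallel backend (e.g. doParallel) must be registered
library(dplyr)    # bind_rows()

lshc <- function(candidates, corpus, f) {
  num_rows <- nrow(candidates)
  # Score each candidate pair in parallel, then bind the per-pair lists into one data frame.
  bind_rows(
    foreach(i = seq_len(num_rows)) %dopar% {
      a <- candidates$a[i]
      b <- candidates$b[i]
      score <- f(corpus[[a]], corpus[[b]])
      list(a = a, b = b, score = score)
    }
  )
}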
Then I noticed that you have already used mclapply in TextReuseCorpus. So, maybe the same can be done for lsh_compare? Let me know if I can help with that.
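For illustration, here is a minimal sketch of what an mclapply-based version might look like. It is not the package's implementation: lsh_compare_mc, its mc.cores argument, the toy corpus, and the jaccard function are all hypothetical names made up for this example, and the corpus is assumed to be indexable by the IDs in the a and b columns.

```r
library(parallel)  # ships with R; mclapply() forks, so mc.cores > 1 is not supported on Windows

# Hypothetical helper, not part of textreuse: score each candidate pair across cores.
lsh_compare_mc <- function(candidates, corpus, f,
                           mc.cores = getOption("mc.cores", 2L)) {
  scores <- mclapply(seq_len(nrow(candidates)), function(i) {
    f(corpus[[candidates$a[i]]], corpus[[candidates$b[i]]])
  }, mc.cores = mc.cores)
  # mclapply() preserves input order, so scores line up with the candidate rows.
  candidates$score <- unlist(scores)
  candidates
}

# Toy stand-ins for a real corpus and comparison function (assumptions for illustration).
toy_corpus <- list(d1 = c("a", "b", "c"), d2 = c("b", "c", "d"), d3 = c("x", "y"))
jaccard <- function(x, y) length(intersect(x, y)) / length(union(x, y))
toy_candidates <- data.frame(a = c("d1", "d1"), b = c("d2", "d3"),
                             stringsAsFactors = FALSE)

# mc.cores = 1L keeps the example runnable on every platform.
result <- lsh_compare_mc(toy_candidates, toy_corpus, jaccard, mc.cores = 1L)
print(result)
```

Compared with the foreach version, this needs no registered backend and no extra packages, at the cost of only parallelizing on platforms that support forking.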