Closed · kevinykuo closed this issue 6 years ago
Each bignum is already a raw vector under the hood, so it cannot easily be vectorized over multiple bignums. We would have to introduce a new data structure for lists of bignums, which would not give you much of a performance gain.
Can you illustrate with some code what exactly you would like to do?
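To make the point concrete, here is a small sketch (in Python rather than R, purely for illustration) of why one bignum maps to one flat byte buffer: a digest is just bytes, and those bytes encode a single arbitrary-precision integer. A "vector of bignums" would therefore be a list of separate buffers, not a flat array.

```python
import hashlib

# One md5 digest is 16 raw bytes; interpreting them as an integer
# yields a single big number, analogous to one bignum in R's openssl.
digest = hashlib.md5(b"example").digest()
big = int.from_bytes(digest, "big")
print(big.bit_length() <= 128)  # md5 digests are 128 bits
```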
@jeroen Thanks for the reply, that makes sense. I'm basically doing something like this:
library(openssl)

string_to_hash_bucket <- function(x, num_buckets) {
  # hash each string, interpret the digest as a big integer, bucket by modulo
  r <- sapply(x, function(s) bignum(md5(s), hex = TRUE) %% bignum(num_buckets))
  as.integer(r)
}
v <- sample(letters, 100, replace = TRUE)
string_to_hash_bucket(v, 10)
# [1] 9 3 9 1 8 1 3 7 3 7 7 3 5 6 7 8 9 3 9 6 3 3 5 3 1 7 3 3 6 3 8 8 3 3 3 9
# [37] 7 1 6 6 9 1 6 9 1 8 8 8 6 3 5 5 5 3 3 9 5 7 3 1 5 6 6 3 8 3 8 6 5 5 8 8
# [73] 9 6 8 8 3 3 8 3 5 3 7 3 8 8 8 3 7 9 7 3 8 3 1 8 1 9 9 5
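For comparison, the same bucketing logic can be sketched in Python, where `int.from_bytes` plays the role that `bignum(md5(s))` plays in the snippet above (this assumes the raw digest is interpreted as a big-endian integer, which is how OpenSSL converts raw bytes to bignums — an assumption worth double-checking):

```python
import hashlib

def string_to_hash_bucket(values, num_buckets):
    # Hash each string, interpret the 16-byte md5 digest as one
    # big-endian integer, and reduce it modulo num_buckets.
    return [
        int.from_bytes(hashlib.md5(v.encode("utf-8")).digest(), "big") % num_buckets
        for v in values
    ]

buckets = string_to_hash_bucket(["apple", "banana", "cherry"], 10)
print(all(0 <= b < 10 for b in buckets))  # every bucket id is in range
```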
Got a somewhat off-label use case here. I'm trying to encode high-cardinality categorical features (that come as character columns) into hash buckets. This involves calling `bignum()` and then `%%` on the hash output. Would it be possible to vectorize `bignum()` to make this more efficient, or is there a better way I overlooked?
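One possible alternative worth noting (a sketch of the general technique, not a claim about the openssl API): since only the residue modulo `num_buckets` is needed, the digest can be folded byte by byte with modular arithmetic, so no arbitrary-precision value is ever constructed. Illustrated in Python:

```python
import hashlib

def digest_mod(digest, m):
    # Horner-style fold: acc = ((b0*256 + b1)*256 + b2)... mod m,
    # which yields the same residue as int.from_bytes(digest, "big") % m
    # without ever materialising the full big integer.
    acc = 0
    for byte in digest:
        acc = (acc * 256 + byte) % m
    return acc

d = hashlib.md5(b"example").digest()
print(digest_mod(d, 10) == int.from_bytes(d, "big") % 10)  # the two agree
```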