Closed moodymudskipper closed 4 years ago
This passes all tests, is simpler, and works better with examples above, but is slower :
library(rangeops)
#>
#> Attaching package: 'rangeops'
#> The following object is masked from 'package:base':
#>
#> <<-
`%in{}2%` <- function(x, table){
if(is.language(table)) return(x == table)
Reduce(`|`, lapply(table, `==`, x))
}
set.seed(42)
x_ <- sample(1000,100)
table_ <- sample(1000,100)
table2_ <- as.list(table_)
x2_ <- as.list(x_)
intersect(x_, table_)
#> [1] 410 930 989 299 348 504 945 262 669 630
microbenchmark::microbenchmark(
x_ %in% table_,
x_ %in{}% table_,
x_ %in{}2% table_
)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> x_ %in% table_ 2.1 2.65 3.786 3.0 3.60 18.5 100 a
#> x_ %in{}% table_ 5.2 6.40 13.069 7.7 8.70 403.0 100 a
#> x_ %in{}2% table_ 111.5 129.70 184.133 142.0 180.95 2633.8 100 b
microbenchmark::microbenchmark(
x2_ %in% table2_,
x2_ %in{}% table2_,
x2_ %in{}2% table2_
)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> x2_ %in% table2_ 162.6 168.80 184.088 171.00 174.90 477.6 100 b
#> x2_ %in{}% table2_ 7.4 8.50 11.936 9.30 10.85 99.4 100 a
#> x2_ %in{}2% table2_ 164.0 167.15 191.750 175.25 203.40 392.5 100 b
microbenchmark::microbenchmark(
x_ %in% table2_,
x_ %in{}% table2_,
x_ %in{}2% table2_
)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> x_ %in% table2_ 95.0 96.55 100.216 97.80 99.45 206.1 100 b
#> x_ %in{}% table2_ 6.1 6.95 8.618 7.55 8.10 42.6 100 a
#> x_ %in{}2% table2_ 104.1 105.50 113.780 107.35 110.35 267.8 100 c
microbenchmark::microbenchmark(
x2_ %in% table_,
x2_ %in{}% table_,
x2_ %in{}2% table_
)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> x2_ %in% table_ 100.2 101.50 143.410 102.40 103.80 531.0 100 b
#> x2_ %in{}% table_ 6.2 7.05 11.874 8.10 8.60 68.7 100 a
#> x2_ %in{}2% table_ 166.6 169.15 240.365 171.15 174.35 916.7 100 c
Created on 2019-10-31 by the reprex package (v0.3.0)
In favor of new implementation:
==
built in%in{}%
became complicatedIn favor of previous :
note :
"a" %in{}% list("b", factor("a"))
is FALSE
but "a" %in{}% list(factor("a"))
is TRUE
, this cannot be acceptable.
with %in{}2%
they are both TRUE, while "a" == list(factor("a"))
is FALSE, but considering %in{}%
should be seen as applying ==
on ELEMENTS of the rhs I believe this is OK.
I think the short and slow version might be the way to go to have something robust, and if speed becomes an issue we can design a bunch of unit tests (and benchmarks) and propose something better.
went with simple version with https://github.com/moodymudskipper/rangeops/commit/35e983be11b02c4616f1fd042bed52e784703caa