Closed mgacc0 closed 7 years ago
It crossed my mind to implement these, however, here is why I did not in the 1st go round:
%[]o[]%
notation too long (almost looks like a TIE fighter), but it is kind of unavoidable.Out of these, point 3 is the most important, the other 2 are just stylistic/technical details. Do you have any suggestions on point 3?
The symmetric cases could be shortened, e.g. %[]o[]%
=%[o]%
, %()o()%
=%(o)%
.
The notation would be a bit long but, anyway, is intuitive (and quite equivalent to the mathematical notation). So that would make it easy to adopt.
They are 16 functions. (I don't know why I wrote 32 previously...)
In administrative data processing we use a lot of datetime ranges. And it's very usual to write them as: "from 01.12.2016 to 08.12.2016", meaning the semiopen interval ["2016-12-01 00:00:00", "2016-12-08 00:00:00").
It would be a bit tedious:
I propose something like this:
.intrval3 <-
function(interval1, type1, interval2, type2)
{
ab <- .get_intrval(interval1)
cd <- .get_intrval(interval2)
type1 <- match.arg(type1, c("[]", "[)", "(]", "()"))
type2 <- match.arg(type2, c("[]", "[)", "(]", "()"))
if (ab$a > cd$a) {
tmp <- ab
ab <- cd
cd <- tmp
tmp <- type1
type1 <- type2
type2 <- tmp
}
! .lssthan(ab$b, cd,
ifelse(substr(type1, 2L, 2L)=="]" & substr(type2, 1L, 1L)=="[",
"[", "("))
}
.intrval3(c(1, 2), "()", c(2, 3), "()")
# [1] FALSE
.intrval3(c(1, 2), "[]", c(2, 3), "[)")
# [1] TRUE
OK, I will make a branch and start development with your proposal.
@mgacc0 see here what I have so far in overlap-operators
branch: https://github.com/psolymos/intrval/blob/overlap-operators/extras/playground.R
@mgacc0 I could test your function and it passed all the comparisons I came up with. The only issue right now is that it works for simple case but not yet vectorized (i.e. cannot accept matrices, lists because of the if (iv1$a > iv2$a) {...}
statement).
Firstly, a necessary correction: checking for empty intervals.
is_empty_interval <- function(interval, type) {
interval[[1]]==interval[[2]] & type=="()"
}
overlaps <- function (interval1, interval2, type1, type2)
{
stopifnot(is.vector(interval1) && length(interval1)<=2)
stopifnot(is.vector(interval2) && length(interval2)<=2)
iv1 <- .get_intrval(interval1)
iv2 <- .get_intrval(interval2)
type1 <- match.arg(type1, c("[]", "[)", "(]", "()"))
type2 <- match.arg(type2, c("[]", "[)", "(]", "()"))
if (is_empty_interval(iv1, type1) | is_empty_interval(iv2, type2)) {
return(FALSE)
}
if (iv1$a > iv2$a) {
tmp <- iv1
iv1 <- iv2
iv2 <- tmp
tmp <- type1
type1 <- type2
type2 <- tmp
}
!.lssthan(iv1$b, iv2,
ifelse(substr(type1, 2L, 2L)=="]" && substr(type2, 1L, 1L)=="[",
"[", "("))
}
I would rename the .intrval3
function to overlaps
and make it visible.
For easier testing, I would recommend the testthat package:
library(testthat)
expect_false(overlaps(c(2, 4), c(2, 2), "[]", "()"))
expect_false(overlaps(c(2, 2), c(2, 4), "()", "[]"))
For calling over dataframes, you could do something like this:
library(dplyr, purrr)
mm <- expand.grid(
iv1_a=c(1, 2, 3, 4, 5),
iv1_b=c(1, 2, 3, 4, 5),
iv2_a=c(1, 2, 3, 4, 5),
iv2_b=c(1, 2, 3, 4, 5),
type1=c("[]", "[)", "(]", "()"),
type2=c("[]", "[)", "(]", "()"),
stringsAsFactors=FALSE)
nrow(mm)
# [1] 10000
mm %>%
sample_n(4)
# iv1_a iv1_b iv2_a iv2_b type1 type2
# 694 4 4 3 1 [) []
# 612 2 3 5 5 [] []
# 1320 5 4 3 1 (] []
# 6273 3 5 1 1 (] (]
mm %>%
sample_n(10) %>%
by_row(~ overlaps(c(.[[1]], .[[2]]), c(.[[3]], .[[4]]),
.[[5]], .[[6]]), .collate="cols")
# # A tibble: 10 × 7
# iv1_a iv1_b iv2_a iv2_b type1 type2 .out
# <dbl> <dbl> <dbl> <dbl> <chr> <chr> <lgl>
# 1 5 2 2 2 (] [] TRUE
# 2 1 4 4 5 () () FALSE
# 3 3 5 2 5 () [] TRUE
# 4 4 2 4 3 (] (] TRUE
# 5 5 3 2 1 (] (] FALSE
# 6 2 3 2 4 (] [] TRUE
# 7 2 1 4 2 () [) FALSE
# 8 5 2 1 4 (] (] TRUE
# 9 4 1 2 5 [) [] TRUE
# 10 4 2 3 5 (] [) TRUE
mm %>%
.[, 1:4] %>%
sample_n(10) %>%
by_row(~ c(.[[1]], .[[2]]) %[]o[]% c(.[[3]], .[[4]]), .collate="cols")
# # A tibble: 10 × 5
# iv1_a iv1_b iv2_a iv2_b .out
# <dbl> <dbl> <dbl> <dbl> <lgl>
# 1 5 3 3 4 TRUE
# 2 3 1 2 2 TRUE
# 3 3 2 5 4 FALSE
# 4 3 2 2 4 TRUE
# 5 2 3 4 2 TRUE
# 6 3 4 2 5 TRUE
# 7 3 3 3 3 TRUE
# 8 3 3 4 1 TRUE
# 9 1 5 2 3 TRUE
# 10 2 5 2 4 TRUE
The notation for single vectors (intervals) is intuitive.
But I think that the intrval
library doesn't need to provide an operator for data.frames.
You could place an example (on the README) of using the function purrr::by_row
and overlaps
.
Anyway, if you want to know how a function for data.frames would be:
intrval4 <- function (df1)
{
stopifnot(is.data.frame(df1))
stopifnot(nrow(df1)==1)
iv1 <- .get_intrval(c(df1$iv1_a, df1$iv1_b))
iv2 <- .get_intrval(c(df1$iv2_a, df1$iv2_b))
type1 <- match.arg(df1$type1, c("[]", "[)", "(]", "()"))
type2 <- match.arg(df1$type2, c("[]", "[)", "(]", "()"))
if (is_empty_interval(iv1, type1) | is_empty_interval(iv1, type1)) {
return (FALSE)
}
if (iv1$a > iv2$a) {
tmp <- iv1
iv1 <- iv2
iv2 <- tmp
tmp <- type1
type1 <- type2
type2 <- tmp
}
!.lssthan(iv1$b, iv2,
ifelse(substr(type1, 2L, 2L)=="]" && substr(type2, 1L, 1L)=="[",
"[", "("))
}
mm %>%
sample_n(20) %>%
by_row(intrval4, .collate=c("cols"))
Thanks. I just wanted to make the operators consistent with the %[o]%
cases that can accept different formats on both ends and recycle values if needed. I agree it is not absolutely necessary for all use cases, but still nice to have.
I like your testing approach, but want to avoid too much dependencies. Even if I ignore /tests
for CRAN versions, CI will complain about missing Suggests
declarations.
This function seems to do the job:
.intrval3 <-
function(interval1, interval2, type1, type2)
{
iv1 <- .get_intrval(interval1)
iv2 <- .get_intrval(interval2)
type1 <- match.arg(type1, c("[]", "[)", "(]", "()"))
type2 <- match.arg(type2, c("[]", "[)", "(]", "()"))
b1 <- ifelse(iv1$a < iv2$a, iv1$b, iv2$b)
a2 <- ifelse(iv1$a < iv2$a, iv2$a, iv1$a)
type1v <- ifelse(iv1$a < iv2$a, substr(type1, 2L, 2L), substr(type2, 2L, 2L))
type2v <- ifelse(iv1$a < iv2$a, substr(type2, 1L, 1L), substr(type1, 1L, 1L))
ifelse(type1v == "]" & type2v == "[",
b1 >= a2,
b1 > a2)
}
The idea is similar to your proposal:
Could you have a look at ../pull/13 ?
@mgacc0 I merged the overlap-operators branch, see comments on #15
Next steps include more testing and improving code coverage (it dropped because right now only internals are tested).
It would be really great to implement some new operators for interval-to-interval overlapping.
Instead of having the
%[o]%
operator for checking the overlapping between the closed intervals [a, b] and [x, y] expressed asc(a, b) %[o]% c(x, y)
, it would be great having the%[]o[]%
operator expressed asc(a, b) %[]o[]% c(x, y)
.And then, for example:
Consequently, that would imply creating 32 operators (for all the combinations of open/closed endpoints):
%[]o[]%
%[]o[)%
%[]o(]%
%[]o()%
%[)o[]%
%[)o[)%
%[)o(]%
%[)o()%