Closed ThierryO closed 3 years ago
Thanks for the comments, @florisvdh. The differences you see arise because, internally, the split_by
variables are prepended to the sorting variables. So in your example the actual sorting is
c("group2", "group1", "id_sub"). I'll update the documentation to mention this.
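To make the effect concrete, here is a minimal base R sketch of the behaviour described above. It is an illustration only and not git2rdata's internal code: the names `effective_sorting` and `toy` are hypothetical, and dropping the duplicated variable with `unique()` is an assumption that happens to reproduce the order quoted above.

```r
# Illustration only: mimics the described behaviour, not git2rdata's own code.
sorting  <- c("group1", "group2", "id_sub")
split_by <- "group2"

# Assumption: split_by variables go first, repeats are dropped from sorting.
effective_sorting <- unique(c(split_by, sorting))
effective_sorting
#> [1] "group2" "group1" "id_sub"

# Small hypothetical data frame to compare the two row orders.
toy <- expand.grid(
  group1 = 1:2, group2 = 1:2, id_sub = c("a", "b"),
  stringsAsFactors = FALSE
)

# Order requested by the user versus the order that is effectively applied.
by_requested <- toy[order(toy$group1, toy$group2, toy$id_sub), ]
by_effective <- toy[order(toy$group2, toy$group1, toy$id_sub), ]

# Same rows, but in a different order.
identical(by_requested$group2, by_effective$group2)
#> [1] FALSE
```

Because the rows are the same and only their order differs, re-sorting the result of read_vc() with the original sorting variables should recover the original row order.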
closes #45
Good idea @ThierryO :+1: , thanks for the explanation.
Confirmed:
library(git2rdata)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:git2rdata':
#>
#> pull
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
set.seed(123456)
testdata <-
  data.frame(group1 = as.Date("2020-01-01"):as.Date("2020-01-10"),
             group2 = 1:5,
             id_sub = paste0("test", 1:1e3)
  ) %>%
  mutate(group1 = as.Date(group1, origin = "1970-01-01")) %>%
  as_tibble %>%
  expand(group1, group2, id_sub) %>%
  mutate(var1 = rnorm(5e4),
         var2 = runif(5e4)) %>%
  arrange(group1, group2, id_sub)
testdata %>% write_vc("split/testdata",
                      sorting = c("group1", "group2", "id_sub"),
                      split_by = c("group2"))
#> 4516a4b1c2eee604b874ca03cb0c30a32c084ae7
#> "split/testdata.tsv"
#> 76c88bc1396a725758d4e0b513fe714130a91036
#> "split/testdata.yml"
read_vc("split/testdata") %>%
  arrange(group1, group2, id_sub) %>%
  all.equal(testdata, check.attributes = FALSE)
#> [1] TRUE
Created on 2020-09-23 by the reprex package (v0.3.0)