Closed: tungttnguyen closed this issue 2 years ago.
Smaller reprex:
pacman::p_load(tidytable)
df <- tidytable(x = "/a/b/")
df %>%
separate(x, into = c("dummy1", "a", "b", "dummy2"), sep = "/")
#> Error in data.table::tstrsplit(x, split = "/", fixed = TRUE, keep = 1:4, : 'keep' should contain integer values between 1 and 3.
This occurs because of a difference between base::strsplit() (which data.table uses) and stringr::str_split() (which tidyr uses). base::strsplit() drops the empty piece after a trailing separator, while stringr::str_split() keeps it:
chr <- "/a/b/"
strsplit(chr, split = "/")
#> [[1]]
#> [1] "" "a" "b"
stringr::str_split(chr, pattern = "/")
#> [[1]]
#> [1] "" "a" "b" ""
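To see why tstrsplit() rejects keep = 1:4, it helps to count the pieces each approach produces. The sketch below uses only base R. regmatches(..., invert = TRUE) is swapped in here as one way to get stringr-style splitting that keeps the trailing empty string; it is an illustration, not what tidyr or tidytable do internally.

```r
chr <- "/a/b/"

# base::strsplit() (used by data.table::tstrsplit()) drops the empty piece
# after the trailing "/", so only 3 pieces come back
base_pieces <- strsplit(chr, split = "/", fixed = TRUE)[[1]]
length(base_pieces)  # 3, hence "values between 1 and 3" for keep = 1:4

# regmatches() with invert = TRUE returns the text between every match,
# including the empty string after the final "/", as stringr::str_split() does
kept_pieces <- regmatches(chr, gregexpr("/", chr), invert = TRUE)[[1]]
kept_pieces          # "" "a" "b" ""
length(kept_pieces)  # 4
```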
For now, the workaround is to drop "dummy2" from the into argument, since data.table doesn't create the trailing empty column:
pacman::p_load(tidytable)
df1 <- data.frame(
stringsAsFactors = FALSE,
Path = c("/MODEL/WAT/VAL//1MON/VOL/",
"/MODEL/WAT/VAL//1MON/VOL/",
"/MODEL/WAT/VAL//1MON/VOL/"),
Index = c("1999-10-01 16:00:00","1999-11-01 16:00:00",
"1999-12-01 16:00:00"),
Value = c(3.94, 2.14, 1.39)
)
df1 %>%
separate(Path, into = c("dummy1",
"A", "B", "C", "D", "E", "F"),
sep = "/")
#> # A tidytable: 3 × 9
#> Index Value dummy1 A B C D E F
#> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1999-10-01 16:00:00 3.94 "" MODEL WAT VAL "" 1MON VOL
#> 2 1999-11-01 16:00:00 2.14 "" MODEL WAT VAL "" 1MON VOL
#> 3 1999-12-01 16:00:00 1.39 "" MODEL WAT VAL "" 1MON VOL
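Rather than working out by hand which names to drop, the workaround could size into from what base::strsplit() actually returns for the column. A minimal sketch, assuming every row splits into the same number of pieces; the names dummy1 and A, B, ... just mirror the example above, and n_pieces and into_names are hypothetical variables.

```r
paths <- c("/MODEL/WAT/VAL//1MON/VOL/",
           "/MODEL/WAT/VAL//1MON/VOL/",
           "/MODEL/WAT/VAL//1MON/VOL/")

# Count pieces the way data.table::tstrsplit() sees them (trailing "" dropped;
# the "//" in the middle still yields an empty piece)
n_pieces <- max(lengths(strsplit(paths, split = "/", fixed = TRUE)))
n_pieces  # 7

# One placeholder for the leading "" plus one letter per remaining piece
into_names <- c("dummy1", LETTERS[seq_len(n_pieces - 1)])
into_names  # "dummy1" "A" "B" "C" "D" "E" "F"
```

The resulting vector would then be passed as into = into_names in the separate() call above.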
Thank you!
@tungttnguyen - I've decided to leave this one as-is. It seems like an edge case, and fixing it would cost too much in performance.
Thanks for reporting either way 😄
Thinking about this some more: it won't match tidyr's behavior exactly in your case, but I can make it so that separate() doesn't error when too many (or too few) names are provided to into. This already works in tidyr::separate():
pacman::p_load(tidyr)
df <- tibble(x = c("a_a", "b_b", "c_c"))
# Too many names in into
df %>%
separate(x, c("one", "two", "three"), sep = "_")
#> Warning: Expected 3 pieces. Missing pieces filled with `NA` in 3 rows [1, 2, 3].
#> # A tibble: 3 × 3
#> one two three
#> <chr> <chr> <chr>
#> 1 a a <NA>
#> 2 b b <NA>
#> 3 c c <NA>
# Too few names in into
df %>%
separate(x, "one", sep = "_")
#> Warning: Expected 1 pieces. Additional pieces discarded in 3 rows [1, 2, 3].
#> # A tibble: 3 × 1
#> one
#> <chr>
#> 1 a
#> 2 b
#> 3 c
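The tidyr behavior above (missing pieces filled with NA on the right, additional pieces discarded) can be sketched in base R by forcing every split result to exactly the number of requested names. This is an illustration only: fit_pieces() is a hypothetical helper, it does not emit tidyr's warnings, and it is not how tidytable implements the change.

```r
# Hypothetical helper: force each element of a strsplit() result to length n.
# `length<-` pads a character vector with NA, or truncates it if it is longer.
fit_pieces <- function(pieces, n) {
  lapply(pieces, function(p) {
    length(p) <- n
    p
  })
}

x <- c("a_a", "b_b", "c_c")
pieces <- strsplit(x, split = "_", fixed = TRUE)

# Too many names: each row becomes c(letter, letter, NA)
too_many <- do.call(rbind, fit_pieces(pieces, 3))
colnames(too_many) <- c("one", "two", "three")

# Too few names: the extra piece in each row is dropped
too_few <- do.call(rbind, fit_pieces(pieces, 1))
colnames(too_few) <- "one"

too_many
too_few
```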
All set.
pacman::p_load(tidytable)
df1 <- data.frame(
stringsAsFactors = FALSE,
Path = c("/MODEL/WAT/VAL//1MON/VOL/",
"/MODEL/WAT/VAL//1MON/VOL/",
"/MODEL/WAT/VAL//1MON/VOL/"),
Index = c("1999-10-01 16:00:00","1999-11-01 16:00:00",
"1999-12-01 16:00:00"),
Value = c(3.94, 2.14, 1.39)
)
df1 %>%
separate(Path, into = c("dummy1",
"A", "B", "C", "D", "E", "F",
"dummy2"),
sep = "/")
#> # A tidytable: 3 × 10
#> Index Value dummy1 A B C D E F dummy2
#> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1999-10-01 16:00:00 3.94 "" MODEL WAT VAL "" 1MON VOL <NA>
#> 2 1999-11-01 16:00:00 2.14 "" MODEL WAT VAL "" 1MON VOL <NA>
#> 3 1999-12-01 16:00:00 1.39 "" MODEL WAT VAL "" 1MON VOL <NA>
Hi Mark,
In the example below, tidytable::separate() threw an error while tidyr::separate() did not. Can you check what went wrong? Thank you!
Created on 2022-10-21 with reprex v2.0.2