In a data frame that has been grouped by two columns with group_by, the separate function cannot find the first grouping column (but it does find the second one).
If we pipe a few functions together and then call separate on the first column passed to group_by (the_chr1 in this example), the call fails with an unexpected error:
Error: unknown column 'the_chr1'
However, if we try to separate the second column in the group_by (here it's the_chr2), it works fine:
df %>%
  group_by(the_chr1, the_chr2) %>%
  summarize(mean_i = mean(the_num)) %>%
  separate(the_chr2, c('first_bit', 'second_bit'), sep = 1)
Source: local data frame [30 x 4]
Groups: the_chr1 [10]

   the_chr1 first_bit second_bit mean_i
     (fctr)     (chr)      (chr)  (dbl)
1       Aeq         h         uX     15
2       Aeq         R         rd      5
3       Aeq         W         GJ     25
4       Fiq         F         OU     11
5       Fiq         L         MH      1
6       Fiq         y         IE     21
7       FlV         G         da     19
8       FlV         i         pU     29
9       FlV         l         Yn      9
10      hPy         A         MN      7
..      ...       ...        ...    ...
Of course, it works fine if we group_by and separate on the same single column:
df %>%
  group_by(the_chr2) %>%
  summarize(mean_i = mean(the_num)) %>%
  separate(the_chr2, c('first_bit', 'second_bit'), sep = 1)
Source: local data frame [30 x 3]

   first_bit second_bit mean_i
       (chr)      (chr)  (dbl)
1          A         MN      7
2          C         ur     18
3          e         rc     24
4          F         OU     11
5          G         da     19
6          h         dv      2
7          H         pE      4
8          h         uX     15
9          h         Wv     28
10         I         JP     27
..       ...        ...    ...
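So there seems to be a problem with how separate handles data frames with multiple grouping variables. Note that after summarize the result is still grouped by the_chr1 (see the Groups: line in the output above), which may be why that is the column separate cannot find.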
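A possible workaround, sketched here under some assumptions, is to drop the remaining grouping with ungroup() before calling separate. The small df below is hypothetical stand-in data (the original df is not shown in this report), and group_vars() is used only to check which grouping is left after summarize. This is a sketch of one way around the error, not a fix for separate itself.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the original df, which is not shown in the report
df <- data.frame(
  the_chr1 = rep(c('Aeq', 'Fiq', 'FlV'), each = 2),
  the_chr2 = c('huX', 'Rrd', 'FOU', 'LMH', 'Gda', 'ipU'),
  the_num  = c(15, 5, 11, 1, 19, 29),
  stringsAsFactors = FALSE
)

# summarize() peels off only the last grouping variable,
# so the result is still grouped by the_chr1
summarized <- df %>%
  group_by(the_chr1, the_chr2) %>%
  summarize(mean_i = mean(the_num))

group_vars(summarized)

# Dropping the remaining grouping first lets separate() act on the_chr1
# as if it were an ordinary column
result <- summarized %>%
  ungroup() %>%
  separate(the_chr1, c('first_bit', 'second_bit'), sep = 1)

result
```

If the grouping is still needed afterwards, it can be re-applied with another group_by on the new columns.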