Closed by mrdwab 9 years ago
I guess a related feature would be to specify some variables that would be grouped in a balanced way. For instance, ID and state might not be grouped, but "age" and "sex" might be, with two measurements each.
Perhaps there is better syntax, but I'm imagining something like:
r_data_frame(3, id, state, Grouped(2, age, sex))
# ID State Age_1 Age_2 Sex_1 Sex_2
# 1 1 Pennsylvania 21 23 Female Male
# 2 2 South Carolina 29 26 Female Male
# 3 3 Florida 30 20 Male Male
Sorry, no pseudocode to achieve this yet :-)
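To make the proposal concrete, here is a minimal base R sketch of what a Grouped-style helper could do. It is not wakefield code: grouped_sketch, gen_age, and gen_sex are hypothetical names invented for illustration, and each generator is assumed to be a function of n that returns a vector of length n.

```r
# Hypothetical helper, NOT part of wakefield: repeat each named generator
# `times` times and name the columns Name_1, Name_2, ...
grouped_sketch <- function(n, times, ...) {
  gens <- list(...)
  stopifnot(length(gens) > 0, !is.null(names(gens)), all(nzchar(names(gens))))
  cols <- list()
  for (nm in names(gens)) {
    for (i in seq_len(times)) {
      # One fresh draw of length n per repeated measurement
      cols[[paste(nm, i, sep = "_")]] <- gens[[nm]](n)
    }
  }
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Stand-in generators (assumptions, used only for illustration)
gen_age <- function(n) sample(20:35, n, replace = TRUE)
gen_sex <- function(n) sample(c("Male", "Female"), n, replace = TRUE)

set.seed(1)
wide <- data.frame(
  ID = seq_len(3),
  grouped_sketch(3, 2, Age = gen_age, Sex = gen_sex)
)
names(wide)
```

The columns come out grouped by variable (Age_1, Age_2, Sex_1, Sex_2), which matches the layout imagined above.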
@mrdwab Thanks for the feedback. That gets my brain flowing a bit too. Maybe a repeated measures function that takes a function and the number of times to repeat it and names accordingly. Something like...
Never mind... As I read on, I see that's what you proposed with Grouped.
:+1:
@mrdwab I made the first switch to better-named columns using your fix. It was simple. Thanks for the suggestion.
I also added r_series (like your Grouped, but it forces one function rather than several) and r_dummy (inspired by r_series).
The next step is to get this working within r_list and r_data_frame so they recognize the data.frame outputs and act accordingly. I think it should be fairly straightforward, but I am out of time for the day.
It is already pretty close for r_list, though; it probably just needs to name the element holding the race columns "Race".
r_list(
    n = 5,
    r_series(race, 4),
    age
)
$X1
Source: local data frame [5 x 4]
Race_1 Race_2 Race_3 Race_4
1 White White White White
2 White White White White
3 White White White Hispanic
4 Black Hispanic White Hispanic
5 Hispanic White White White
$Age
[1] 32 33 34 26 30
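The behaviour described here (one generator repeated several times, with the list element named after the variable rather than "X1") can be sketched in base R. This is only an illustration under stated assumptions: series_sketch, gen_race, and gen_age are hypothetical stand-ins, not wakefield's r_series, race, or age.

```r
# Hypothetical stand-in for a repeated-measures helper; NOT wakefield's r_series.
# Calls one generator `times` times and names the columns name_1 ... name_times.
series_sketch <- function(gen, times, n, name) {
  cols <- replicate(times, gen(n), simplify = FALSE)
  names(cols) <- paste(name, seq_len(times), sep = "_")
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Stand-in generators (assumptions, used only for illustration)
gen_race <- function(n) sample(c("White", "Black", "Hispanic"), n, replace = TRUE)
gen_age  <- function(n) sample(20:35, n, replace = TRUE)

set.seed(42)
out <- list(
  # Naming the element explicitly avoids a default name such as "X1"
  Race = series_sketch(gen_race, times = 4, n = 5, name = "Race"),
  Age  = gen_age(5)
)
names(out)
names(out$Race)
```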
@mrdwab Again thanks for the suggestions. I have added these features, which can be seen demoed in the README.
I've added you as a contributor on the package as well. Great suggestions.
Here's a quickie demo:
if (!require("pacman")) install.packages("pacman"); library(pacman)
p_install_gh("trinker/wakefield"); p_load("wakefield")
r_data_frame(
    n = 5,
    id,
    race, race, race,
    age, age, age
)
## Source: local data frame [5 x 7]
##
## ID Race_1 Race_2 Race_3 Age_1 Age_2 Age_3
## 1 1 White Hispanic Hispanic 25 21 31
## 2 2 White White White 20 35 20
## 3 3 White White White 21 26 33
## 4 4 Black Hispanic White 34 33 33
## 5 5 Black White White 21 28 28
r_data_frame(
    3,
    id,
    state,
    r_series(likert, 4, integer = TRUE),
    r_series(age, 2),
    r_dummy(sex)
)
## Source: local data frame [3 x 10]
##
## ID State Likert_1 Likert_2 Likert_3 Likert_4 Age_1 Age_2 Male Female
## 1 1 Indiana 1 5 1 1 24 33 1 0
## 2 2 Iowa 3 1 2 5 26 29 1 0
## 3 3 Florida 3 3 2 3 21 30 0 1
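For readers unfamiliar with dummy coding, the Male/Female columns in the output above are 0/1 indicators, one per level. A minimal base R sketch of that idea follows; dummy_sketch is a hypothetical helper written for illustration and makes no claim about how r_dummy is implemented.

```r
# Hypothetical helper, NOT wakefield's r_dummy: one 0/1 indicator column
# per distinct value, in order of first appearance.
dummy_sketch <- function(x) {
  f <- factor(x, levels = unique(x))
  out <- lapply(levels(f), function(lv) as.integer(f == lv))
  names(out) <- levels(f)
  as.data.frame(out)
}

sex <- c("Male", "Male", "Female")
dummies <- dummy_sketch(sex)
dummies
```

Each row has exactly one indicator set to 1, so the original variable can always be recovered from the dummy columns.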
Lookin' good :-) :+1:
It would be nice for those of us who are lazy to have convenient names for repeated measures in a wide format.
Consider:
Generally, the preferred form would be to have all "times" identified. Thus, at the very minimum, Race should become Race.0 for balance in the naming scheme.
I know I can just do:
But that's a lot of extra typing :-(
I haven't dug into your code (hence raising an issue and not a pull request), but it's possible that the fix might be something as easy as: