radiant-rstats / radiant

Business analytics using R and Shiny. The radiant app combines the menus from radiant.data, radiant.design, radiant.basics, radiant.model, and radiant.multivariate.
https://radiant-rstats.github.io/docs/

Inconsistent detection of variable type after import, maybe?? #34

Closed justinidp closed 7 years ago

justinidp commented 7 years ago

I'm a former student from Rady trying to use Radiant in my internship. I've been importing datasets from CSV files, and the detection of what is a factor and what is a character string seems inconsistent. Perhaps if I understood the logic behind this decision, I would be able to anticipate the detection that is made. Could you pass along the logic that Radiant uses to decide whether a variable is a factor or a character string? And if there is any way to adjust that decision, please let me know. I am importing a lot of survey datasets that have questions with 5 answers (e.g., Not bitter, Slightly bitter, Just about right, Slightly too bitter, Too bitter). Sometimes all 5 answers are found in a survey, sometimes only 4. There are other questions with up to 6 answers, which I would also like Radiant to detect as factors without me doing a "Change Type" transformation. Please let me know if this is possible, or at least what I need to be aware of regarding how this detection is made. If possible, what I would probably like to do is force Radiant to detect a variable as a factor if there are fewer than 10 "levels" (sorry if I used this term wrong). Thank you, Justin

vnijs commented 7 years ago

If you don't want to convert any character variables to factors, un-check Str. as Factor on the Data > Manage tab for CSV files.

In case you didn't know, you can specify multiple variables to "Change type" in one go (see screenshot below).

image

Radiant looks at the number of levels relative to the number of observations to determine if a character variable should be a factor, assuming Str. as Factor is checked. If you want more control, you can add the following to R > Report, replacing citibike_small with the name of your dataset:

```{r}
## convert character columns with fewer than 10 unique values to factors
r_data[["citibike_small"]] <- mutate_if(r_data[["citibike_small"]],
  function(x) is.character(x) && length(unique(x)) < 10, as.factor)
```
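For comparison, the idea of looking at the number of levels relative to the number of observations could be sketched in base R roughly as follows. This is only an illustration: the helper name `chr_to_factor`, the `min_ratio` threshold of 5, and the toy `survey` data are all assumptions, not Radiant's actual function or cutoff.

```r
# Hypothetical helper: convert a character column to a factor only when there
# are at least `min_ratio` rows per unique value. The default of 5 is an
# assumed threshold for illustration, not the value Radiant uses.
chr_to_factor <- function(df, min_ratio = 5) {
  for (nm in names(df)) {
    x <- df[[nm]]
    if (is.character(x)) {
      n_levels <- length(unique(x))
      if (n_levels > 0 && length(x) / n_levels >= min_ratio) {
        df[[nm]] <- as.factor(x)
      }
    }
  }
  df
}

# Toy survey data (made up): 20 rows, 4 distinct answers, 20 distinct emails
survey <- data.frame(
  bitterness = rep(
    c("Not bitter", "Slightly bitter", "Just about right", "Too bitter"),
    times = 5
  ),
  email = paste0("user", 1:20, "@example.com"),
  stringsAsFactors = FALSE
)

converted <- chr_to_factor(survey)
sapply(converted, class)
```

With a ratio-based rule like this, a column with few repeated answers becomes a factor, while a column where nearly every value is unique (such as email addresses) stays a character string, regardless of how many rows the survey has.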

Hope that helps.

justinidp commented 7 years ago

Thank you, Prof. Nijs. I think it will end up best for us if I include a step in the SOP to select all the variables that should be factors and then use the convert function. I realized there could be some issues if I tried to force a variable to be a factor whenever it has fewer than 10 unique answers. We will have some survey questions that are not answered very often (e.g., requesting an email address for follow-up), and I'd rather Radiant detect these as character strings even if we have fewer than 10 email addresses. Thank you, Justin
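An explicit-list approach like the SOP step described above could also be scripted. The following is a minimal base R sketch on made-up data; the column names and the `factor_vars` vector are hypothetical, and inside Radiant the same assignment would presumably be applied to the dataset stored in `r_data`.

```r
# Toy survey data; all column names here are hypothetical.
survey <- data.frame(
  bitterness = c("Not bitter", "Too bitter", "Just about right", "Not bitter"),
  sweetness = c("Too sweet", "Just about right", "Too sweet", "Not sweet"),
  email = c("a@example.com", "b@example.com", NA, "c@example.com"),
  stringsAsFactors = FALSE
)

# Variables the SOP designates as factors; everything else is left untouched.
factor_vars <- c("bitterness", "sweetness")
survey[factor_vars] <- lapply(survey[factor_vars], as.factor)

str(survey)
```

Naming the variables explicitly avoids any dependence on how many distinct answers happen to appear in a given survey, so a sparsely answered email question remains a character string.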

vnijs commented 7 years ago

OK Justin. Closing this issue for now.