Closed ffinger closed 4 years ago
    library(linelist)
    library(dplyr)
    library(magrittr)

    data(iris)
    iris %<>%
      mutate(sepal.length = Sepal.Length) %>%
      clean_data()

    glimpse(iris)
    #> Observations: 150
    #> Variables: 6
    #> $ sepal_length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5…
    #> $ sepal_width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3…
    #> $ petal_length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1…
    #> $ petal_width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0…
    #> $ species      <fct> setosa, setosa, setosa, setosa, setosa, setosa, set…
    #> $ sepal_length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5…
Duplicated names then cause problems for many functions applied to the data.frame.
I think there should at least be a warning if some columns end up with the same name after cleaning.
Alternatively, clean_data could detect the duplicated column names and append _1, _2 or similar to them, in addition to issuing the warning.
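
To make the suggestion concrete, here is a rough sketch of what such a check might look like. Note that dedupe_names() is a hypothetical helper written for illustration, not an existing linelist function, and it relies only on base R. One difference from the proposal above: base R's make.unique() leaves the first occurrence of a name unchanged and appends _1, _2, ... to the later ones.

```r
# Hypothetical helper (not part of linelist): warn about duplicated
# column names and make them unique by appending a numeric suffix.
dedupe_names <- function(x, sep = "_") {
  nms  <- names(x)
  dups <- unique(nms[duplicated(nms)])
  if (length(dups) > 0) {
    warning(
      "Duplicated column names after cleaning: ",
      paste(dups, collapse = ", "),
      call. = FALSE
    )
    # make.unique() keeps the first occurrence as-is and appends
    # <sep>1, <sep>2, ... to each later duplicate.
    names(x) <- make.unique(nms, sep = sep)
  }
  x
}

# Small stand-in for the cleaned iris data, with one duplicated name.
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(df) <- c("sepal_length", "species", "sepal_length")

fixed <- suppressWarnings(dedupe_names(df))
names(fixed)
#> [1] "sepal_length"   "species"        "sepal_length_1"
```

Something along these lines could presumably run as the last step of the name cleaning, so the user sees the warning and still gets a data.frame with unique names.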
Hi @ffinger, Thank you for providing a simple reproducible example and potential solution. I'll make a PR for this soon.