Closed HerAdri closed 2 years ago
Rendering the `_DT16`[, .(Species, Petal.Width, Sepal.Length, Sepal.Width)] call: differences in the structures of the data.frame and data.table objects are obtained
The resulting structures are different, and therefore so are the results of the function `correlation::correlation()`.
`str(firis)` and `str(ciris)` are not the same!
```r
library(correlation)
cor <- correlation(iris)
cor
summary(cor)

# dplyr: select the columns and group by Species (gives a grouped_df)
library(dplyr)
firis <- iris %>%
  select(Species, Petal.Width, Sepal.Length, Sepal.Width) %>%
  group_by(Species)
str(firis)
correlation(firis)

# data.table: select the same columns (gives a plain, ungrouped data.table)
library(data.table)
irisdt <- as.data.table(iris)
ciris <- irisdt[, .(Species, Petal.Width, Sepal.Length, Sepal.Width)]
cor <- correlation(ciris)
cor
summary(cor)
str(ciris)
```
For your original error - a `lazy_dt()` is not a data.table, so you can't use data.table syntax directly on it. You need to first convert the `lazy_dt` to a data.table.
```r
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)
library(data.table, warn.conflicts = FALSE)

iris_lazy <- lazy_dt(iris)

iris_lazy %>%
  select(Species, Petal.Width, Sepal.Length, Sepal.Width)
#> Source: local data table [150 x 4]
#> Call:   `_DT1`[, .(Species, Petal.Width, Sepal.Length, Sepal.Width)]
#>
#>   Species Petal.Width Sepal.Length Sepal.Width
#>   <fct>         <dbl>        <dbl>       <dbl>
#> 1 setosa          0.2          5.1         3.5
#> 2 setosa          0.2          4.9         3
#> 3 setosa          0.2          4.7         3.2
#> 4 setosa          0.2          4.6         3.1
#> 5 setosa          0.2          5           3.6
#> 6 setosa          0.4          5.4         3.9
#> # … with 144 more rows
#>
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

iris_dt <- as.data.table(iris_lazy)

iris_dt[, .(Species, Petal.Width, Sepal.Length, Sepal.Width)]
#>        Species Petal.Width Sepal.Length Sepal.Width
#>         <fctr>       <num>        <num>       <num>
#>   1:    setosa         0.2          5.1         3.5
#>   2:    setosa         0.2          4.9         3.0
#>   3:    setosa         0.2          4.7         3.2
#>   4:    setosa         0.2          4.6         3.1
#>   5:    setosa         0.2          5.0         3.6
#>  ---
#> 146: virginica         2.3          6.7         3.0
#> 147: virginica         1.9          6.3         2.5
#> 148: virginica         2.0          6.5         3.0
#> 149: virginica         2.3          6.2         3.4
#> 150: virginica         1.8          5.9         3.0
```
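If it helps, here is a minimal sketch of a guard that makes the conversion step explicit before any data.table syntax is used. The helper name `ensure_data_table()` is hypothetical (it is not part of dtplyr or data.table); it only relies on `is.data.table()` and `as.data.table()`.

```r
library(dtplyr)
library(data.table)

# Hypothetical helper: return a real data.table, converting if needed.
# as.data.table() has methods for data frames and for lazy_dt objects.
ensure_data_table <- function(x) {
  if (is.data.table(x)) x else as.data.table(x)
}

iris_lazy <- lazy_dt(iris)

# The lazy object only records pending operations; it is not a data.table
lazy_is_dt <- is.data.table(iris_lazy)

# After conversion, `[` with data.table syntax is available
iris_dt <- ensure_data_table(iris_lazy)
subset_dt <- iris_dt[, .(Species, Petal.Width, Sepal.Length, Sepal.Width)]
```

Here `lazy_is_dt` should be `FALSE`, while `subset_dt` is an ordinary data.table with the four selected columns.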
As for your second error - `correlation::correlation()` works differently on a `grouped_df`. This behavior is defined within the correlation package itself, so it won't work "by group" on a data.table, because that's not how the function has been defined. However, if you use dtplyr you can use `collect()` at the end of your pipe chain to match dplyr semantics and preserve grouping.
```r
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)
library(data.table, warn.conflicts = FALSE)
library(correlation)

iris %>%
  select(Species, Petal.Width, Sepal.Length, Sepal.Width) %>%
  group_by(Species) %>%
  correlation()
#> # Correlation Matrix (pearson-method)
#>
#> Group      |   Parameter1 |   Parameter2 |    r |        95% CI | t(48) |         p
#> -----------------------------------------------------------------------------------
#> setosa     |  Petal.Width | Sepal.Length | 0.28 | [ 0.00, 0.52] |  2.01 | 0.101
#> setosa     |  Petal.Width |  Sepal.Width | 0.23 | [-0.05, 0.48] |  1.66 | 0.104
#> setosa     | Sepal.Length |  Sepal.Width | 0.74 | [ 0.59, 0.85] |  7.68 | < .001***
#> versicolor |  Petal.Width | Sepal.Length | 0.55 | [ 0.32, 0.72] |  4.52 | < .001***
#> versicolor |  Petal.Width |  Sepal.Width | 0.66 | [ 0.47, 0.80] |  6.15 | < .001***
#> versicolor | Sepal.Length |  Sepal.Width | 0.53 | [ 0.29, 0.70] |  4.28 | < .001***
#> virginica  |  Petal.Width | Sepal.Length | 0.28 | [ 0.00, 0.52] |  2.03 | 0.048*
#> virginica  |  Petal.Width |  Sepal.Width | 0.54 | [ 0.31, 0.71] |  4.42 | < .001***
#> virginica  | Sepal.Length |  Sepal.Width | 0.46 | [ 0.20, 0.65] |  3.56 | 0.002**
#>
#> p-value adjustment method: Holm (1979)
#> Observations: 50

iris %>%
  lazy_dt() %>%
  select(Species, Petal.Width, Sepal.Length, Sepal.Width) %>%
  group_by(Species) %>%
  collect() %>%
  correlation()
#> # Correlation Matrix (pearson-method)
#>
#> Group      |   Parameter1 |   Parameter2 |    r |        95% CI | t(48) |         p
#> -----------------------------------------------------------------------------------
#> setosa     |  Petal.Width | Sepal.Length | 0.28 | [ 0.00, 0.52] |  2.01 | 0.101
#> setosa     |  Petal.Width |  Sepal.Width | 0.23 | [-0.05, 0.48] |  1.66 | 0.104
#> setosa     | Sepal.Length |  Sepal.Width | 0.74 | [ 0.59, 0.85] |  7.68 | < .001***
#> versicolor |  Petal.Width | Sepal.Length | 0.55 | [ 0.32, 0.72] |  4.52 | < .001***
#> versicolor |  Petal.Width |  Sepal.Width | 0.66 | [ 0.47, 0.80] |  6.15 | < .001***
#> versicolor | Sepal.Length |  Sepal.Width | 0.53 | [ 0.29, 0.70] |  4.28 | < .001***
#> virginica  |  Petal.Width | Sepal.Length | 0.28 | [ 0.00, 0.52] |  2.03 | 0.048*
#> virginica  |  Petal.Width |  Sepal.Width | 0.54 | [ 0.31, 0.71] |  4.42 | < .001***
#> virginica  | Sepal.Length |  Sepal.Width | 0.46 | [ 0.20, 0.65] |  3.56 | 0.002**
#>
#> p-value adjustment method: Holm (1979)
#> Observations: 50
```
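To see why the end of the pipe matters, this small sketch compares what comes out of the same dtplyr pipeline when it ends in `collect()` versus `as.data.table()`. The object names `grouped_tbl` and `plain_dt` are only illustrative; `is_grouped_df()` and `group_vars()` are dplyr functions.

```r
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)
library(data.table, warn.conflicts = FALSE)

lazy_grouped <- iris %>%
  lazy_dt() %>%
  select(Species, Petal.Width, Sepal.Length, Sepal.Width) %>%
  group_by(Species)

# collect() returns a tibble and re-applies the grouping
grouped_tbl <- collect(lazy_grouped)
is_grouped_df(grouped_tbl)
group_vars(grouped_tbl)

# as.data.table() returns a plain data.table, which carries no grouping
plain_dt <- as.data.table(lazy_grouped)
is_grouped_df(plain_dt)
```

The first result is a `grouped_df` grouped by `Species`; the second is not a `grouped_df`, so a function that dispatches on `grouped_df` would presumably treat it as one ungrouped table.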
I'm going to close this - but if you want `correlation::correlation()` to work on grouped data.tables you'll have to open an issue in their repository.
If you have any questions let me know.
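In the meantime, one possible workaround that stays entirely in data.table is to compute the per-group coefficients with base `stats::cor()` and `by =`. This is only a rough sketch, not a replacement for `correlation::correlation()`: it returns Pearson's r only, with no confidence intervals, test statistics, or p-value adjustment. The column names `Parameter1` and `Parameter2` are chosen here to mirror the output above.

```r
library(data.table)

iris_dt <- as.data.table(iris)

# All unordered pairs of the three numeric columns of interest
vars <- c("Petal.Width", "Sepal.Length", "Sepal.Width")
pairs <- combn(vars, 2, simplify = FALSE)

# For each pair, compute Pearson's r within each Species, then stack the results
by_species <- rbindlist(lapply(pairs, function(p) {
  iris_dt[
    ,
    .(Parameter1 = p[1], Parameter2 = p[2], r = stats::cor(get(p[1]), get(p[2]))),
    by = Species
  ]
}))

by_species[order(Species)]
```

The result has one row per species and column pair (nine rows for `iris`), which can be compared against the `r` column of the grouped output above.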
I can't get a correct translation of the code to be evaluated by the function `correlation::correlation()` when using a data.table object.
```
R.version
               _
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
crt            ucrt
system         x86_64, mingw32
status
major          4
minor          2.1
year           2022
month          06
day            23
svn rev        82513
language       R
version.string R version 4.2.1 (2022-06-23 ucrt)
nickname       Funny-Looking Kid
```
Brief description of the problem