tidyverse / dtplyr

Data table backend for dplyr
https://dtplyr.tidyverse.org

could not find function "." #376

Closed HerAdri closed 2 years ago

HerAdri commented 2 years ago

I can't get the data.table code that dtplyr generates to evaluate correctly when I pass the result to correlation::correlation().

R.version _
platform x86_64-w64-mingw32
arch x86_64
os mingw32
crt ucrt
system x86_64, mingw32
status
major 4
minor 2.1
year 2022
month 06
day 23
svn rev 82513
language R
version.string R version 4.2.1 (2022-06-23 ucrt)
nickname Funny-Looking Kid

Brief description of the problem

library(dplyr)
library(dtplyr)
library(data.table)
library(correlation) 

expet<-iris %>% 
  select(Species, Petal.Width, Sepal.Length, Sepal.Width) %>%
  group_by(Species) 
str(expet)
correlation(expet)

dt <- lazy_dt(iris)
str(dt)
dt %>%  select(Species, Petal.Width, Sepal.Length, Sepal.Width) %>%
  group_by(Species) %>% show_query()
#`_DT16`[, .(Species, Petal.Width, Sepal.Length, Sepal.Width)]
dt[, .(Species, Petal.Width, Sepal.Length, Sepal.Width)] %>% correlation()
#> Error in .(Species, Petal.Width, Sepal.Length, Sepal.Width) : 
#>   could not find function "."

HerAdri commented 2 years ago

Running the generated query `_DT16`[, .(Species, Petal.Width, Sepal.Length, Sepal.Width)] on a data.table gives an object whose structure differs from the grouped data.frame, so correlation::correlation() returns different results: str(firis) is not the same as str(ciris).

library(correlation)
cor <- correlation(iris)
cor
summary(cor)

library(dplyr)
firis <- iris %>%
  select(Species, Petal.Width, Sepal.Length, Sepal.Width) %>%
  group_by(Species)
str(firis)
correlation(firis)

library(data.table)
irisdt <- as.data.table(iris)
ciris <- irisdt[, .(Species, Petal.Width, Sepal.Length, Sepal.Width)]
cor <- correlation(ciris)
cor
summary(cor)
str(ciris)

markfairbanks commented 2 years ago

Your original error occurs because a lazy_dt() is not a data.table, so you can't use data.table syntax directly on it. You need to convert the lazy_dt to a data.table first, using as.data.table().

library(dplyr, warn.conflicts = FALSE)
library(dtplyr)
library(data.table, warn.conflicts = FALSE)

iris_lazy <- lazy_dt(iris)

iris_lazy %>% 
  select(Species, Petal.Width, Sepal.Length, Sepal.Width)
#> Source: local data table [150 x 4]
#> Call:   `_DT1`[, .(Species, Petal.Width, Sepal.Length, Sepal.Width)]
#> 
#>   Species Petal.Width Sepal.Length Sepal.Width
#>   <fct>         <dbl>        <dbl>       <dbl>
#> 1 setosa          0.2          5.1         3.5
#> 2 setosa          0.2          4.9         3  
#> 3 setosa          0.2          4.7         3.2
#> 4 setosa          0.2          4.6         3.1
#> 5 setosa          0.2          5           3.6
#> 6 setosa          0.4          5.4         3.9
#> # … with 144 more rows
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

iris_dt <- as.data.table(iris_lazy)

iris_dt[, .(Species, Petal.Width, Sepal.Length, Sepal.Width)]
#>        Species Petal.Width Sepal.Length Sepal.Width
#>         <fctr>       <num>        <num>       <num>
#>   1:    setosa         0.2          5.1         3.5
#>   2:    setosa         0.2          4.9         3.0
#>   3:    setosa         0.2          4.7         3.2
#>   4:    setosa         0.2          4.6         3.1
#>   5:    setosa         0.2          5.0         3.6
#>  ---                                               
#> 146: virginica         2.3          6.7         3.0
#> 147: virginica         1.9          6.3         2.5
#> 148: virginica         2.0          6.5         3.0
#> 149: virginica         2.3          6.2         3.4
#> 150: virginica         1.8          5.9         3.0
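
The distinction can also be checked by inspecting the class of each object. The following is a minimal sketch, assuming dtplyr and data.table are installed; the names iris_lazy and iris_dt follow the example above, and the other variable names are illustrative only.

```r
library(dtplyr)
library(data.table)

iris_lazy <- lazy_dt(iris)

# A lazy_dt is a dtplyr step object that records a query. It is not a
# data.table, so `[` on it does not reach data.table's method, which is
# what gives `.()` its meaning as an alias for list().
is_lazy_a_dt   <- inherits(iris_lazy, "data.table")
is_lazy_a_step <- inherits(iris_lazy, "dtplyr_step")

# Converting evaluates the recorded query and returns a real data.table.
iris_dt <- as.data.table(iris_lazy)
is_dt   <- is.data.table(iris_dt)

# data.table syntax, including `.()`, now works.
subset_dt <- iris_dt[, .(Species, Petal.Width)]
```

Here is_lazy_a_dt is FALSE while is_dt is TRUE, which is why the same `[, .( ... )]` call fails on the lazy object and succeeds after conversion.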

As for your second problem: correlation::correlation() has special handling for a grouped_df, and that behavior is defined within the correlation package itself. A data.table carries no dplyr grouping, so the function cannot compute results "by group" on it. However, if you use dtplyr you can call collect() at the end of your pipe chain to match dplyr semantics and preserve the grouping.

library(dplyr, warn.conflicts = FALSE)
library(dtplyr)
library(data.table, warn.conflicts = FALSE)
library(correlation)

iris %>%
  select(Species, Petal.Width, Sepal.Length, Sepal.Width) %>%
  group_by(Species) %>%
  correlation()
#> # Correlation Matrix (pearson-method)
#> 
#> Group      |   Parameter1 |   Parameter2 |    r |        95% CI | t(48) |         p
#> -----------------------------------------------------------------------------------
#> setosa     |  Petal.Width | Sepal.Length | 0.28 | [ 0.00, 0.52] |  2.01 | 0.101    
#> setosa     |  Petal.Width |  Sepal.Width | 0.23 | [-0.05, 0.48] |  1.66 | 0.104    
#> setosa     | Sepal.Length |  Sepal.Width | 0.74 | [ 0.59, 0.85] |  7.68 | < .001***
#> versicolor |  Petal.Width | Sepal.Length | 0.55 | [ 0.32, 0.72] |  4.52 | < .001***
#> versicolor |  Petal.Width |  Sepal.Width | 0.66 | [ 0.47, 0.80] |  6.15 | < .001***
#> versicolor | Sepal.Length |  Sepal.Width | 0.53 | [ 0.29, 0.70] |  4.28 | < .001***
#> virginica  |  Petal.Width | Sepal.Length | 0.28 | [ 0.00, 0.52] |  2.03 | 0.048*   
#> virginica  |  Petal.Width |  Sepal.Width | 0.54 | [ 0.31, 0.71] |  4.42 | < .001***
#> virginica  | Sepal.Length |  Sepal.Width | 0.46 | [ 0.20, 0.65] |  3.56 | 0.002**  
#> 
#> p-value adjustment method: Holm (1979)
#> Observations: 50

iris %>%
  lazy_dt() %>%
  select(Species, Petal.Width, Sepal.Length, Sepal.Width) %>%
  group_by(Species) %>%
  collect() %>%
  correlation()
#> # Correlation Matrix (pearson-method)
#> 
#> Group      |   Parameter1 |   Parameter2 |    r |        95% CI | t(48) |         p
#> -----------------------------------------------------------------------------------
#> setosa     |  Petal.Width | Sepal.Length | 0.28 | [ 0.00, 0.52] |  2.01 | 0.101    
#> setosa     |  Petal.Width |  Sepal.Width | 0.23 | [-0.05, 0.48] |  1.66 | 0.104    
#> setosa     | Sepal.Length |  Sepal.Width | 0.74 | [ 0.59, 0.85] |  7.68 | < .001***
#> versicolor |  Petal.Width | Sepal.Length | 0.55 | [ 0.32, 0.72] |  4.52 | < .001***
#> versicolor |  Petal.Width |  Sepal.Width | 0.66 | [ 0.47, 0.80] |  6.15 | < .001***
#> versicolor | Sepal.Length |  Sepal.Width | 0.53 | [ 0.29, 0.70] |  4.28 | < .001***
#> virginica  |  Petal.Width | Sepal.Length | 0.28 | [ 0.00, 0.52] |  2.03 | 0.048*   
#> virginica  |  Petal.Width |  Sepal.Width | 0.54 | [ 0.31, 0.71] |  4.42 | < .001***
#> virginica  | Sepal.Length |  Sepal.Width | 0.46 | [ 0.20, 0.65] |  3.56 | 0.002**  
#> 
#> p-value adjustment method: Holm (1979)
#> Observations: 50
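
To see why collect() makes the difference, it can help to compare what each conversion function returns. This is a minimal sketch, assuming dplyr, dtplyr and data.table are installed; the variable names grouped_lazy, collected and converted are illustrative only.

```r
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)
library(data.table, warn.conflicts = FALSE)

grouped_lazy <- iris %>%
  lazy_dt() %>%
  select(Species, Petal.Width, Sepal.Length, Sepal.Width) %>%
  group_by(Species)

# collect() returns a tibble and re-applies the grouping set by group_by(),
# so the result is a grouped_df that correlation() can handle by group.
collected <- collect(grouped_lazy)
class(collected)
group_vars(collected)

# as.data.table() returns a plain data.table; the grouping is not carried over.
converted <- as.data.table(grouped_lazy)
class(converted)
```

Passing collected to correlation() gives the per-Species results, whereas converted is treated as a single ungrouped table.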

I'm going to close this, but if you want correlation::correlation() to work on grouped data.tables you'll have to open an issue in their repository.
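
For staying entirely within data.table, one possible workaround (an assumption here, not something proposed in this thread) is to compute per-group coefficients with data.table's by argument and stats::cor(). This minimal sketch covers a single pair of columns and returns only the Pearson coefficient, without the confidence intervals, tests, or p-value adjustment that correlation() reports; the name r_by_species is illustrative.

```r
library(data.table)

iris_dt <- as.data.table(iris)

# Pearson correlation between two columns, computed separately for each
# Species via data.table's `by` argument and stats::cor().
r_by_species <- iris_dt[, .(r = cor(Petal.Width, Sepal.Length)), by = Species]
r_by_species
```

The coefficients should agree with the r column of the grouped correlation() output above for the Petal.Width and Sepal.Length pair.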

If you have any questions let me know.