ropensci / taxadb

📦 Taxonomic Database
https://docs.ropensci.org/taxadb

Replace `top_n` with `filter` and `max` #81

Closed: kguidonimartins closed this 3 years ago

kguidonimartins commented 3 years ago

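As noted in #80, `dplyr::top_n()` fails on tables backed by a database connection: on an SQLite table it errors with `no such function: top_n_rank` instead of returning the top row of each group. Perhaps this small change, which replaces `top_n()` with `filter()` and `max()`, can solve the problem.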
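The substitution can be sketched as a small helper. `keep_max_per_group()` is hypothetical and only illustrates the pattern (it is not part of taxadb, and the actual diff in this PR may be organised differently); it assumes dplyr >= 1.0 and rlang's `{{ }}` embracing operator.

``` r
library(dplyr)

# Hypothetical helper for illustration only (not part of taxadb).
# Within each level of `group`, keep the row(s) whose `col` equals the group
# maximum. Unlike top_n(), this relies only on group_by(), filter() and max(),
# which dbplyr can also translate to SQL for database-backed tables.
keep_max_per_group <- function(tbl, group, col) {
  tbl %>%
    group_by({{ group }}) %>%
    filter({{ col }} == max({{ col }}, na.rm = TRUE)) %>%
    ungroup()
}

# Usage on a small local data frame with a tie in group 1
toy <- data.frame(sort = c(1, 1, 1, 2), row_num = c(1L, 3L, 3L, 2L))
keep_max_per_group(toy, sort, row_num)
```

Like `top_n(1, row_num)`, which ranks with `min_rank()`, the `==` comparison keeps every row tied for the maximum, so group 1 above contributes two rows.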

Tested with:

``` r
if (!require("DBI")) install.packages("DBI")
#> Loading required package: DBI
if (!require("RSQLite")) install.packages("RSQLite")
#> Loading required package: RSQLite
if (!require("dplyr")) install.packages("dplyr")
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Toy data: two rows share sort = 1, so that group should keep only its highest row_num
df <-
  data.frame(
    scientificName = c("bob", "bob", "alice"),
    rank = c("subfamily", "family", "genus"),
    sort = c(1, 1, 2), stringsAsFactors = FALSE
  ) %>%
  mutate(row_num = row_number())

# Current approach: top_n() on a local data frame
df_top <-
  df %>%
  group_by(sort) %>%
  top_n(1, row_num)

df_top
#> # A tibble: 2 x 4
#> # Groups:   sort [2]
#>   scientificName rank    sort row_num
#>   <chr>          <chr>  <dbl>   <int>
#> 1 bob            family     1       2
#> 2 alice          genus      2       3

# Proposed approach: keep the rows that equal the group maximum
df_max <-
  df %>%
  group_by(sort) %>%
  filter(row_num == max(row_num))

df_max
#> # A tibble: 2 x 4
#> # Groups:   sort [2]
#>   scientificName rank    sort row_num
#>   <chr>          <chr>  <dbl>   <int>
#> 1 bob            family     1       2
#> 2 alice          genus      2       3

# Both approaches agree on local data
all_equal(df_top, df_max)
#> [1] TRUE

# Copy the same data into an in-memory SQLite database
conn <-
  DBI::dbConnect(
    drv = RSQLite::SQLite(),
    ":memory:"
  )

df_conn <-
  df %>%
  copy_to(
    dest = conn,
    name = "test-changes"
  )

# top_n() produces SQL that SQLite cannot run
df_conn %>%
  group_by(sort) %>%
  top_n(1, row_num)
#> Error: no such function: top_n_rank

# filter() + max() runs on the database and returns the expected rows
df_conn %>%
  group_by(sort) %>%
  filter(row_num == max(row_num, na.rm = TRUE))
#> # Source:   lazy query [?? x 4]
#> # Database: sqlite 3.33.0 [:memory:]
#> # Groups:   sort
#>   scientificName rank    sort row_num
#>   <chr>          <chr>  <dbl>   <int>
#> 1 bob            family     1       2
#> 2 alice          genus      2       3
```

Created on 2020-11-26 by the reprex package (v0.3.0)
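To see why the `filter()`/`max()` version runs on SQLite, one can print the SQL that dbplyr builds with `show_query()`. We would expect the grouped `max()` to be translated into a window function along the lines of `MAX(row_num) OVER (PARTITION BY sort)` inside a subquery, which SQLite supports from version 3.25 onward; the exact SQL text depends on the dbplyr version. The table name `demo` is an arbitrary choice for this sketch.

``` r
library(dplyr)
library(dbplyr)  # provides the database backend used by copy_to() and show_query()

# In-memory SQLite database; needs the DBI and RSQLite packages installed
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
demo_db <- copy_to(con, data.frame(sort = c(1, 1, 2), row_num = 1:3), name = "demo")

# Lazy query: nothing runs on the database until the result is printed or collected
latest_db <- demo_db %>%
  group_by(sort) %>%
  filter(row_num == max(row_num, na.rm = TRUE))

# Print the SQL that dbplyr generates for this lazy query
show_query(latest_db)

# Execute the query and bring the rows back into R
latest <- collect(latest_db)

DBI::dbDisconnect(con)
```

Row order in the collected result is whatever SQLite returns, so any check on it should not assume a particular order.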

Session info

``` r
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.0.3 (2020-10-10)
#>  os       Arch Linux
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    pt_BR.UTF-8
#>  tz       America/Fortaleza
#>  date     2020-11-26
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
#>  bit           4.0.4   2020-08-04 [1] CRAN (R 4.0.2)
#>  bit64         4.0.5   2020-08-30 [1] CRAN (R 4.0.2)
#>  blob          1.2.1   2020-01-20 [1] CRAN (R 4.0.0)
#>  callr         3.5.1   2020-10-13 [1] CRAN (R 4.0.3)
#>  cli           2.2.0   2020-11-20 [1] CRAN (R 4.0.3)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
#>  DBI         * 1.1.0   2019-12-15 [1] CRAN (R 4.0.0)
#>  dbplyr        2.0.0   2020-11-03 [1] CRAN (R 4.0.3)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 4.0.0)
#>  devtools      2.3.2   2020-09-18 [1] CRAN (R 4.0.3)
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.3)
#>  dplyr       * 1.0.2   2020-08-18 [1] CRAN (R 4.0.3)
#>  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.3)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.0)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.3)
#>  generics      0.1.0   2020-10-31 [1] CRAN (R 4.0.3)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.3)
#>  highr         0.8     2019-03-20 [1] CRAN (R 4.0.0)
#>  htmltools     0.5.0   2020-06-16 [1] CRAN (R 4.0.1)
#>  knitr         1.30    2020-09-22 [1] CRAN (R 4.0.3)
#>  lifecycle     0.2.0   2020-03-06 [1] CRAN (R 4.0.0)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.3)
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 4.0.0)
#>  pillar        1.4.7   2020-11-20 [1] CRAN (R 4.0.3)
#>  pkgbuild      1.1.0   2020-07-13 [1] CRAN (R 4.0.3)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.0)
#>  pkgload       1.1.0   2020-05-29 [1] CRAN (R 4.0.0)
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.0.0)
#>  processx      3.4.4   2020-09-03 [1] CRAN (R 4.0.3)
#>  ps            1.4.0   2020-10-07 [1] CRAN (R 4.0.3)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.3)
#>  R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.3)
#>  Rcpp          1.0.5   2020-07-06 [1] CRAN (R 4.0.3)
#>  remotes       2.2.0   2020-07-21 [1] CRAN (R 4.0.3)
#>  rlang         0.4.8   2020-10-08 [1] CRAN (R 4.0.3)
#>  rmarkdown     2.5     2020-10-21 [1] CRAN (R 4.0.3)
#>  rprojroot     2.0.2   2020-11-15 [1] CRAN (R 4.0.3)
#>  RSQLite     * 2.2.1   2020-09-30 [1] CRAN (R 4.0.2)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
#>  stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.3)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.0)
#>  testthat      3.0.0   2020-10-31 [1] CRAN (R 4.0.3)
#>  tibble        3.0.4   2020-10-12 [1] CRAN (R 4.0.3)
#>  tidyselect    1.1.0   2020-05-11 [1] CRAN (R 4.0.0)
#>  usethis       1.6.3   2020-09-17 [1] CRAN (R 4.0.3)
#>  utf8          1.1.4   2018-05-24 [1] CRAN (R 4.0.0)
#>  vctrs         0.3.5   2020-11-17 [1] CRAN (R 4.0.3)
#>  withr         2.3.0   2020-09-22 [1] CRAN (R 4.0.3)
#>  xfun          0.19    2020-10-30 [1] CRAN (R 4.0.3)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.0)
#> 
#> [1] /home/karlo/.local/lib/R/library/4.0
#> [2] /usr/lib/R/library
```

cboettig commented 3 years ago

closes #80

cboettig commented 3 years ago

Very nice, thanks for this!