ropensci / dittodb

dittodb: A Test Environment for DB Queries in R
https://dittodb.jonkeane.com/
Other
81 stars 15 forks source link

dbFetch `n` is ignored while mocking databases. #101

Open ndiquattro opened 4 years ago

ndiquattro commented 4 years ago

Hello! Firstly, thanks for working on this package. I am very excited about it!

I'm getting a warning from testthat when my query involves dplyr and head() to limit the size of my query. Is there a workaround? Given the need to call collect() in the test, this could result in a large SELECT-* file.

Test code:

setup({
  dittodb::start_mock_db()
})

teardown({
  dittodb::stop_mock_db()
})

con <- DBI::dbConnect(odbc::odbc(), "athena")

test_that("A table is returned correctly", {
  apps <-
    dplyr::tbl(con, dbplyr::in_schema(schema, table)) %>%
    head(10) %>%
    dplyr::collect()

  expect_equal(dim(apps), c(10, 10))
  expect_equal(names(apps), tolower(names(apps)))
})

Warning:

av_get_table.R:40: warning: A table is returned correctly
dbFetch `n` is ignored while mocking databases.

Thanks again!

jonkeane commented 4 years ago

Hello, thanks for the detailed report. I've not used dittodb with athena before, so I'm glad that it's (at least somewhat!) working.

First and most simply, that warning is something that you can probably ignore / suppress in this case if the test runs otherwise, and your expectations can all run on the ten rows. What's going on internally is that dittodb does not (yet) have a mechanism for saving/returning set of results specified with n in DBI::dbFetch.

Side note: I tried to recreate the warning using odbc with a postgres backend and didn't get the same warning. I suspect that there's a slight difference between how the athena odbc driver and how the postgres odbc here that is resulting in the athena driver using the n argument of dbFetch() but not the postgres driver. But the pattern/process below should work for the athena driver just as well.

You can also setup your tests to not use head() at all. Like you note, you wouldn't want to save a full table collect() as a fixture, for now the process I recommend is: record the fixture with head() (or any other limit on the query like LIMIT = 10 in the SQL if one isn't using dbplyr), then when you are writing your tests don't use head() and you'll need to change the hash on the fixture file since the query would (probably) be slightly different. Below is what I wrote to test that this would work (again with postgres+odbc). I've changed a few things around slightly (using the flights data dittodb uses for testing, and using the with_mock_db({}) wrapper, but the gist of the code is here (please let me know if I've missed something in this translation though!).

library(dittodb)
library(DBI)
library(dplyr)
library(testthat)

# recording the limited fixture
start_db_capturing(".")
con <- dbConnect(
  odbc::odbc(),
  Driver = "PostgreSQL Unicode",
  Server = "127.0.0.1",
  Database = "nycflights"
)

flights <-
  dplyr::tbl(con, dbplyr::in_schema("odbc", "flights")) %>%
  head(10) %>%
  dplyr::collect()

dbDisconnect(con)
stop_db_capturing()

And then for your test use the same code, but minus the head()

with_mock_db({
  con <- dbConnect(
    odbc::odbc(),
    Driver = "PostgreSQL Unicode",
    Server = "127.0.0.1",
    Database = "nycflights"
  )

  test_that("A table is returned correctly", {
    flights <-
      dplyr::tbl(con, dbplyr::in_schema("odbc", "flights")) %>%
      dplyr::collect()

    expect_equal(dim(flights), c(10, 19))
    expect_equal(names(flights), tolower(names(flights)))
  })
})

When you run this chunk above the first time you might get an error like. This is because the query has changed so the hash for the fixture has also changed. You can simply rename the fixture above (here it was called SELECT-dd9de2.R before to SELECT-dd9de2.R).

# Get error:
# Error: Test failed: 'A table is returned correctly'
# * Couldn't find the file nycflights/SELECT-dd9de2.R in any of the mock directories.

It's actually possible with athena that the query doesn't change if it's using dbFetch(n=) for limiting rows, in which case you shouldn't need to rename the fixture at all!

I know that this process above is a bit clunky, and something I am thinking about ways to improve. Let me know if this doesn't work for you and I can dig in a little bit more into the specifics of athena here.

ndiquattro commented 4 years ago

Thanks for the response! I tried the remove head()/change hash method and was still getting the same warning. I ended up going with just suppressing the warnings for now.

Up to you if you want to leave the issue open or not. I'd be interested in knowing when a built-in solution is available, FWIW. Thanks again!

jonkeane commented 4 years ago

I'll keep it open as a reminder to myself to try against an Athena backend myself and see if I can find why that warning is still popping up. Thanks again for the report!