traitecoevo / plant

Trait-Driven Models of Ecology and Evolution :evergreen_tree:
https://traitecoevo.github.io/plant

Tests failing on Appveyor #184

Closed dfalster closed 6 years ago

dfalster commented 6 years ago

Recent builds show tests failing on AppVeyor (Windows machine): build/1.0.19#L538. The package installs fine, but:

Running the tests in 'tests/testthat.R' failed.
Last 13 lines of output:

  testthat results ================================================================
  OK: 1303 SKIPPED: 1 FAILED: 9
  1. Failure: Ported from tree1 (@test-cohort.R#60) 
  2. Failure: ODE interface (@test-cohort.R#120) 
  3. Failure: Gradients agree (@test-gradient.R#34) 
  4. Failure: Sensible behaviour on integration failure (@test-plant-runner.R#172) 
  5. Failure: Sensible behaviour on integration failure (@test-plant-runner.R#177) 
  6. Failure: Reference comparison (@test-plant.R#81) 
  7. Failure: Vectorised interface to integration works (@test-qk.R#59) 
  8. Failure: Vectorised interface to integration works (@test-qk.R#67) 
  9. Failure: Vectorised interface to integration works (@test-qk.R#71) 

  Error: testthat unit tests failed
  Execution halted
* DONE

I need a Windows machine to sort this out!

dfalster commented 6 years ago

Tests are running fine on the first Windows machine I've tried:

==> devtools::test()

Loading plant
Loading required package: testthat
Testing plant
Adaptive interpolator: ......
Build_schedule: ......
CohortSchedule: ....................................................................................
Cohort-FF16: .......................
Cohort-FF16r: .......................
Control: ...
Disturbance: ............
Environment-FF16: .................
Environment-FF16r: .................
fitness support: .....S
Gradient: .......
Interpolator: .........
Lorenz (basic ODE): .....................................
Modular: ..................................................................
OdeControl: ...
Parameters: ...............................................................................................
Patch-FF16: ....................
Patch-FF16r: ....................
PlantPlus: .........................................
PlantRunner: ..................................................
Plant utilities: ..............................
Plant-FF16: .................................................
Plant-FF16r: .................................................
QAG: ......................................................................
QK: ............................
SCM support: ..................
SCM: ............................................................................................................
Species-FF16: ...........................................................
Species-FF16r: ...........................................................
StochasticPatchRunner: ..........................
StochasticPatch: .........................................
StochasticSpecies: .....................................................................................................................
Strategy-FF16: ...............................................................
Strategy-FF16r: ...........
Support: ...........
Tools: .....
Trapezium integration: ........
Uniroot: ...
utils: .......

Skipped ------------------------------------------------------------------------
1. positive_2d (@test-fitness-support.R#19) - plant.ml not installed

DONE ===========================================================================
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C                       LC_TIME=English_Australia.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] plant_1.0.0    testthat_1.0.2

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.13      roxygen2_6.0.1    digest_0.6.12     crayon_1.3.4      withr_1.0.2       commonmark_1.4   
 [7] BH_1.65.0-1       R6_2.2.2          magrittr_1.5      stringi_1.1.2     rstudioapi_0.6    xml2_1.1.1       
[13] devtools_1.13.2   tools_3.3.2       stringr_1.2.0     numDeriv_2016.8-1 loggr_0.3         yaml_2.1.14      
[19] memoise_1.1.0 
RemkoDuursma commented 6 years ago

Runs fine on my Win7 machine, but I do get this:

Skipped ------------------------------------------------------------------------
1. positive_2d (@test-fitness-support.R#19) - plant.ml not installed

I don't know what any of those AppVeyor error messages mean tbh, but I suspect some system requirements are missing on their server?

dfalster commented 6 years ago

Thanks @RemkoDuursma ! (The skipped message is fine, that's a test that only runs when an optional package is installed.)

dfalster commented 6 years ago

Ok, managed to find more explicit indications of why the tests are failing in the logs:

[00:04:28] > library(testthat)
[00:04:28] > library(plant)
[00:04:28] > 
[00:04:28] > test_check("plant")
[00:04:28] 1. Failure: Ported from tree1 (@test-cohort.R#60) ------------------------------
[00:04:28] `dgdh` not identical to `dgdh_forward`.
[00:04:28] Objects equal but not identical
[00:04:28] 
[00:04:28] 
[00:04:28] 2. Failure: ODE interface (@test-cohort.R#120) ---------------------------------
[00:04:28] cohort$plant$internals[v] not identical to plant$internals[v].
[00:04:28] Objects equal but not identical
[00:04:28] 
[00:04:28] 
[00:04:28] 3. Failure: Gradients agree (@test-gradient.R#34) ------------------------------
[00:04:28] test_gradient_richardson(f, x, d, r) not identical to numDeriv::grad(f, x, method.args = method_args).
[00:04:28] Objects equal but not identical
[00:04:28] 
[00:04:28] 
[00:04:28] 4. Failure: Sensible behaviour on integration failure (@test-plant-runner.R#172) 
[00:04:28] `warnings` does not match "integration failed with error".
[00:04:28] Actual value: "Time exceeded time_max, 50 larger sizes dropped"
[00:04:28] 
[00:04:28] 
[00:04:28] 5. Failure: Sensible behaviour on integration failure (@test-plant-runner.R#177) 
[00:04:28] nrow(res$trajectory) not equal to 1.
[00:04:28] 1/1 mismatches
[00:04:28] [1] 111 - 1 == 110
[00:04:28] 
[00:04:28] 
[00:04:28] 6. Failure: Reference comparison (@test-plant.R#81) ----------------------------
[00:04:28] vars_pl[rate_names] not identical to vars_pp[rate_names].
[00:04:28] Objects equal but not identical
[00:04:28] 
[00:04:28] 
[00:04:28] 7. Failure: Vectorised interface to integration works (@test-qk.R#59) ----------
[00:04:28] int_21$integrate_vector(f(x), a, b) not identical to int_21$integrate(f, a, b).
[00:04:28] Objects equal but not identical
[00:04:28] 
[00:04:28] 
[00:04:28] 8. Failure: Vectorised interface to integration works (@test-qk.R#67) ----------
[00:04:28] int_41$integrate_vector(f(x), a, b) not identical to int_41$integrate(f, a, b).
[00:04:28] Objects equal but not identical
[00:04:28] 
[00:04:28] 
[00:04:28] 9. Failure: Vectorised interface to integration works (@test-qk.R#71) ----------
[00:04:28] int_51$integrate_vector(f(x), a, b) not identical to int_51$integrate(f, a, b).
[00:04:28] Objects equal but not identical
[00:04:28] 
[00:04:28] 
[00:04:28] testthat results ================================================================
[00:04:28] OK: 1303 SKIPPED: 1 FAILED: 9
[00:04:28] 1. Failure: Ported from tree1 (@test-cohort.R#60) 
[00:04:28] 2. Failure: ODE interface (@test-cohort.R#120) 
[00:04:28] 3. Failure: Gradients agree (@test-gradient.R#34) 
[00:04:28] 4. Failure: Sensible behaviour on integration failure (@test-plant-runner.R#172) 
[00:04:28] 5. Failure: Sensible behaviour on integration failure (@test-plant-runner.R#177) 
[00:04:28] 6. Failure: Reference comparison (@test-plant.R#81) 
[00:04:28] 7. Failure: Vectorised interface to integration works (@test-qk.R#59) 
[00:04:28] 8. Failure: Vectorised interface to integration works (@test-qk.R#67) 
[00:04:28] 9. Failure: Vectorised interface to integration works (@test-qk.R#71) 
[00:04:28] 
[00:04:28] Error: testthat unit tests failed
[00:04:28] Execution halted
dfalster commented 6 years ago

7 of the 9 tests fail when running expect_identical with the message "Objects equal but not identical". So it seems these are minor numerical differences. We should perhaps be using expect_equal instead of expect_identical. Using the example from testthat:

> library(testthat)
>      expect_identical(sqrt(2) ^ 2, 2)
Error: sqrt(2)^2 not identical to 2.
Objects equal but not identical
>      expect_equal(sqrt(2) ^ 2, 2)

Before changing anything, a quick check to see how often we are using expect_identical and expect_equal:

➜  plant git:(master) ✗ grep -r expect_equal  tests/testthat/* | wc -l
     406
➜  plant git:(master) ✗ grep -r expect_identical  tests/testthat/* | wc -l
     232

And some examples of each:

➜  plant git:(master) ✗ grep -r expect_equal  tests/testthat/* |  head -n 30
tests/testthat/test-adaptive-interpolator.R:  expect_equal(s$size, 241)
tests/testthat/test-adaptive-interpolator.R:  expect_equal(nrow(s$xy), s$size)
tests/testthat/test-adaptive-interpolator.R:  expect_equal(zz_mid, yy_mid, tolerance=2e-8)
tests/testthat/test-build-schedule.R:    expect_equal(length(p$cohort_schedule_times_default), 141)
tests/testthat/test-build-schedule.R:    expect_equal(length(p$cohort_schedule_times[[1]]), 176)
tests/testthat/test-cohort-schedule.R:  expect_equal(sched$size, 0)
tests/testthat/test-cohort-schedule.R:  expect_equal(sched$n_species, n_species)
tests/testthat/test-cohort-schedule.R:  expect_equal(sched$remaining, 0)
tests/testthat/test-cohort-schedule.R:  expect_equal(sched$max_time, Inf)
tests/testthat/test-cohort-schedule.R:  expect_equal(sched$ode_times, numeric(0))
tests/testthat/test-cohort-schedule.R:  expect_equal(sched$size, length(t1))
tests/testthat/test-cohort-schedule.R:  expect_equal(sched$remaining, length(t1))
tests/testthat/test-cohort-schedule.R:  expect_equal(sched$times(species_index), t1)
tests/testthat/test-cohort-schedule.R:  expect_equal(e$species_index, species_index)
tests/testthat/test-cohort-schedule.R:  expect_equal(sched$remaining, 0)
tests/testthat/test-cohort-schedule.R:  expect_equal(cmp[,1], rep(1, n))
tests/testthat/test-cohort-schedule.R:  expect_equal(cmp[,2], t1)
tests/testthat/test-cohort-schedule.R:  expect_equal(cmp[,3], t1)
tests/testthat/test-cohort-schedule.R:  expect_equal(cmp[,4], c(t1[-1], Inf))
tests/testthat/test-cohort-schedule.R:  expect_equal(cmp[,5], c(t1[-1], Inf))
tests/testthat/test-cohort-schedule.R:  expect_equal(sched$times(1), t1)
tests/testthat/test-cohort-schedule.R:  expect_equal(sched$times(2), t2)
tests/testthat/test-cohort-schedule.R:  expect_equal(sched$size, length(t1) + length(t2))
tests/testthat/test-cohort-schedule.R:  expect_equal(cmp[,1], expected$species_index)
tests/testthat/test-cohort-schedule.R:  expect_equal(cmp[,2], expected$start)
tests/testthat/test-cohort-schedule.R:  expect_equal(cmp[,3], expected$start)
tests/testthat/test-cohort-schedule.R:  expect_equal(cmp[,4], expected$end)
tests/testthat/test-cohort-schedule.R:  expect_equal(cmp[,5], expected$end)
tests/testthat/test-cohort-schedule.R:  expect_equal(sched$max_time, max_t)
tests/testthat/test-cohort-schedule.R:  expect_equal(sched$times(1), t1_new)

➜  plant git:(master) ✗ grep -r expect_identical  tests/testthat/* |  head -n 30
tests/testthat/test-adaptive-interpolator.R:  expect_identical(s$xy[,2], target(xx_eval))
tests/testthat/test-cohort-schedule.R:  expect_identical(e$species_index, 1L)
tests/testthat/test-cohort-schedule.R:  expect_identical(e$species_index_raw, 0.0)
tests/testthat/test-cohort-schedule.R:  expect_identical(e$species_index, 2L)
tests/testthat/test-cohort-schedule.R:  expect_identical(e$species_index_raw, 1.0)
tests/testthat/test-cohort-schedule.R:  expect_identical(e$times, pi)
tests/testthat/test-cohort-schedule.R:  expect_identical(e$time_introduction, pi)
tests/testthat/test-cohort-schedule.R:  expect_identical(e$time_end, pi)
tests/testthat/test-cohort-schedule.R:  expect_identical(e$time_introduction, t1[[1]])
tests/testthat/test-cohort-schedule.R:  expect_identical(sched$next_event$time_introduction, min(c(t1, t2)))
tests/testthat/test-cohort-schedule.R:  expect_identical(sched$all_times, t_new)
tests/testthat/test-cohort-schedule.R:  expect_identical(sched$ode_times, numeric(0))
tests/testthat/test-cohort-schedule.R:  expect_identical(sched$ode_times, t_ode)
tests/testthat/test-cohort-schedule.R:  expect_identical(sched$ode_times, t_ode)
tests/testthat/test-cohort-schedule.R:  expect_identical(sched$max_time, max(t_ode))
tests/testthat/test-cohort-schedule.R:  expect_identical(sched$max_time, max_t)
tests/testthat/test-cohort-schedule.R:  expect_identical(sched$times(1), times1)
tests/testthat/test-cohort-schedule.R:  expect_identical(sched3$max_time, max_t)
tests/testthat/test-cohort-schedule.R:  expect_identical(sched3$times(1), times1)
tests/testthat/test-cohort-schedule.R:  expect_identical(sched3$times(2), times2)
tests/testthat/test-cohort-schedule.R:  expect_identical(sched3$times(3), times2)
tests/testthat/test-cohort-schedule.R:  expect_identical(sched$max_time, max_t)
tests/testthat/test-cohort-schedule.R:  expect_identical(sched3$max_time, 2 * max_t)
tests/testthat/test-cohort.R:    expect_identical(dgdh, dgdh_forward)
tests/testthat/test-cohort.R:    ## expect_identical(dgdh2, dgdh_richardson)
tests/testthat/test-cohort.R:    expect_identical(cohort$plant$internals[v], plant$internals[v])
tests/testthat/test-cohort.R:    expect_identical(cohort$fecundity, 0.0);
tests/testthat/test-cohort.R:    expect_identical(plant$height, h)
tests/testthat/test-cohort.R:    expect_identical(cohort$height, h)
tests/testthat/test-control.R:  expect_identical(sort(names(ctrl)), keys)
dfalster commented 6 years ago

Based on the above, it seems we make widespread use of both expect_equal and expect_identical, perhaps without any strong logic underpinning the choice. Given that, I feel comfortable updating the failing tests from expect_identical to expect_equal.

Hadley's book R packages (http://r-pkgs.had.co.nz/tests.html) suggests:

"There are two basic ways to test for equality: expect_equal(), and expect_identical(). expect_equal() is the most commonly used: it uses all.equal() to check for equality within a numerical tolerance… If you want to test for exact equivalence, or need to compare a more exotic object like an environment, use expect_identical()."

@richfitz -- any objections to downgrading our expectations from expect_identical to expect_equal for some numerical operations?
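Concretely, for the first failing assertion (test-cohort.R line 60, visible in the grep output above) the change would be a one-liner; a sketch of the before/after:

```r
## Before: fails on AppVeyor because the two gradient estimates
## differ by floating-point noise
expect_identical(dgdh, dgdh_forward)

## After: compares via all.equal(), i.e. within a numerical
## tolerance (default ~1.5e-8)
expect_equal(dgdh, dgdh_forward)
```

The other "Objects equal but not identical" failures would get the same mechanical substitution.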

RemkoDuursma commented 6 years ago

When you compare any floats, you should use all.equal (expect_equal), not identical. It often works fine, until it doesn't.
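A minimal standalone illustration of the difference (plain R, not plant-specific):

```r
x <- sqrt(2)^2

identical(x, 2)          # FALSE: x is 2.0000000000000004, bit patterns differ
x == 2                   # FALSE: exact comparison of floats fails too
isTRUE(all.equal(x, 2))  # TRUE: within all.equal()'s default tolerance
```

This is exactly why expect_identical() on the results of floating-point arithmetic can pass on one platform and fail on another.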


dfalster commented 6 years ago

Tests 4 and 5 in the failing list concern a test in test-plant-runner.R at lines 172 and 177.

This specific test was developed in #174. Because the test concerns numerical errors, it makes sense that behaviour on different systems may differ slightly.
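For the purely numerical comparisons, note that the suite already loosens the tolerance explicitly in places; e.g. this line from test-adaptive-interpolator.R (from the grep output above) shows the pattern we could reach for if all.equal()'s default ever proves too strict on a given platform:

```r
## An explicit tolerance is forwarded to all.equal(), relaxing
## the comparison beyond the default ~1.5e-8
expect_equal(zz_mid, yy_mid, tolerance = 2e-8)
```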