r-spatial / spdep

Spatial Dependence: Weighting Schemes and Statistics
https://r-spatial.github.io/spdep/

Document how spdep functions are verified #30

Open angela-li opened 5 years ago

angela-li commented 5 years ago

I chatted briefly with @edzer at rstudio::conf and he mentioned the idea of writing tests and doing code coverage for spdep. Would this be something we'd be interested in?

I could take a stab at writing a few tests and at least getting some of the main functions covered. Which ones would those be, in your opinion? Maybe functions in active development could use some tests?

rsbivand commented 5 years ago

No testthat: it is so opaque that actually debugging failures is much harder than writing the test code in the first place - testthat code is easy to write (badly), but the results are very hard to read, and the tests give a false impression of real testing. For example, most of the time spent on reverse dependency checks where revdeps fail goes into unpicking where inside testthat the failure occurred (and only then do you see the assumptions made in writing the tests). This reflects the same problem as roxygen: you can write good examples, references, etc., but most roxygen writers only do the absolute minimum and don't provide informative examples.

spdep uses the legacy S approach of man-page examples running standard datasets, in spdep's case reproducing output from SpaceStat, Cressie, the LISA article, etc. This approach should be documented better, because testthat has obscured the original S testing framework (examples drawn from the core literature); we need a vignette on the antecedents of the implemented methods. Yes, the examples are not diffed against earlier values (and should be); the ASDAR 2nd edition scripts are diffed, so numerical changes do get picked up there.
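A man-page example in this spirit might look like the sketch below: it reproduces a standard global Moran's I analysis and checks the computed results against expectations. It assumes the columbus data frame and col.gal.nb neighbours object from the spData package; the hard-coded reference value I_ref is an illustrative placeholder standing in for a published or previously verified figure, not a value taken from this thread.

```r
## Sketch only, not spdep's actual man-page code.
library(spdep)
data(columbus, package = "spData")   # loads columbus, col.gal.nb, coords

## Row-standardised weights for the Columbus neighbourhoods
lw <- nb2listw(col.gal.nb, style = "W")

## Global Moran's I for CRIME
mt <- moran.test(columbus$CRIME, lw)
mt

## Internal consistency check: the expectation under randomisation
## is -1/(n - 1), independent of any stored reference value
stopifnot(all.equal(unname(mt$estimate["Expectation"]),
                    -1 / (length(col.gal.nb) - 1)))

## Pattern for a check against a legacy reference value; 0.486 is an
## illustrative placeholder, to be replaced by the published figure
I_ref <- 0.486
all.equal(round(unname(mt$estimate["Moran I statistic"]), 3), I_ref)
```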

Let's say that the existing testing framework needs documentation, but should not be made uniform with the momentary testthat/codecov fashion. I know that when you think you've got the code covered, a corner case is bound to appear that isn't covered, so the whole effort is largely symbolic rather than real, unless the problems are really simple and all corner cases can be enumerated. So badges are a complete distraction, but the man-page examples should be documented better, and possibly should include more checks of computed results against expected values. Corner cases that would make the examples too noisy might be moved to a test directory, with pointers from the man pages. The tests should be standard R tests, not testthat. Does that sound like a way forward?
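One way the test-directory part could look with R's standard mechanism is sketched below: a plain script under tests/ that R CMD check runs, optionally paired with a .Rout.save file so the output is diffed against a saved reference. The no-neighbour corner case and the file name are illustrative assumptions, not taken from spdep's sources; the error check assumes nb2listw()'s default behaviour of refusing empty neighbour sets unless zero.policy is set.

```r
## tests/nb_corner_cases.R (hypothetical file name); pairing it with
## tests/nb_corner_cases.Rout.save makes R CMD check diff the output
## against the saved reference - standard R testing, no testthat.
library(spdep)

## Three points on a line; the distance band leaves the third point
## with no neighbours
xy <- cbind(c(0, 1, 10), c(0, 0, 0))
nb <- dnearneigh(xy, 0, 1.5)
print(card(nb))   # neighbour counts, expected 1 1 0

## By default nb2listw() should refuse empty neighbour sets
res <- try(nb2listw(nb), silent = TRUE)
stopifnot(inherits(res, "try-error"))

## With zero.policy = TRUE it should succeed and keep all three regions
lw <- nb2listw(nb, zero.policy = TRUE)
stopifnot(length(lw$neighbours) == 3)
```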

Tasks would be to check the man pages, find the legacy summary values to test against, document the testing approach, and identify corner cases that do not belong on the man pages, plus maybe a blog piece arguing for this (legacy) approach to QA - that the implementations reproduce results from the original literature; see the C&O vignette and my articles with Jan Hauke and Tomasz Kossowski, with Gianfranco Piras, and now with David Wong. This is implicit now, but should be made explicit, I think.