axch opened this issue 9 years ago
There are also still a few IPython notebooks knocking around in the examples/ directory. We should either deploy a means to mechanically test them, or finally get rid of them.
https://github.com/bollwyvl/nosebook might be useful?
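If nosebook doesn't pan out, a minimal alternative sketch (assuming the notebooks live under examples/ and that executing every cell without an error counts as passing) could drive them through nbconvert's ExecutePreprocessor as nose-style generator tests:

```python
# Sketch: execute every notebook under examples/ and fail on any cell error.
# Assumes nbformat and nbconvert are installed; paths and timeout are guesses.
import glob

import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

def run_notebook(path):
    nb = nbformat.read(path, as_version=4)
    ep = ExecutePreprocessor(timeout=300)
    # Raises CellExecutionError if any cell fails to execute.
    ep.preprocess(nb, {'metadata': {'path': 'examples/'}})

def test_notebooks():
    # Nose-style test generator: one test case per notebook.
    for path in glob.glob('examples/*.ipynb'):
        yield run_notebook, path
```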
Where do these integration tests live? It might be worth having a Jenkins instance that is expected to take an hour or more per run but runs less frequently, perhaps daily. Long-running jobs certainly seem like an actual use case, so making sure that we aren't crashing for some reason on larger problems, e.g. due to a new memory leak, seems worthwhile.
The screen problem seems common, and worth fixing. Given that we're likely to use the plot-to-screen functionality frequently and to notice if it breaks, I'd emphasize mechanically testing the plot-to-file variant instead. A check that, say, between 1% and 50% of such a file's pixels are dark seems like a decent way to confirm the plots are doing something reasonable, though of course those bounds should be set empirically.
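As a purely illustrative sketch of such a check, assuming the example writes a PNG and that Pillow and numpy are available; the bounds and darkness cutoff are placeholders to be tuned empirically:

```python
# Sketch: assert a saved plot is neither blank nor solid ink.
# The bounds and the darkness cutoff are placeholders, not blessed values.
import numpy as np
from PIL import Image

def assert_plot_nontrivial(path, lo=0.01, hi=0.50, dark_cutoff=128):
    pixels = np.asarray(Image.open(path).convert('L'))  # grayscale, 0..255
    dark_fraction = np.mean(pixels < dark_cutoff)
    assert lo < dark_fraction < hi, \
        "%s is %.1f%% dark pixels" % (path, 100 * dark_fraction)
```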
The notebooks question feels like a separate bug. Perhaps part of #50 instead. I'll add luac's suggestion there.
The extant tests are in test/integration/test_examples.py
In principle every code artifact under the examples/ directory should be either tested or deleted.
It's been a while since I looked through them to ascertain what they were examples of and whether they were still useful as examples.
I agree that in principle a slow integration test that runs them with large parameters is a useful Jenkins job, but I would be uncomfortable if that were the only thing that exercised them, so I think fast smoke tests are still valuable.
After discussion today, we want to convert all the IPython notebooks and all the other sets of examples to the erb/markdown/venture format used for the tutorial, and support basic assertions about these example sequences.
We didn't discuss how to make smoke testing work, but I could imagine, e.g., mechanically transforming all numbers >= 3 to 3 (or some similarly simple transformation), running through the entire suite, and checking that the type of result at each point is what we expect: a number, a plot, nothing, etc.
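To make the number-shrinking idea concrete, here is one purely illustrative sketch; the cap of 3 and the naive regex are assumptions, and a real version would have to avoid touching seeds, file names, and so on:

```python
# Sketch: cap every integer literal >= 3 at 3 so the examples run quickly.
# Deliberately naive; it would also clobber seeds, sizes in file names, etc.
import re

def shrink_numbers(source, cap=3):
    def clamp(match):
        return str(min(int(match.group(0)), cap))
    return re.sub(r'\b\d+\b', clamp, source)

# e.g. shrink_numbers("(mh default one 100)") == "(mh default one 3)"
```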
For the longer-running tests, @riastradh-probcomp 's suggestion of having golden files at some level of comparison still bugs me, but I'm not sure of the value of just running them, especially when there is any kind of interactivity expected (e.g. to close a plot before going on). My inclination would be to stub plot-generation for this longer-running version, and assert that the full set of examples takes about as long to run through as we expect historically, and that the plots have roughly the same color composition as we expect historically. The problem of blessing a particular instance of the history still comes up, as well as what it means to be "roughly the same".
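For the color-composition idea, one possible shape of the check (a sketch only; the golden-file format, bin count, and tolerance are all invented here) would compare a plot's normalized grayscale histogram against a previously blessed one:

```python
# Sketch: compare a plot's grayscale histogram against a stored "golden" one.
# The .npy golden file, bin count, and tolerance are placeholders.
import numpy as np
from PIL import Image

def gray_histogram(path, bins=16):
    pixels = np.asarray(Image.open(path).convert('L')).ravel()
    counts, _ = np.histogram(pixels, bins=bins, range=(0, 255))
    return counts / float(counts.sum())

def roughly_same_colors(plot_path, golden_path, tolerance=0.1):
    # L1 distance between pixel-intensity distributions; "roughly the same"
    # is exactly the judgment call the tolerance papers over.
    return np.abs(gray_histogram(plot_path) - np.load(golden_path)).sum() < tolerance
```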
So now this devolves into a few tasks:
I interpret the above conversation as implying that this issue was blocked on #54. That being closed, labeling unblocked.
To clarify the current goal here:
- examples/, except those to be pruned by #144
- examples/notebooks are optional with respect to Release 0.4.3
- tests in test/integration/test_examples.py that rely on external processes and calling timeout should be flushed

Let's call this one the stretch goal for a good release 0.5. Labeling blocked at least on #51, and probably on everything else in the release 0.5 milestone.
The punch list:
Is nesterov broken in the Brownian motion example?
Will not test examples/brownian.
Will not test examples/venstan. That integration is sufficiently marginal, and is itself tested separately, that it's not worth the bother of keeping the example pristine.
As the one who added pipits last summer, I don't think it's worth maintaining; it doesn't really exemplify anything in particular.
There are integration tests that amount to checking that the examples do not crash on startup: run them under "timeout" and check that they do indeed time out.
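For reference, a minimal sketch of that kind of check (the helper name and the ten-second budget are assumptions), relying on GNU timeout's convention of exiting with status 124 when it has to kill the command:

```python
# Sketch: "does not crash on startup" smoke test.  GNU timeout exits with
# status 124 when it kills the command, i.e. the example was still running
# (and had not crashed) when the budget expired.
import subprocess

def assert_still_running_at_timeout(command, seconds=10):
    status = subprocess.call(['timeout', str(seconds)] + command)
    assert status == 124, "%r exited with status %d" % (command, status)
```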
Subproblems:
Part of #48.
Edit: See updated summary below (dated Nov 10, 2015).