probcomp / Venturecxx

Primary implementation of the Venture probabilistic programming system
http://probcomp.csail.mit.edu/venture/
GNU General Public License v3.0

The examples are tested unsatisfactorily #53

Open axch opened 9 years ago

axch commented 9 years ago

The existing integration tests for the examples amount to checking that they do not crash on startup: each example is run under "timeout", and the test checks that it does indeed time out.
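For concreteness, here is a minimal sketch of that style of check in Python. It is not the code in the repository; the helper name `survives_startup` is hypothetical, and it uses only the standard `subprocess` module rather than the external "timeout" program.

```python
import subprocess
import sys


def survives_startup(argv, seconds):
    """Return True if the command is still running after `seconds`.

    A command that exits on its own before the deadline, with any exit
    status, counts as a failure: the examples are expected to keep running.
    """
    try:
        subprocess.run(
            argv,
            timeout=seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child when the timeout expires, so
        # reaching this branch means it was alive at the deadline.
        return True
    return False
```

A call such as `survives_startup([sys.executable, "-c", "import time; time.sleep(30)"], 1)` stands in for launching an example. The weakness discussed in this issue is visible here: the check passes as long as the process stays alive, whatever it is actually doing.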

Subproblems:

Part of #48.

Edit: See updated summary below (dated Nov 10, 2015).

axch commented 9 years ago

There are also still a few IPython notebooks knocking around in the examples/ directory. We should either deploy a means to mechanically test them, or finally get rid of them.

lenaqr commented 9 years ago

https://github.com/bollwyvl/nosebook might be useful?

gregory-marton commented 9 years ago

Where do these integration tests live? It might be worth it to have a jenkins instance that does expect to take an hour or more per run, but gets run less frequently, perhaps daily. Certainly long-running jobs seem like an actual use case, so making sure that we aren't crashing for some reason on larger problems, e.g. due to a new memory leak, etc., seems worthwhile.

The screen problem seems common, and worth fixing. Given that we're likely to use the plot-to-screen functionality frequently and notice if it breaks, I'd emphasize mechanically testing the plot-to-file variant instead. Writing commands that check that, e.g., between 1% and 50% of the pixels in such a file are dark seems like a decent way to check that the plots are doing something reasonable, though of course those bounds should be set empirically.
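A rough sketch of the dark-pixel check, assuming the plot file has already been decoded into rows of 0-255 grayscale values (the decoding step, e.g. with an imaging library, is left out). The function names and the darkness threshold of 128 are illustrative assumptions; the 1% and 50% bounds are the example figures from the comment above, not tuned values.

```python
def dark_fraction(pixels, threshold=128):
    """Return the fraction of pixels darker than `threshold`.

    `pixels` is a list of rows, each a list of 0-255 grayscale values.
    """
    values = [v for row in pixels for v in row]
    if not values:
        raise ValueError("image has no pixels")
    return sum(1 for v in values if v < threshold) / len(values)


def plot_looks_reasonable(pixels, lo=0.01, hi=0.50, threshold=128):
    """True if the share of dark pixels lies strictly between lo and hi."""
    return lo < dark_fraction(pixels, threshold) < hi
```

A blank white image or a solid black one fails this check, while an image with a modest amount of ink passes. It says nothing about whether the plot is the right plot.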

The notebooks question feels like a separate bug. Perhaps part of #50 instead. I'll add luac's suggestion there.

axch commented 9 years ago

The extant tests are in test/integration/test_examples.py

In principle every code artifact under the examples/ directory should be either tested or deleted.

It's been a while since I looked through them to ascertain what they were examples of and whether they were still useful as examples.

I agree that in principle a slow integration test that runs them with large parameters is a useful Jenkins job, but I would be uncomfortable if that were the only thing that exercised them, so I think fast smoke tests are still valuable.

gregory-marton commented 9 years ago

After discussion today, we want to convert all the IPython notebooks and all the other sets of examples to the erb/markdown/venture format used for the tutorial, and support basic assertions about these example sequences.

We didn't discuss how to make smoke testing work, but I could imagine e.g. mechanically transforming all numbers >=3 to be =3 instead or similar simple transformations, running through the entire suite, and ensuring that the types of results at each point are what we expect: a number, a plot, nothing, etc.
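One way the number-shrinking transformation could look, as a sketch only: it treats the example as plain text and rewrites standalone integer literals with a regular expression. The name `clamp_numbers` is hypothetical, and a real version would more likely work on parsed programs than on raw source.

```python
import re

# Integer literals that are not part of an identifier (x2), a decimal
# number (0.5), or an exponent form (1e5).
_INTEGER = re.compile(r"(?<![\w.])\d+(?![\w.])")


def clamp_numbers(source, limit=3):
    """Replace every standalone integer literal >= limit with limit."""

    def shrink(match):
        return str(min(int(match.group()), limit))

    return _INTEGER.sub(shrink, source)
```

For example, `clamp_numbers("repeat(500, step(2, 0.5))")` leaves `2` and `0.5` alone and turns `500` into `3`. An integer immediately followed by a period, as at the end of a sentence, is skipped by this pattern, which is one reason to treat it as a starting point rather than a finished tool.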

For the longer-running tests, @riastradh-probcomp's suggestion of having golden files at some level of comparison still bugs me, but I'm not sure of the value of just running them, especially when there is any kind of interactivity expected (e.g. to close a plot before going on). My inclination would be to stub plot-generation for this longer-running version, and assert that the full set of examples takes about as long to run through as we expect historically, and that the plots have roughly the same color composition as we expect historically. The problem of blessing a particular instance of the history still comes up, as well as what it means to be "roughly the same".
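To make "roughly the same color composition" concrete, here is one possible reading, sketched under assumptions: each plot is reduced to the share of the image taken by each color, and two plots agree if no color's share differs by more than a tolerance. The function names and the 5% default tolerance are illustrative, and how the blessed composition gets chosen is left open, as in the comment above.

```python
from collections import Counter


def composition(pixels):
    """Map each color in the image to its share of all pixels.

    `pixels` is a list of rows; each entry is any hashable color value,
    such as an (r, g, b) tuple.
    """
    flat = [p for row in pixels for p in row]
    if not flat:
        raise ValueError("image has no pixels")
    return {color: n / len(flat) for color, n in Counter(flat).items()}


def roughly_same(observed, blessed, tolerance=0.05):
    """True if no color's share differs by more than `tolerance`."""
    colors = set(observed) | set(blessed)
    return all(
        abs(observed.get(c, 0.0) - blessed.get(c, 0.0)) <= tolerance
        for c in colors
    )
```

Comparing exact color values is brittle under anti-aliasing, so a practical version would probably bucket colors first. The tolerance plays the same role as the empirically set bounds discussed earlier in the thread.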

So now this devolves into a few tasks:

axch commented 9 years ago

I interpret the above conversation as implying that this issue was blocked on #54. That being closed, labeling unblocked.

axch commented 9 years ago

To clarify the current goal here:

axch commented 8 years ago

Let's call this one the stretch goal for a good release 0.5. Labeling blocked at least on #51, and probably even on everything else in the release 0.5 milestone.

axch commented 8 years ago

The punch list:

axch commented 8 years ago

Is nesterov broken in the Brownian motion example?

axch commented 8 years ago

Will not test examples/brownian.

axch commented 8 years ago

Will not test examples/venstan. That integration is marginal enough, and is itself tested separately, that keeping the example pristine is not worth the bother.

lenaqr commented 8 years ago

As the one who added pipits last summer, I don't think it's worth maintaining; doesn't really exemplify anything in particular.