Closed j-hui closed 3 years ago
From my TODOs:
Add examples/ back in, and use that as starting point for regression test suite
So I've been a little bit stumped about how best to organize this. I'd like to reuse as much of the quickcheck testing facilities I've extracted out so far, so that should probably go in its own subdirectory/library.
I want to have a regression suite directory populated with tricky SSM examples, that can be automatically discovered using hspec-discover
, which will use the same quickcheck testing facilities, but specialized to the specific program specified by the regression test. I also want to have these examples out of the way of the main library to reduce compile time/build noise due to BinderAnn output, i.e., stack build
shouldn't also compile everything in the regression test suite.
Maybe that should be a PR of its own, since it entails moving around lots of files and package.yaml
tweaks. This PR as it is now is already fairly substantial and I don't want to keep it away from the main branch for too long, for fear of potential merge conflicts. @Rewbert , what do you think?
Note that right now, running stack test
still yields bugs I haven't tracked down yet. I'll do that this afternoon.
The way we write regression tests should be largely compatible with #12, as in there should be minimal boilerplate code in the regression test file to run the compiler/run the test.
I'll ask Agustin what can be done about the binderann noise :) Maybe he can tell me how to hack on it to silence it a bit more.
I am all for having a regression test suite. I had collected a list of some 30-ish tests that all tested some tricky aspect of the runtime system, but when we changed the internal AST to consider changed
to be an expression rather than a statement, the tests became ill-formed. I can try to manually rewrite some of them (the shrunken ones), but some are quite big and can not manually be altered. Not all of these tests will contain changed
calls though, since changed
has not been the source of every bug :P In a lot of cases they were just shrunk away.
One option is to rewrite them all using our EDSL frontend. That way the tests would not be dependent on the internal AST, and would survive future changes as well. This sounds like a far smarter choice. QuickCheck generates programs in the internal representation directly.
The obvious thing is to write (and maintain) a pretty-printer that can dump the internal AST in the form of a program
Right now it's dumping the internal AST. Since our programs will be functions in Haskell, to do that we would need to perhaps use some template Haskell stuff to use quasi quotation to load the Haskell definitions at compile time.
Or maybe it can dump some printed Haskell functions to an actual Haskell source file.
We should dump the internal AST, but make sure that it can be directly reused in a regression test (i.e., we shouldn't have to worry about name clashes when we import things). I think the simpler approach is to just dump it to a Haskell source file.
This is almost what we are doing now. The issue is that if we change the internal AST at a later point in time, we have to modify these regressions tests to conform with the new AST.
I've not really seen anyone dump test cases to Haskell Source files before, and I am thinking maybe there's some reason I am not aware of. If we just dump the internal AST we can just dump it to any kind of file and then read it into the heap with read
.
The issue is that if we change the internal AST at a later point in time, we have to modify these regressions tests to conform with the new AST.
I was thinking about that this morning. I think we can get away with two sets of regression tests, one that tracks problematic, randomly generated LowCore ASTs, and one that contains known problematic high level programs. Once we make a change to the LowCore AST, we need only fix/trash those.
Also, then again, any regression test suite is going to have issues if we make any modification to the interface with which they are built on, including if we make changes to the surface EDSL.
The latest commit now dumps the generated SSM program into a syntactically valid Haskell file that we can inspect and feed back to a regression suite.
I'm still persistently encountering strange inconsistencies where the interpreted continuation queue fills up before the executed contqueue does. I'm not totally sure why this is the case but perhaps that's beyond the scope of this PR. @Rewbert what do you think?
Hm they didn't behave differently before so something changed :p I'd consider it a bug I think. I can dive into it on Tuesday if you'd like!
Ah, shoot. Yeah, if I haven't figured it out by then, your help would be much appreciated (:
Are the differences big or is it some off by one error w.r.t the size of the queue?
I can't tell what the size of the contqueue is but it's off by four events preceding the queue reaching capacity.
On Sun, Jun 13, 2021, 1:00 PM Robert Krook @.***> wrote:
Are the differences big or is it some off by one error w.r.t the size of the queue?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Rewbert/ssm-edsl/pull/11#issuecomment-860241480, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC2A5DFR54GNJKHYAQFTQJTTSTP27ANCNFSM46LEWUZA .
With those last few commits, I started a regression test suite, and added capabilities to load automatically generated tests into the that test suite (using trace-report/redo-test save <test-name>
). Those tests will are automatically discovered using hspec-discover.
With this foundation in place, it should be fairly straightforward to also build out another test suite for examples written in the high level EDSL.
To only test the randomly generated programs, run stack test :arbitrary
. To only run the regression tests, run stack test :regression-low
.
I was able to fix the bug that @sedwards-lab spotted, which was that I was overspecifying the trace comparison procedure and comparing that events are being applied in the same order; this is beyond the SSM semantics, and should not be compared.
When I unrolled the trace comparison to thread it through the PropertyM
monad, I hadn't noticed that @Rewbert had overloaded the ==
function for the Trace
type synonym. I recreated the same functionality in my comparison function.
At some point, I want to factor out all this diff formatting into its own thing, to declutter the trace comparison logic.
I have no more thoughts about this PR, so I'll go ahead and merge it :) Really great work John!
oh rip there was one last thing lol
I figured out how to control the number of quickcheck tests from the command line, from stack via hspec: http://hspec.github.io/options.html
Not quite ready to merge, but ready to review.
There are a few things left that I want to do for this PR:
cp {qc-report,genc}/quickcheckgen.c && make quickcheckgen
)linux
platform tosimulation
test
platform totrace
examples/
back in, and use that as starting point for regression test suiteregression-edsl
test suite~ (punted to when I come up with compiler interface)And a few things left for later PRs:
simulation
(currently "linux") platform, to be specialized for just running programs in simulation moderegression-edsl
test suite