Reorganize build system and testing

j-hui commented 3 years ago

Not quite ready to merge, but ready to review.

There are a few things left that I want to do for this PR:

[x] pretty print the SSM program into the qc-report directory upon failure
[x] add some convenience make methods to compile the generated program in the report directory (instead of manually cp {qc-report,genc}/quickcheckgen.c && make quickcheckgen)
[x] Add timestamps to each invocation
[x] Rename linux platform to simulation
[x] Rename test platform to trace
[x] Add examples/ back in, and use that as starting point for regression test suite
[x] Make regression tests run before arbitrary test
[ ] ~Add regression-edsl test suite~ (punted to when I come up with compiler interface)
[x] Fix this queue issue

And a few things left for later PRs:

Clean up the runtime and its debug API
Clean up the simulation (currently "linux") platform, to be specialized for just running programs in simulation mode
Add Zephyr as a platform to this build system, which will be interesting since it uses a different build system
Add regression-edsl test suite

j-hui commented 3 years ago

From my TODOs:

Add examples/ back in, and use that as starting point for regression test suite

So I've been a little bit stumped about how best to organize this. I'd like to reuse as much of the quickcheck testing facilities I've extracted out so far, so that should probably go in its own subdirectory/library.

I want to have a regression suite directory populated with tricky SSM examples, that can be automatically discovered using hspec-discover, which will use the same quickcheck testing facilities, but specialized to the specific program specified by the regression test. I also want to have these examples out of the way of the main library to reduce compile time/build noise due to BinderAnn output, i.e., stack build shouldn't also compile everything in the regression test suite.

Maybe that should be a PR of its own, since it entails moving around lots of files and package.yaml tweaks. This PR as it is now is already fairly substantial and I don't want to keep it away from the main branch for too long, for fear of potential merge conflicts. @Rewbert , what do you think?

Note that right now, running stack test still yields bugs I haven't tracked down yet. I'll do that this afternoon.

j-hui commented 3 years ago

The way we write regression tests should be largely compatible with #12, as in there should be minimal boilerplate code in the regression test file to run the compiler/run the test.

Rewbert commented 3 years ago

I'll ask Agustin what can be done about the binderann noise :) Maybe he can tell me how to hack on it to silence it a bit more.

I am all for having a regression test suite. I had collected a list of some 30-ish tests that all tested some tricky aspect of the runtime system, but when we changed the internal AST to consider changed to be an expression rather than a statement, the tests became ill-formed. I can try to manually rewrite some of them (the shrunken ones), but some are quite big and can not manually be altered. Not all of these tests will contain changed calls though, since changed has not been the source of every bug :P In a lot of cases they were just shrunk away. One option is to rewrite them all using our EDSL frontend. That way the tests would not be dependent on the internal AST, and would survive future changes as well. This sounds like a far smarter choice. QuickCheck generates programs in the internal representation directly.

sedwards-lab commented 3 years ago

The obvious thing is to write (and maintain) a pretty-printer that can dump the internal AST in the form of a program

Rewbert commented 3 years ago

Right now it's dumping the internal AST. Since our programs will be functions in Haskell, to do that we would need to perhaps use some template Haskell stuff to use quasi quotation to load the Haskell definitions at compile time.

Rewbert commented 3 years ago

Or maybe it can dump some printed Haskell functions to an actual Haskell source file.

j-hui commented 3 years ago

We should dump the internal AST, but make sure that it can be directly reused in a regression test (i.e., we shouldn't have to worry about name clashes when we import things). I think the simpler approach is to just dump it to a Haskell source file.

Rewbert commented 3 years ago

This is almost what we are doing now. The issue is that if we change the internal AST at a later point in time, we have to modify these regressions tests to conform with the new AST.

I've not really seen anyone dump test cases to Haskell Source files before, and I am thinking maybe there's some reason I am not aware of. If we just dump the internal AST we can just dump it to any kind of file and then read it into the heap with read.

j-hui commented 3 years ago

The issue is that if we change the internal AST at a later point in time, we have to modify these regressions tests to conform with the new AST.

I was thinking about that this morning. I think we can get away with two sets of regression tests, one that tracks problematic, randomly generated LowCore ASTs, and one that contains known problematic high level programs. Once we make a change to the LowCore AST, we need only fix/trash those.

Also, then again, any regression test suite is going to have issues if we make any modification to the interface with which they are built on, including if we make changes to the surface EDSL.

j-hui commented 3 years ago

The latest commit now dumps the generated SSM program into a syntactically valid Haskell file that we can inspect and feed back to a regression suite.

I'm still persistently encountering strange inconsistencies where the interpreted continuation queue fills up before the executed contqueue does. I'm not totally sure why this is the case but perhaps that's beyond the scope of this PR. @Rewbert what do you think?

Rewbert commented 3 years ago

Hm they didn't behave differently before so something changed :p I'd consider it a bug I think. I can dive into it on Tuesday if you'd like!

j-hui commented 3 years ago

Ah, shoot. Yeah, if I haven't figured it out by then, your help would be much appreciated (:

Rewbert commented 3 years ago

Are the differences big or is it some off by one error w.r.t the size of the queue?

j-hui commented 3 years ago

I can't tell what the size of the contqueue is but it's off by four events preceding the queue reaching capacity.

On Sun, Jun 13, 2021, 1:00 PM Robert Krook @.***> wrote:

Are the differences big or is it some off by one error w.r.t the size of the queue?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Rewbert/ssm-edsl/pull/11#issuecomment-860241480, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC2A5DFR54GNJKHYAQFTQJTTSTP27ANCNFSM46LEWUZA .

j-hui commented 3 years ago

With those last few commits, I started a regression test suite, and added capabilities to load automatically generated tests into the that test suite (using trace-report/redo-test save <test-name>). Those tests will are automatically discovered using hspec-discover.

With this foundation in place, it should be fairly straightforward to also build out another test suite for examples written in the high level EDSL.

To only test the randomly generated programs, run stack test :arbitrary. To only run the regression tests, run stack test :regression-low.

j-hui commented 3 years ago

I was able to fix the bug that @sedwards-lab spotted, which was that I was overspecifying the trace comparison procedure and comparing that events are being applied in the same order; this is beyond the SSM semantics, and should not be compared.

When I unrolled the trace comparison to thread it through the PropertyM monad, I hadn't noticed that @Rewbert had overloaded the == function for the Trace type synonym. I recreated the same functionality in my comparison function.

At some point, I want to factor out all this diff formatting into its own thing, to declutter the trace comparison logic.

Rewbert commented 3 years ago

I have no more thoughts about this PR, so I'll go ahead and merge it :) Really great work John!

j-hui commented 3 years ago

oh rip there was one last thing lol

j-hui commented 3 years ago

I figured out how to control the number of quickcheck tests from the command line, from stack via hspec: http://hspec.github.io/options.html

ssm-lang / Scoria

Reorganize build system and testing #11