Closed mratsim closed 5 years ago
Your timings are off - make test
in nimbus and nim-beacon-chain includes running the tests - for example in nim-beacon-chain
, that includes doing several epochs of state simulation (heavy cryptography) on the generally underpowered travis CI. You would need to measure the build time instead to make this argument, which in our case was around 10-20s per executable last I checked.
testament
itself is a bit of a hack that's gone through several iterations but still lacks several features that are useful in library development (as opposed to command-line application development) - it focuses on running executables, not on running functions / code. as such, it's more of an integration testing tool, not that great for unit testing. I'd say a primary reason it's used with the compiler is that the compiler is implemented as a monolith command line app that's hard to unit test - there are no units in there, so all you can do really is run the whole thing every time.
The parallelism is also a bit of an issue in there: on the one hand, you can run tests in parallel, but that implies compiling every test separately which pretty much cancels out all benefits of parallel runs and instead makes it orders of magnitude slower (unless you have a cpu per test).
There's an hack in testament
in the form of megatest
that effectively concatenates the tests which kind of works for the nim
unit test suite as a tradeoff to not have to rewrite the old tests, but it's not really.. uh.. a thoughtful long-term solution and was implemented more as a workaround for the extreme run times of the compiler test suite - see note about parallelism vs compile time above. in particular, these tests do not run in parallel.
generally when crafting tests, the a key idea is to ensure that they're independent - this is what makes them safe for concurrency/parallelization - testament
is good in ensuring that each test runs in an isolated environment but does so at great cost (separate compiles) - it's usually a more reasonable trade-off to put constraints on test code instead, specially when developing libraries. you can see this in practise today as well: the tests that are not part of megatest are effectively the ones that have been proven not to be written with independence in mind. of course, this also assumes that the libraries being tested can be used independently.
on the independence note, it would be interesting to see if the unittest
macro can be made powerful enough to detect which tests are actually independent and run those in parallel - this would imply it has to understand which are side-effect free, effectively.
unfortunately, a lot of code in nim relies on mutable globals and other stateful features, or compile-time flags like checks and liberal use of when
options complicating the matter - it's not a given that this would yield good returns on average, even if well-designed libraries would benefit.
How to time a Nimbus test compilation (the biggest, aggregated one):
$ rm -rf nimcache; time ./env.sh nim c --out:./build/all_tests -d:release -d:chronicles_log_level=ERROR --verbosity:0 --hints:off --warnings:off -d:usePcreHeader --passL:"-lpcre" tests/all_tests.nim
real 1m3.072s
user 2m39.329s
sys 0m17.749s
It has quite a few custom macros and templates that affect compile time, so it's not useful to measure unittest's overhead. You need a synthetic test file for that.
I don't agree it should be in stdlib: i really think we should improve unittest, or rewrite a similar library: testatement feels very closely related to the compiler needs to me
on the independence note, it would be interesting to see if the unittest macro can be made powerful enough to detect which tests are actually independent and run those in parallel - this would imply it has to understand which are side-effect free, effectively. @arnetheduck in any case, an easy way to just manually separate suites/groups of tests that can be ran in parallel is good
i really think we should improve unittest
@arnetheduck in any case, an easy way to just manually separate suites/groups of tests that can be ran in parallel is good
the key point here is that it would be nice to do this automatically - it's easy to accidentally introduce something that breaks the necessary invariants - just like it's hard to write thread-safe code in general in nim when --threads:off
is the default - this is something compilers are much better at reasoning about than humans.
@arnetheduck agree: but having a manual fallback/override is still useful
testament should not be in the "stdlib", but it should become a tool that we ship with Nim and that supports at least invocations like testament <glob here>
.
Is testament perfect? Of course not, but it's what is used, it supports the features we need and it's maintained. It's even documented but admittedly its documentation is hard to find. ;-)
From me this thing gets a NO. Tester is an always changing testing framework. It is desinged to test the compiler only. It has many problems there weren't fixed because it is an internal tool only. Exposing it for public use means that certain problems can't be fixed anymore. We can't bend this tool for our needs anymore. It will make our lifer unnecessarily harder. Also when testament is used by external people, it will get features that have no use for internal testing. People will make pull requests to add features that we then have to support and maintain. Please undo this.
Not raising a RFC because I think everyone (or at least @Araq) agrees that it should be in the standard library
As projects get bigger and bigger, continuous integration time starts to be a bottleneck for developers and reviewers productivity.
The unittest module works fine for small to medium projects however bigger projects will hit a ceiling and will be forced to complexify their build system or create an adhoc test library:
Parallelism
unittest
doesn't support parallel builds, testament does.Memory usage
The canonical way to use
unittest
is to have a singleall_tests.nim
file that imports all the other test files. For generic heavy libraries like Arraymancer, the RAM usage can get quickly prohibitive and require splitting.See https://github.com/mratsim/Arraymancer/issues/359#issuecomment-500107895 for memory figures in the 2+GB, Travis has a 3+GB hard limit
Max number of Global variables
Nim GC has a max number of global variables at 3500,
unittest
is a template-only library and naive usage will put every single variable in the global scope.Arraymancer full test suite with CPU tests + OpenCL tests + Cuda tests cannot compile.
Slowness
Unconfirmed but I suspect that the heavy template instantiations of unittest have a significant share of the slowness of the CI in:
Conclusion
Unfortunately, as project matures, more and more tests will be added, especially crypto and blockchain projects. Some may include significant computations (cryptographic signature or tensor operations).
Also on dev machines, it would be nice to be able to do fuzz testing, property-based testing and metamorphic testing in parallel as those require significant time but can be parallelized per tested routines.
Downstream issues: