nim-lang / Nim

Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).
https://nim-lang.org
Other
16.54k stars 1.47k forks source link

Expose testament as part of the standard library #12084

Closed mratsim closed 5 years ago

mratsim commented 5 years ago

Not raising a RFC because I think everyone (or at least @Araq) agrees that it should be in the standard library

As projects get bigger and bigger, continuous integration time starts to be a bottleneck for developers and reviewers productivity.

The unittest module works fine for small to medium projects however bigger projects will hit a ceiling and will be forced to complexify their build system or create an adhoc test library:

Parallelism

unittest doesn't support parallel builds, testament does.

Memory usage

The canonical way to use unittest is to have a single all_tests.nim file that imports all the other test files. For generic heavy libraries like Arraymancer, the RAM usage can get quickly prohibitive and require splitting.

See https://github.com/mratsim/Arraymancer/issues/359#issuecomment-500107895 for memory figures in the 2+GB, Travis has a 3+GB hard limit

Max number of Global variables

Nim GC has a max number of global variables at 3500, unittest is a template-only library and naive usage will put every single variable in the global scope.

Arraymancer full test suite with CPU tests + OpenCL tests + Cuda tests cannot compile.

Slowness

Unconfirmed but I suspect that the heavy template instantiations of unittest have a significant share of the slowness of the CI in:

Conclusion

Unfortunately, as project matures, more and more tests will be added, especially crypto and blockchain projects. Some may include significant computations (cryptographic signature or tensor operations).

Also on dev machines, it would be nice to be able to do fuzz testing, property-based testing and metamorphic testing in parallel as those require significant time but can be parallelized per tested routines.

Downstream issues:

arnetheduck commented 5 years ago

Your timings are off - make test in nimbus and nim-beacon-chain includes running the tests - for example in nim-beacon-chain, that includes doing several epochs of state simulation (heavy cryptography) on the generally underpowered travis CI. You would need to measure the build time instead to make this argument, which in our case was around 10-20s per executable last I checked.

testament itself is a bit of a hack that's gone through several iterations but still lacks several features that are useful in library development (as opposed to command-line application development) - it focuses on running executables, not on running functions / code. as such, it's more of an integration testing tool, not that great for unit testing. I'd say a primary reason it's used with the compiler is that the compiler is implemented as a monolith command line app that's hard to unit test - there are no units in there, so all you can do really is run the whole thing every time.

The parallelism is also a bit of an issue in there: on the one hand, you can run tests in parallel, but that implies compiling every test separately which pretty much cancels out all benefits of parallel runs and instead makes it orders of magnitude slower (unless you have a cpu per test).

There's an hack in testament in the form of megatest that effectively concatenates the tests which kind of works for the nim unit test suite as a tradeoff to not have to rewrite the old tests, but it's not really.. uh.. a thoughtful long-term solution and was implemented more as a workaround for the extreme run times of the compiler test suite - see note about parallelism vs compile time above. in particular, these tests do not run in parallel.

generally when crafting tests, the a key idea is to ensure that they're independent - this is what makes them safe for concurrency/parallelization - testament is good in ensuring that each test runs in an isolated environment but does so at great cost (separate compiles) - it's usually a more reasonable trade-off to put constraints on test code instead, specially when developing libraries. you can see this in practise today as well: the tests that are not part of megatest are effectively the ones that have been proven not to be written with independence in mind. of course, this also assumes that the libraries being tested can be used independently.

arnetheduck commented 5 years ago

on the independence note, it would be interesting to see if the unittest macro can be made powerful enough to detect which tests are actually independent and run those in parallel - this would imply it has to understand which are side-effect free, effectively.

unfortunately, a lot of code in nim relies on mutable globals and other stateful features, or compile-time flags like checks and liberal use of when options complicating the matter - it's not a given that this would yield good returns on average, even if well-designed libraries would benefit.

stefantalpalaru commented 5 years ago

How to time a Nimbus test compilation (the biggest, aggregated one):

$ rm -rf nimcache; time ./env.sh nim c --out:./build/all_tests -d:release -d:chronicles_log_level=ERROR --verbosity:0 --hints:off --warnings:off -d:usePcreHeader --passL:"-lpcre" tests/all_tests.nim

real    1m3.072s
user    2m39.329s
sys 0m17.749s

It has quite a few custom macros and templates that affect compile time, so it's not useful to measure unittest's overhead. You need a synthetic test file for that.

alehander92 commented 5 years ago

I don't agree it should be in stdlib: i really think we should improve unittest, or rewrite a similar library: testatement feels very closely related to the compiler needs to me

alehander92 commented 5 years ago

on the independence note, it would be interesting to see if the unittest macro can be made powerful enough to detect which tests are actually independent and run those in parallel - this would imply it has to understand which are side-effect free, effectively. @arnetheduck in any case, an easy way to just manually separate suites/groups of tests that can be ran in parallel is good

stefantalpalaru commented 5 years ago

i really think we should improve unittest

https://github.com/stefantalpalaru/nim-unittest2

arnetheduck commented 5 years ago

@arnetheduck in any case, an easy way to just manually separate suites/groups of tests that can be ran in parallel is good

the key point here is that it would be nice to do this automatically - it's easy to accidentally introduce something that breaks the necessary invariants - just like it's hard to write thread-safe code in general in nim when --threads:off is the default - this is something compilers are much better at reasoning about than humans.

alehander92 commented 5 years ago

@arnetheduck agree: but having a manual fallback/override is still useful

Araq commented 5 years ago

testament should not be in the "stdlib", but it should become a tool that we ship with Nim and that supports at least invocations like testament <glob here>.

Is testament perfect? Of course not, but it's what is used, it supports the features we need and it's maintained. It's even documented but admittedly its documentation is hard to find. ;-)

krux02 commented 5 years ago

From me this thing gets a NO. Tester is an always changing testing framework. It is desinged to test the compiler only. It has many problems there weren't fixed because it is an internal tool only. Exposing it for public use means that certain problems can't be fixed anymore. We can't bend this tool for our needs anymore. It will make our lifer unnecessarily harder. Also when testament is used by external people, it will get features that have no use for internal testing. People will make pull requests to add features that we then have to support and maintain. Please undo this.