idaholab / MontePy

MontePy is the most user-friendly Python library (API) to read, edit, and write MCNP input files.
https://www.montepy.org/
MIT License

Move beyond coverage based testing #530

Closed MicahGale closed 2 months ago

MicahGale commented 2 months ago

I started thinking about this recently:

Why are we finding a lot of bugs in MontePy despite having around 98% code coverage?

This is a broad and complex issue, but in part I came across the concept of "pseudo-tested methods" (this was written about Java). The authors do provide a tool for finding these methods, but it is only implemented for Java.

The authors also wrote an article on this topic that I should read at some point (doi: 10.1145/2896941.2896944).

Not having read it won't stop me from drawing conclusions from the abstract, though:

> Automated tests play an important role in software evolution because they can rapidly detect faults introduced during changes. In practice, code-coverage metrics are often used as criteria to evaluate the effectiveness of test suites with focus on regression faults. However, code coverage only expresses which portion of a system has been executed by tests, but not how effective the tests actually are in detecting regression faults.
>
> Our goal was to evaluate the validity of code coverage as a measure for test effectiveness. To do so, we conducted an empirical study in which we applied an extreme mutation testing approach to analyze the tests of open-source projects written in Java. We assessed the ratio of pseudo-tested methods (those tested in a way such that faults would not be detected) to all covered methods and judged their impact on the software project. The results show that the ratio of pseudo-tested methods is acceptable for unit tests but not for system tests (that execute large portions of the whole system). Therefore, we conclude that the coverage metric is only a valid effectiveness indicator for unit tests.

So some actionable steps for the time being:

  1. Exclude `tests/test_integration`, etc., from coverage reports
  2. Limit the scope of coverage for specific test packages to specific source code, e.g., `tests/test_syntax_parsing` should not contribute to `MCNP_Problem` coverage
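One way to sketch both steps is with pytest-cov plus coverage.py's dynamic contexts. This is only a sketch: the `montepy` package path and the test layout are assumptions, not necessarily the actual repository layout.

```toml
# pyproject.toml sketch -- package and test paths are assumptions
[tool.pytest.ini_options]
# measure only the source package, so test code itself never pads coverage
addopts = "--cov=montepy --cov-report=term-missing"

[tool.coverage.run]
# tag every covered line with the test function that executed it, so the
# contribution of tests/test_integration (or any other test package) can
# be filtered out of a report after the fact
dynamic_context = "test_function"
```

Note that fully excluding the integration tests from the headline number would still take a separate pytest invocation (e.g. with `--ignore=tests/test_integration`), since omitting a test file from measurement does not remove the source lines it executed.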

Wishlist:

  1. Have an automated tool to detect pseudo-tested functions
  2. Detect when a function's return value is not tested.
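For concreteness, here is a toy illustration of a pseudo-tested function (the names are hypothetical, not MontePy API). The weak test below executes the function, so it counts toward coverage, but it never checks the return value, so an "extreme mutation" that replaces the body with `return None` would still pass:

```python
def parse_cell_number(line: str) -> int:
    """Extract the leading cell number from an MCNP-style input line."""
    return int(line.split()[0])

def weak_test():
    # Covered, but pseudo-tested: the result is never asserted on,
    # so gutting the function body would not fail this test.
    parse_cell_number("42 0 -1 imp:n=1")

def strong_test():
    # Actually detects regressions in the return value.
    assert parse_cell_number("42 0 -1 imp:n=1") == 42

weak_test()
strong_test()
print("both tests passed")
```

A pseudo-tested-method detector essentially automates this check: gut each covered method, rerun the suite, and flag methods where nothing fails.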
MicahGale commented 2 months ago

It would be great to have a test-to-function map. This plugin seems to collect that data somehow: https://pypi.org/project/pytest-rts/
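The core idea behind such a map can be sketched in stdlib-only Python with `sys.settrace` (this is a toy illustration, not how pytest-rts works internally; the `add`/`test_add` names are made up):

```python
import sys
from collections import defaultdict

# test name -> set of function names that ran while that test executed
test_to_funcs = defaultdict(set)
current_test = None

def tracer(frame, event, arg):
    # The global trace function fires once per "call" event, i.e. each
    # time a new Python function frame is entered.
    if event == "call" and current_test:
        test_to_funcs[current_test].add(frame.f_code.co_name)
    return None  # no line-level tracing needed

def add(a, b):
    return a + b

def test_add():
    assert add(2, 3) == 5

for test in (test_add,):
    current_test = test.__name__
    sys.settrace(tracer)
    test()
    sys.settrace(None)

print(sorted(test_to_funcs["test_add"]))  # -> ['add', 'test_add']
```

Coverage.py's dynamic contexts (`dynamic_context = "test_function"`) record the same kind of association robustly, per line rather than per function.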

MicahGale commented 2 months ago

Also, we should probably move more towards something like pytest-cov for collecting coverage.