mamba-org / resolvo

Fast package resolver written in Rust (CDCL based SAT solving)
BSD 3-Clause "New" or "Revised" License

Revamp testing setup #19

Open aochagavia opened 7 months ago

aochagavia commented 7 months ago

The current testing setup is somewhat barebones. We have about 30 tests for different scenarios, but they all have in common that they are pretty short and do not necessarily reflect real-world use cases. Similar in spirit to #18, I think it would pay off to raise the bar here.

We could look at PubGrub for inspiration. According to this comment, they have code to generate test cases and test them against their own solver and a different SAT solver. Is this accurate @Eh2406? Do you have any suggestions of specific files we should look at for inspiration?

I haven't thought much about alternatives yet, but I could dedicate some time to it later. @baszalmstra @tdejager have you thought about this?

tdejager commented 7 months ago

I think it's good to keep the current tests as a kind of unit test. However, a more elaborate setup is definitely useful. As in the mentioned comment, if we could somehow come up with a way to create generic problem cases that we can reuse, that would be great!

We have a lot of conda/python examples, but these will have a lot more "baggage" attached to them than somewhat purer solving cases.

Eh2406 commented 7 months ago

We could look at PubGrub for inspiration. According to https://github.com/mamba-org/resolvo/issues/2#issuecomment-1742230945, they have code to generate test cases and test them against their own solver and a different SAT solver. Is this accurate @Eh2406? Do you have any suggestions of specific files we should look at for inspiration?

This work was originally done for cargo's test suite and can currently be found here. It was simplified and rearranged for pubgrub into sat and proptests, with additional documentation at Testing and benchmarking in our guide. This code is not as clear as I should have made it. If you have questions I would be happy to answer them! If you have improvements I would be happy to backport them! Despite its limitations, this test code has prevented me from merging many bugs.
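To make the idea concrete, here is a minimal sketch of this style of differential testing: generate small random registries, run a solver on each, and compare the result against an independent oracle. Everything in it is a hypothetical toy model and not the API of cargo, pubgrub, or resolvo. The `Dep`, `Registry`, and `Assignment` types, the naive backtracking `solve`, and the hand-rolled xorshift `Rng` are assumptions made for illustration, and a brute-force enumeration stands in for the SAT solver that the real test suites use as their oracle.

```rust
// Hypothetical toy model for illustration only.
// A dependency: package `pkg` must be selected at a version in `lo..=hi`.
// An empty range (lo > hi) makes the dependency unsatisfiable.
#[derive(Clone, Debug)]
struct Dep { pkg: usize, lo: usize, hi: usize }

// registry[p][v] lists the dependencies of package p at version v.
type Registry = Vec<Vec<Vec<Dep>>>;
// assignment[p] == Some(v) means package p is selected at version v.
type Assignment = Vec<Option<usize>>;

// Small xorshift PRNG so the sketch needs no external crates (seed must be nonzero).
struct Rng(u64);
impl Rng {
    fn next(&mut self, bound: usize) -> usize {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 % bound as u64) as usize
    }
}

fn random_registry(rng: &mut Rng, pkgs: usize, vers: usize) -> Registry {
    let mut reg = Registry::new();
    for _ in 0..pkgs {
        let mut versions = Vec::new();
        for _ in 0..vers {
            let count = rng.next(3); // 0, 1, or 2 dependencies per version
            let deps = (0..count)
                .map(|_| Dep { pkg: rng.next(pkgs), lo: rng.next(vers), hi: rng.next(vers) })
                .collect();
            versions.push(deps);
        }
        reg.push(versions);
    }
    reg
}

// Package 0 is the root: it must be selected, and every selected
// package must have all of its dependencies satisfied.
fn is_valid(reg: &Registry, asg: &Assignment) -> bool {
    asg[0].is_some()
        && asg.iter().enumerate().all(|(p, sel)| match sel {
            None => true,
            Some(v) => reg[p][*v]
                .iter()
                .all(|d| matches!(asg[d.pkg], Some(w) if d.lo <= w && w <= d.hi)),
        })
}

// Oracle: enumerate every possible assignment and check each one.
fn oracle_is_sat(reg: &Registry) -> bool {
    let (pkgs, vers) = (reg.len(), reg[0].len());
    let total = (vers + 1).pow(pkgs as u32);
    (0..total).any(|mut code| {
        let asg: Assignment = (0..pkgs)
            .map(|_| {
                let digit = code % (vers + 1);
                code /= vers + 1;
                digit.checked_sub(1) // digit 0 means "not selected"
            })
            .collect();
        is_valid(reg, &asg)
    })
}

// The "solver under test": naive backtracking, highest versions first.
fn solve(reg: &Registry, asg: Assignment, mut todo: Vec<usize>) -> Option<Assignment> {
    let Some(p) = todo.pop() else {
        return is_valid(reg, &asg).then_some(asg);
    };
    if asg[p].is_some() {
        return solve(reg, asg, todo);
    }
    for v in (0..reg[p].len()).rev() {
        let mut next = asg.clone();
        next[p] = Some(v);
        let mut next_todo = todo.clone();
        next_todo.extend(reg[p][v].iter().map(|d| d.pkg));
        if let Some(solution) = solve(reg, next, next_todo) {
            return Some(solution);
        }
    }
    None
}

// Runs `cases` random registries, panicking on any disagreement between
// solver and oracle. Returns how many registries were satisfiable.
fn differential_test(seed: u64, cases: usize) -> usize {
    let mut rng = Rng(seed);
    let mut sat = 0;
    for _ in 0..cases {
        let reg = random_registry(&mut rng, 4, 3);
        match solve(&reg, vec![None; reg.len()], vec![0]) {
            Some(sol) => {
                assert!(is_valid(&reg, &sol), "invalid solution for {reg:?}");
                sat += 1;
            }
            None => assert!(!oracle_is_sat(&reg), "solver missed a solution: {reg:?}"),
        }
    }
    sat
}

fn main() {
    let sat = differential_test(0x5EED, 500);
    println!("{sat} of 500 random registries were satisfiable");
}
```

The two assertions check different failure modes: a returned solution must actually satisfy the registry, and a reported failure must be confirmed by the oracle. Keeping the registries tiny is what makes the exhaustive oracle affordable; a real setup would swap in a SAT solver and a shrinking-capable generator.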

Since this code's development, there have been a couple of relevant changes to randomized testing in the Rust ecosystem. proptest_state_machine is a new API for generating random inputs that is supposed to be better for self-referential data. I have not tried using it for generating registries, but it might end up being more elegant. Alternatively, cargo fuzz has now added support for Windows. (I'm a Windows-based life form.) So trying "coverage-guided input" instead of, or in addition to, "random input" is intriguing.

I think its good keeping the current tests as kind of unit tests. However, a more elaborate setup is definitely useful.

Both cargo and pubgrub have several kinds of more direct testing, which are great for checking that our API still works the way we intended and for smoke testing the core of the algorithm. But they almost never happen to hit the interesting case where, after several different decisions, the optimization I'm in the middle of adding turns out to be invalid.