Generated test suite with cover option contains failing tests

azewiusz commented 1 year ago

Expected vs actual behavior

For program logic above a certain complexity, the cover option does generate unit tests (it may need more time, but it produces them). When I run these tests, though, I get assertion failures, as if the parameters computed for each execution path did not actually lead to the assertions passing.

To Reproduce

I created a git repo, https://github.com/azewiusz/for-crosshair, where I describe how to reproduce this problem. I'm using version 0.0.32 of crosshair-tool.

pschanely commented 1 year ago

Thank you for this issue and detailed repro code! I am having a little trouble getting your results immediately on my mac; looks like you're on Windows, correct? And which version of Python are you using?

I could imagine some challenges using floats in particular with the cover command, but I want to make sure I understand your issue exactly before I get into those details. :)

azewiusz commented 1 year ago

Yes, I'm using cover on a Windows machine. It may be that Python was updated to 3.9.10 on the test workstation; it was likely 3.7 at the time I reproduced this issue. Also, after updating to the latest version of crosshair-tool, 0.0.34, the error is gone. So it was reliably reproducible, many times, but only on 0.0.32.

pschanely commented 1 year ago

Hmm, ok, my attempt at Windows + Python 3.7 + CrossHair 0.0.32 produced a successfully running test case too. That said, I know that over the last few months we've fixed a handful of issues with cover and diffbehavior, so I'm still inclined to chalk it up to one of those.

Now, real talk, one big gotcha with CrossHair and floats: for performance reasons (*), CrossHair approximates floating point behavior using true (arbitrary precision) real numbers. Therefore, it's possible to get coverage cases that sit on rounding boundaries and, when re-run with actual floats, fail to follow the expected path.

(*) Z3 is technically capable of doing floating-point-accurate symbolic execution. However, those capabilities are very slow; it might take O(minutes) to reason about a single floating point operation. In an ideal world, we might try both approaches, but I haven't invested much into this idea, as I haven't seen the problem come up too often in practice. (but that's also why it's so important that people file bugs when things don't work, like you have! Thank you!)
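To make the rounding-boundary gotcha concrete, here is a minimal sketch. It does not use CrossHair or reflect its internals; it only models "real number" reasoning with Python's fractions.Fraction and compares it to ordinary float execution. The function name branch_taken and the chosen constants are hypothetical, picked for illustration.

```python
from fractions import Fraction


def branch_taken(x, tenth, three_tenths):
    """Return True when the input x would reach the guarded branch."""
    return x + tenth == three_tenths


# Reasoning over exact reals: x = 1/5 satisfies x + 1/10 == 3/10,
# so a real-number model would report x = 0.2 as covering the branch.
real_hit = branch_taken(Fraction(1, 5), Fraction(1, 10), Fraction(3, 10))

# Re-executing the same input with IEEE 754 doubles: 0.2 + 0.1 rounds to
# 0.30000000000000004, which is not equal to 0.3, so the branch is missed.
float_hit = branch_taken(0.2, 0.1, 0.3)

print(real_hit, float_hit)  # True False
```

A generated test that asserts the branch outcome for x = 0.2 would pass under the real-number model and fail when run with floats, which is the kind of mismatch described above.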

azewiusz commented 1 year ago

I spent some time on this today (I tried rolling back from 0.0.34 to 0.0.32), but I failed to recreate the problem. I may have lost the exact set-up during experimentation. I think what you write is important for the case that I was trying to work with (the floating point accuracy). Thank you for your investigation.

pschanely commented 1 year ago

> I think what you write is important for the case that I was trying to work with (the floating point accuracy).

Ah, then I will count this as a vote in favor of working on a feature that also attempts fully accurate floating-point alongside the implementation based on Real numbers!

pschanely / CrossHair

Generated test suite with cover option contains failing tests #188