spacecowboy opened this issue 6 years ago (status: Open).
Yes, there is some additional work in calculating the changeset, but it should not be that slow. Most importantly, the work is only done if the assertion fails.
It would be interesting to determine whether this is some kind of infinite loop.
Does anyone want to investigate this? :-)
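The point that the diff work only happens on failure can be illustrated with a small helper. `compare` is a hypothetical name and not part of pretty_assertions. It only shows the pattern of deferring the expensive report to the failure branch, so equal inputs cost a single `==` comparison.

```rust
use std::cell::Cell;

/// Hypothetical helper (not pretty_assertions' API): compares two values and
/// calls `make_report` only when they differ, so the diff cost is paid on
/// failure alone.
fn compare<T, F>(left: &T, right: &T, make_report: F) -> Result<(), String>
where
    T: PartialEq + ?Sized,
    F: FnOnce(&T, &T) -> String,
{
    if left == right {
        Ok(())
    } else {
        // Only reached on a mismatch: this is where a real diff would be built.
        Err(make_report(left, right))
    }
}

fn main() {
    // Counts how many times a report was actually built.
    let calls = Cell::new(0);
    let ok = compare("same", "same", |l, r| {
        calls.set(calls.get() + 1);
        format!("- {}\n+ {}", l, r)
    });
    println!("{:?}, reports built: {}", ok, calls.get());
    let err = compare("abc", "abd", |l, r| {
        calls.set(calls.get() + 1);
        format!("- {}\n+ {}", l, r)
    });
    println!("{:?}, reports built: {}", err, calls.get());
}
```

This explains why passing tests are unaffected. It does not help once an assertion on two huge, different strings fails, which is the case reported here.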
I have the same problem. I use this crate in my configure_me crate to compare the code it generates with the expected code. I have already run past the 60-second mark at which the Rust test harness warns about slow tests.
This is my first time tracking down a performance issue, but it looks like most of the slowdown is caused by the difference.rs crate and not by rust-pretty-assertions.
I'll have a look at the diff crate now and see if it can be improved in some way.
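The example I used, in case someone wants to replicate it:

```rust
// Requires `difference = "2"` in Cargo.toml.
use difference::Changeset;

fn main() {
    // Two 6000-character strings that differ in every third character.
    let x = "abc".repeat(2000);
    let y = "abd".repeat(2000);
    // An empty split string makes the diff character-by-character.
    let c = Changeset::new(&x, &y, "");
    println!("{}", c);
}
```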
As a workaround, you can also add the following to your `Cargo.toml`:

```toml
[profile.dev]
opt-level = 2
```
It's not only extremely slow but also consumes a huge amount of memory: on two strings containing XML, it used 4 GB before I cancelled it.
As of Rust 1.41, Cargo supports overriding build settings for specific dependencies. You can therefore compile the `difference` crate (which is what slows down `pretty_assertions`) with optimizations in the development profile by adding the following to your `Cargo.toml`:

```toml
[profile.dev.package.difference]
opt-level = 3
```

This way you don't have to compile your whole crate and all of its dependencies with optimizations when testing just to speed up `difference`.

Note that if your crate is part of a workspace, this needs to be added to the `Cargo.toml` of the workspace root instead of your specific crate. Otherwise it will be ignored.
Issue #51 reported even worse performance with similar, and that prompted me to dig deeply into this particular use case. Unfortunately, it is very common for both this crate and my snapshot testing library that the two inputs being diffed are entirely distinct. If the inputs become large enough, this triggers pathological cases in almost all diffing algorithms.
LCS, which is what this crate uses, needs to build a massive table, which is also likely to exhaust the available memory for large enough distinct inputs. Myers, which is what similar uses, performs even worse than LCS in this kind of case. So why don't git or the Unix diff utility suffer from this? The answer is that they internally apply heuristics when the cost of the algorithm becomes too large. They also specifically detect cases where too many lines are distinct and explicitly exclude those lines from the diffing algorithm.
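To make the memory argument concrete, here is the textbook LCS dynamic-programming table in Rust. It is a generic sketch and not the `difference` crate's actual implementation. The byte estimate assumes one `usize` per cell and ignores per-row `Vec` overhead.

```rust
/// Fills the classic LCS dynamic-programming table for `a` and `b`.
/// Generic textbook version, not the `difference` crate's code.
fn lcs_table(a: &[char], b: &[char]) -> Vec<Vec<usize>> {
    let (n, m) = (a.len(), b.len());
    // One cell per (i, j) prefix pair: memory grows as (n + 1) * (m + 1).
    let mut t = vec![vec![0usize; m + 1]; n + 1];
    for i in 1..=n {
        for j in 1..=m {
            t[i][j] = if a[i - 1] == b[j - 1] {
                t[i - 1][j - 1] + 1 // extend a common subsequence
            } else {
                t[i - 1][j].max(t[i][j - 1]) // drop a char from one side
            };
        }
    }
    t
}

/// Bytes needed for a table of this shape, assuming `usize` cells.
fn table_bytes(n: usize, m: usize) -> usize {
    (n + 1) * (m + 1) * std::mem::size_of::<usize>()
}

fn main() {
    let a: Vec<char> = "abc".chars().collect();
    let b: Vec<char> = "abd".chars().collect();
    println!("LCS length: {}", lcs_table(&a, &b)[a.len()][b.len()]);
    // Table size for the 6000-character reproduction posted earlier in the thread.
    println!("table bytes for 6000 x 6000: {}", table_bytes(6000, 6000));
}
```

On a 64-bit target, a table of this shape for two 6000-character inputs is about 288 MB before any diff is produced. It grows quadratically from there. When the inputs share nothing, the whole table is filled only to find an LCS of length 0.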
Long story short: fixing this would require more advanced heuristics in Rust diffing libraries, but to the best of my knowledge no library has implemented them yet (https://github.com/mitsuhiko/similar/issues/15). Since I have the same issue in insta, I have now adopted the same workaround that Google's diff-match-patch uses: a deadline on the maximum wall-clock time spent diffing. This works particularly well with Myers' diff, where the result stays somewhat reasonable even if you bail out because the deadline was hit.
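The deadline idea can be sketched in plain Rust. This is a hypothetical illustration, not code from insta, similar, or diff-match-patch. It uses an LCS table rather than Myers. When the time budget runs out, it falls back to the coarsest valid diff (every old line deleted, every new line inserted) instead of keeping a partial result the way Myers can. The names `diff_with_deadline`, `coarse`, and `Op` are made up for this example.

```rust
use std::time::{Duration, Instant};

/// One entry of a line diff (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Op<'a> {
    Equal(&'a str),
    Delete(&'a str),
    Insert(&'a str),
}

/// Coarsest valid diff: every old line deleted, every new line inserted.
fn coarse<'a>(old: &[&'a str], new: &[&'a str]) -> Vec<Op<'a>> {
    old.iter()
        .map(|s| Op::Delete(*s))
        .chain(new.iter().map(|s| Op::Insert(*s)))
        .collect()
}

/// Hypothetical LCS line diff that gives up once `timeout` has elapsed.
/// The table is still allocated up front, so this bounds time, not memory.
fn diff_with_deadline<'a>(old: &[&'a str], new: &[&'a str], timeout: Duration) -> Vec<Op<'a>> {
    let deadline = Instant::now() + timeout;
    let (n, m) = (old.len(), new.len());
    // t[i][j] = LCS length of old[i..] and new[j..] (suffix form, so the
    // edit script can be read off front to back).
    let mut t = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        // Check the clock once per row to keep the overhead low.
        if Instant::now() >= deadline {
            return coarse(old, new);
        }
        for j in (0..m).rev() {
            t[i][j] = if old[i] == new[j] {
                t[i + 1][j + 1] + 1
            } else {
                t[i + 1][j].max(t[i][j + 1])
            };
        }
    }
    // Walk the finished table to emit a minimal edit script.
    let (mut i, mut j) = (0, 0);
    let mut ops = Vec::new();
    while i < n && j < m {
        if old[i] == new[j] {
            ops.push(Op::Equal(old[i]));
            i += 1;
            j += 1;
        } else if t[i + 1][j] >= t[i][j + 1] {
            ops.push(Op::Delete(old[i]));
            i += 1;
        } else {
            ops.push(Op::Insert(new[j]));
            j += 1;
        }
    }
    ops.extend(old[i..].iter().map(|s| Op::Delete(*s)));
    ops.extend(new[j..].iter().map(|s| Op::Insert(*s)));
    ops
}

fn main() {
    let old = ["a", "b", "c"];
    let new = ["a", "x", "c"];
    // Generous budget: the normal, minimal diff.
    println!("{:?}", diff_with_deadline(&old, &new, Duration::from_secs(1)));
    // Zero budget: the deadline has already passed, so we get the coarse fallback.
    println!("{:?}", diff_with_deadline(&old, &new, Duration::ZERO));
}
```

Inputs that fit within the budget get the same result as a plain LCS diff. Only pathological inputs get the coarse output. That trades diff quality for a bounded wait, which is the trade-off described above.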
@Ch00k @jasongrlicky Thank you for this tweak! Would you be interested in running some benchmarks and adding a note to the documentation and/or README via a pull request?
cc @tommilligan
Running `assert_eq!` on a large string is extremely slow with this library enabled. I noticed it when I added a placeholder assertion before starting to code. A regular assertion takes no time at all. With pretty-assertions enabled, the test still had not finished after 5 minutes, at which point I got tired of waiting and killed it.
This is the code:
And this is the test output (where you can see the actual string):