Closed olson-sean-k closed 2 years ago
This line in the optimized-fit algorithm is suspicious. It subtracts the target width from the line width where the target width is derived from the width given to textwrap::wrap
. IIUC, that's usize::MAX
in this case, so underflow occurs. Indeed the bug does not occur when the smawk
feature is disabled and the first-fit algorithm is used instead.
I believe that subtraction could underflow for inputs that aren't as pathological as usize::MAX
. Perhaps a saturating operation should be used there instead?
Hi @olson-sean-k, thanks for the test case and the excellent analysis. I have a test cases for wrapping with the maximum line width, but perhaps I only checked the first-fit algorithm... I'll have to check tomorrow.
As for fixing this, I'm not sure exactly how, unfortunately. The algorithm does not lend itself well to saturated addition/subtraction (unfortunately!) and so I've been fighting with corner cases like this one for a while.
Basically, the original algorithm assumed infinite precision math (I ported it from Python) and this breaks down in Rust.
I found the test mentioned above and while it does use the optimal-fit algorithm, it only tested a very short string:
#[test]
fn max_width() {
assert_eq!(wrap("foo bar", usize::max_value()), vec!["foo bar"]);
}
That is of course rather unfortunate and something I will dig into.
Thanks for taking a look!
Basically, the original algorithm assumed infinite precision math (I ported it from Python) and this breaks down in Rust.
I'm not sure this would be ideal, but maybe the optimized-fit algorithm could use (or fall back to) something like num_bigint::BigUInt
for intermediate computations. I believe num-bigint
's integer types behave similarly to Python's integers and will grow dynamically as needed.
That may be overkill though and may decrease performance for common use cases that work properly today. If the bounds are known, maybe the algorithm could accept a fixed-sized width (e.g., u64
instead of usize
) and use larger integers for intermediate computations (e.g., u128
). u128
may use a software implementation on some targets, but I'm guessing that any performance impact would be smaller than using a big integer like BigUint
. Since the APIs are already backed by conversions, it should be possible to keep things ergonomic despite using fixed-sized integers.
Yeah, you're right: we could go all-in and use a big-int crate of some sort... but I feel it's overkill as you suggest.
For "normal" terminal widths, the wrapping works fine from what I can tell, so I would rather limit the size of the input widths. I've been playing with that in the past, but I didn't finish it.
I've done more testing with a build where line widths are limited to u16
and where the optimal-fit computations are done with u64
. The fuzz test seems to no longer crash with that setup!
I will have to clean it up a bit and try to understand the consequences of limiting the line width in this way, especially since I'm also trying to make the library work outside of the console (as the Wasm demo). See also #247 for related discussions.
I've been testing this a lot more and I believe the solution in #421 will work well: use f64
internally when computing the wrapping cost. With that change, I can no longer trigger overflows in my tests.
When wrapping text at a very large width (e.g.,
usize::MAX
) viatextwrap::wrap
, it is instead wrapped at small and inconsistent widths. This code......prints this:
Note that the wrapped width is not consistent, as the word "English" could fit on the first line without exceeding the width of the last line. I would not expect the text to be wrapped here at all, since it never reaches a width of
usize::MAX
. I'm not sure at what large widths this behavior begins to manifest.