Closed mgeisler closed 2 years ago
Hi @robinkrahl, @NeverGivinUp, @adetaylor, and @olson-sean-k!
You've all been active in #247 and #416 and I would like your feedback on this PR. I believe both issues are caused by the same underlying problem: unintended integer overflows in wrap_optimal_fit
. This seems to only be a problem when you wrap at very large widths (of the same magnitude as usize::max_value()
) or when the fragments you wrap are very large.
In this PR, I'm playing with the idea of limiting the maximum line width to something much smaller than usize
: namely u16
. For terminals, this seems great: the terminal_size
crate already returns the current terminal width as an u16
and I believe this is what the underlying API returns. However, using a small type like this might make other applications impossible? Can we make the type generic and use u16
as a default while letting users plug in their own type if they want?
So now my question to you is: does it sound like a good idea to limit the maximum line width like this? The size of each fragment (word) is similarly limited to u16
in this PR.
I've added tests to ensure that we can still wrap a string with length larger than usize::max()
, provided that each fragment width fits in an u16
. I've also added new fuzz tests which exercise both the first-fit and the optimal-fit wrapping algorithms using randomly generated fragments. The wrap_optimal_fit
fuzz test would immediately find an overflow before, now it runs for millions of iterations without finding any crashes.
Hi Martin, have you tested this with my playground example from #247? I can't right now.
@mgeisler Thanks for working on this issue! Here are my first thoughts. I will give this issue some more thought and maybe do some experiments in the next days.
For the genpdf
use case, I am working with decimal widths (in mm), so I would in general prefer using f32
or f64
widths than having to map the widths to integers. That being sad, from my experience I need at least two decimals for precise calculations without visible errors with typical font sizes of around 12 pt. So a maximum line width in textwrap of u16::MAX
would mean a maximum line width of u16::MAX
/ 100 = 655.35 mm, i. e. landscape A2 would be the maximum paper size (for 12 pt text). This should work fine for almost all real-world applications, but it is not totally impossible that a user could reach this limit with a legitimate use case.
In any case, this change would mean that is no longer possible to use a fixed scale factor of e. g. 100. Therefore it would be helpful to have an interface that automatically scales all widths so that the line width maps to u16::MAX
.
Thanks for tagging me, but my involvement was just a drive-by comment hoping to help with release vs debug mode behavior, I know nothing about the issue itself :) Good luck on the fix!
Hi Martin, have you tested this with my playground example from #247? I can't right now.
Hey Navid, I just tested with your example, but with the usize
changed to u16
. It no longer crashes. The new wrap_optimal_fit
fuzz test covers similar ground and I cannot make it crash any longer.
@mgeisler Thanks for working on this issue! Here are my first thoughts. I will give this issue some more thought and maybe do some experiments in the next days.
Thanks! No stress, I'll leave it around for while, probably until after the holidays. It's a pretty major change so I also have to get used to it :-D
So a maximum line width in textwrap of
u16::MAX
would mean a maximum line width ofu16::MAX
/ 100 = 655.35 mm, i. e. landscape A2 would be the maximum paper size (for 12 pt text). This should work fine for almost all real-world applications, but it is not totally impossible that a user could reach this limit with a legitimate use case.
That is a very good point... since we have u128
available as well, I just tried changing the internals of wrap_optimal_fit
to use this wider type. With that, widths up to u32
seems to work fine! I've pushed a second commit on this branch which demonstrates this.
That would let users typeset text on a piece of A-31 paper :laughing:
Sorry for the confusion... I renamed the branch to better reflect the change. The new PR is up at #420.
Using an
usize
for the line width causes problems for the optimal-fit wrapping algorithm. The problem is that the algorithm works with integer values formed byWhen
line_width
is nearusize::max_value()
, this computation will overflow anusize
. By limiting the line width to anu16
, we can do the internal computations withu64
and avoid the overflows.