mgeisler / textwrap

An efficient and powerful Rust library for word wrapping text.
MIT License
467 stars 45 forks source link

Use `u16` instead of `usize` for line width #418

Closed mgeisler closed 2 years ago

mgeisler commented 2 years ago

Using an usize for the line width causes problems for the optimal-fit wrapping algorithm. The problem is that the algorithm works with integer values formed by

(line_width - text_width)**2

When line_width is near usize::max_value(), this computation will overflow an usize. By limiting the line width to an u16, we can do the internal computations with u64 and avoid the overflows.

mgeisler commented 2 years ago

Hi @robinkrahl, @NeverGivinUp, @adetaylor, and @olson-sean-k!

You've all been active in #247 and #416 and I would like your feedback on this PR. I believe both issues are caused by the same underlying problem: unintended integer overflows in wrap_optimal_fit. This seems to only be a problem when you wrap at very large widths (of the same magnitude as usize::max_value()) or when the fragments you wrap are very large.

In this PR, I'm playing with the idea of limiting the maximum line width to something much smaller than usize: namely u16. For terminals, this seems great: the terminal_size crate already returns the current terminal width as an u16 and I believe this is what the underlying API returns. However, using a small type like this might make other applications impossible? Can we make the type generic and use u16 as a default while letting users plug in their own type if they want?

So now my question to you is: does it sound like a good idea to limit the maximum line width like this? The size of each fragment (word) is similarly limited to u16 in this PR.

I've added tests to ensure that we can still wrap a string with length larger than usize::max(), provided that each fragment width fits in an u16. I've also added new fuzz tests which exercise both the first-fit and the optimal-fit wrapping algorithms using randomly generated fragments. The wrap_optimal_fit fuzz test would immediately find an overflow before, now it runs for millions of iterations without finding any crashes.

NeverGivinUp commented 2 years ago

Hi Martin, have you tested this with my playground example from #247? I can't right now.

robinkrahl commented 2 years ago

@mgeisler Thanks for working on this issue! Here are my first thoughts. I will give this issue some more thought and maybe do some experiments in the next days.

For the genpdf use case, I am working with decimal widths (in mm), so I would in general prefer using f32 or f64 widths than having to map the widths to integers. That being sad, from my experience I need at least two decimals for precise calculations without visible errors with typical font sizes of around 12 pt. So a maximum line width in textwrap of u16::MAX would mean a maximum line width of u16::MAX / 100 = 655.35 mm, i. e. landscape A2 would be the maximum paper size (for 12 pt text). This should work fine for almost all real-world applications, but it is not totally impossible that a user could reach this limit with a legitimate use case.

In any case, this change would mean that is no longer possible to use a fixed scale factor of e. g. 100. Therefore it would be helpful to have an interface that automatically scales all widths so that the line width maps to u16::MAX.

adetaylor commented 2 years ago

Thanks for tagging me, but my involvement was just a drive-by comment hoping to help with release vs debug mode behavior, I know nothing about the issue itself :) Good luck on the fix!

mgeisler commented 2 years ago

Hi Martin, have you tested this with my playground example from #247? I can't right now.

Hey Navid, I just tested with your example, but with the usize changed to u16. It no longer crashes. The new wrap_optimal_fit fuzz test covers similar ground and I cannot make it crash any longer.

@mgeisler Thanks for working on this issue! Here are my first thoughts. I will give this issue some more thought and maybe do some experiments in the next days.

Thanks! No stress, I'll leave it around for while, probably until after the holidays. It's a pretty major change so I also have to get used to it :-D

mgeisler commented 2 years ago

So a maximum line width in textwrap of u16::MAX would mean a maximum line width of u16::MAX / 100 = 655.35 mm, i. e. landscape A2 would be the maximum paper size (for 12 pt text). This should work fine for almost all real-world applications, but it is not totally impossible that a user could reach this limit with a legitimate use case.

That is a very good point... since we have u128 available as well, I just tried changing the internals of wrap_optimal_fit to use this wider type. With that, widths up to u32 seems to work fine! I've pushed a second commit on this branch which demonstrates this.

That would let users typeset text on a piece of A-31 paper :laughing:

mgeisler commented 2 years ago

Sorry for the confusion... I renamed the branch to better reflect the change. The new PR is up at #420.