mgeisler / textwrap

An efficient and powerful Rust library for word wrapping text.
MIT License
449 stars 44 forks source link

Add configurable LineEnding #454

Closed koiuo closed 2 years ago

koiuo commented 2 years ago

This is a quick demonstration of how configurable line ending support would look like #453 .

Immediately I see an issue with unfill which now requires LineEnding parameter, degrading experience for the clients who do not wish to deal with configurable line endings.

Perhaps, we could have two fns: unfill_with_endings(&str, LineEnding) and unfill(&str) { unfill_with_endings(str, LineEnding::LF) }

Alternatives to consider

My gut feeling is that explicit control over the behavior is better (unicode defines multiple other options over the well-known CR and LF), but I do not think this is a strong argument considering lines() only recognizes \n and \r\n.

If the overall approach (or any of the alternatives) seems reasonable, I'll work towards finalizing this by providing tests, documentation, etc.. Let me know please.

koiuo commented 2 years ago

@mgeisler , please let me know what do you think about the overall approach. (and just in case, I'm totally ok if after looking at the code you decide that this approach does not fit) Thanks for your time.

mgeisler commented 2 years ago

Thanks so much for the PR, it looks really good! I've left a few comments, but overall I think this should work just fine.

mgeisler commented 2 years ago

Perhaps, we could have two fns: unfill_with_endings(&str, LineEnding) and unfill(&str) { unfill_with_endings(str, LineEnding::LF) }

Alternatives to consider

  • using lines() in unfill (trimming trailing \r\n could be implemented as discarding trailing empty lines)
  • autodetecting the end of the line by examining the input and matching against a set of known patterns

Yeah, thanks for listing this out. My preference would be for auto-detecting the input line ending in unfill: the job of the function is to make sense of a piece of text without anyone telling it how wide the lines are, what the prefix is etc. So detecting the line ending used would consistent with that.

Of course, it could happen that there are no newlines in the input text: if so, then I think it's acceptable that the caller gets an Options back with LineEnding::LF. Basically, such an input string is ambiguous and Textwrap can default to LineEnding::LF.

mgeisler commented 2 years ago

Hey @edio, in https://github.com/mgeisler/textwrap/issues/453#issuecomment-1146883209 you mentioned that we could have a LineEnding::Auto variant. I guess that would work like this: if "\r\n" in text { LineEnding::CRLF } else { LineEnding::LF }? That is, it detects if the input has \r\n or \n line endings already and use this to split/join the lines.

That could perhaps be nice! Normally, I would suggest that the callers should be explicit about this: I would expect editors and whatnot to know the encoding of the text they try to wrap. If the encoding is unknown, callers could always detect it using the code above.

Because of this, I think we could defer adding this variant and just do the simple two-variant enum first.

koiuo commented 2 years ago

@mgeisler , got you, thanks!

Not arguing, just want to explain my thinking: Rust's lines() is smart and efficient about the CRLF support. It always splits by \n and immediately after the split it checks whether the last byte of the line is \r, and removes it, if that's the case. However \r\n in text would require full traversal over the input (and, btw, without a guarantee that it'll actually find it, so it is indeed a full scan), and then another full traversal to actually do splits.

(also, ACK to all your comments, will get to the fixes ASAP, but next weekend may be busy for me, so I may get to it only next week. Sorry about the delay :pray: )

mgeisler commented 2 years ago

Not arguing, just want to explain my thinking:

Thanks for explaining!

Rust's lines() is smart and efficient about the CRLF support. It always splits by \n and immediately after the split it checks whether the last byte of the line is \r, and removes it, if that's the case. However \r\n in text would require full traversal over the input (and, btw, without a guarantee that it'll actually find it, so it is indeed a full scan), and then another full traversal to actually do splits.

Right, that's very clever! I see you implemented equal cleverness now in a very elegant iterator, which is awesome!

(also, ACK to all your comments, will get to the fixes ASAP, but next weekend may be busy for me, so I may get to it only next week. Sorry about the delay :pray: )

This is open source, there are no deadlines! We all work on this when we have time and energy :smiley: Thank you for the great PR!

mgeisler commented 2 years ago

Hey @edio, thanks for all the great updates on the PR.

The build error on Rust 1.56 looks the same as what I had in another project: https://github.com/mgeisler/lipsum/pull/84. I think you just need to be explicit with the type of the slice (for some reason which I don't understand).

koiuo commented 2 years ago

@mgeisler , thank you for the fix for the Rust 1.56 compatibility issue. It'd took me ages to figure that out.

I believe I adressed all your requests for changes, but please do not hesitate to point to other problems if you can think of any.

And thanks again for your time and help on this PR.

mgeisler commented 2 years ago

thank you for the fix for the Rust 1.56 compatibility issue. It'd took me ages to figure that out.

Yeah, that was a weird case for me too :smile:

mgeisler commented 2 years ago

Thanks for all the hard work on this!