Closed koiuo closed 2 years ago
@mgeisler , please let me know what do you think about the overall approach. (and just in case, I'm totally ok if after looking at the code you decide that this approach does not fit) Thanks for your time.
Thanks so much for the PR, it looks really good! I've left a few comments, but overall I think this should work just fine.
Perhaps, we could have two fns:
unfill_with_endings(&str, LineEnding)
andunfill(&str) { unfill_with_endings(str, LineEnding::LF) }
Alternatives to consider
- using
lines()
inunfill
(trimming trailing\r\n
could be implemented as discarding trailing empty lines)- autodetecting the end of the line by examining the input and matching against a set of known patterns
Yeah, thanks for listing this out. My preference would be for auto-detecting the input line ending in unfill
: the job of the function is to make sense of a piece of text without anyone telling it how wide the lines are, what the prefix is etc. So detecting the line ending used would consistent with that.
Of course, it could happen that there are no newlines in the input text: if so, then I think it's acceptable that the caller gets an Options
back with LineEnding::LF
. Basically, such an input string is ambiguous and Textwrap can default to LineEnding::LF
.
Hey @edio, in https://github.com/mgeisler/textwrap/issues/453#issuecomment-1146883209 you mentioned that we could have a LineEnding::Auto
variant. I guess that would work like this: if "\r\n" in text { LineEnding::CRLF } else { LineEnding::LF }
? That is, it detects if the input has \r\n
or \n
line endings already and use this to split/join the lines.
That could perhaps be nice! Normally, I would suggest that the callers should be explicit about this: I would expect editors and whatnot to know the encoding of the text they try to wrap. If the encoding is unknown, callers could always detect it using the code above.
Because of this, I think we could defer adding this variant and just do the simple two-variant enum first.
@mgeisler , got you, thanks!
Not arguing, just want to explain my thinking: Rust's lines()
is smart and efficient about the CRLF support.
It always splits by \n
and immediately after the split it checks whether the last byte of the line is \r
, and removes it, if that's the case. However \r\n in text
would require full traversal over the input (and, btw, without a guarantee that it'll actually find it, so it is indeed a full scan), and then another full traversal to actually do splits.
(also, ACK to all your comments, will get to the fixes ASAP, but next weekend may be busy for me, so I may get to it only next week. Sorry about the delay :pray: )
Not arguing, just want to explain my thinking:
Thanks for explaining!
Rust's
lines()
is smart and efficient about the CRLF support. It always splits by\n
and immediately after the split it checks whether the last byte of the line is\r
, and removes it, if that's the case. However\r\n in text
would require full traversal over the input (and, btw, without a guarantee that it'll actually find it, so it is indeed a full scan), and then another full traversal to actually do splits.
Right, that's very clever! I see you implemented equal cleverness now in a very elegant iterator, which is awesome!
(also, ACK to all your comments, will get to the fixes ASAP, but next weekend may be busy for me, so I may get to it only next week. Sorry about the delay :pray: )
This is open source, there are no deadlines! We all work on this when we have time and energy :smiley: Thank you for the great PR!
Hey @edio, thanks for all the great updates on the PR.
The build error on Rust 1.56 looks the same as what I had in another project: https://github.com/mgeisler/lipsum/pull/84. I think you just need to be explicit with the type of the slice (for some reason which I don't understand).
@mgeisler , thank you for the fix for the Rust 1.56 compatibility issue. It'd took me ages to figure that out.
I believe I adressed all your requests for changes, but please do not hesitate to point to other problems if you can think of any.
And thanks again for your time and help on this PR.
thank you for the fix for the Rust 1.56 compatibility issue. It'd took me ages to figure that out.
Yeah, that was a weird case for me too :smile:
Thanks for all the hard work on this!
This is a quick demonstration of how configurable line ending support would look like #453 .
Immediately I see an issue with
unfill
which now requiresLineEnding
parameter, degrading experience for the clients who do not wish to deal with configurable line endings.Perhaps, we could have two fns:
unfill_with_endings(&str, LineEnding)
andunfill(&str) { unfill_with_endings(str, LineEnding::LF) }
Alternatives to consider
lines()
inunfill
(trimming trailing\r\n
could be implemented as discarding trailing empty lines)My gut feeling is that explicit control over the behavior is better (unicode defines multiple other options over the well-known
CR
andLF
), but I do not think this is a strong argument consideringlines()
only recognizes\n
and\r\n
.If the overall approach (or any of the alternatives) seems reasonable, I'll work towards finalizing this by providing tests, documentation, etc.. Let me know please.