Closed koiuo closed 2 years ago
Hey @edio, thanks for opening the issue. Super exciting that textwrap is used in the editor!
You're not missing anything, textwrap is hard-coded to split on \n
in textwrap::wrap
. I see you're using textwrap::refill
and that is also hard-coded to use \n
.
I've been thinking of \n
as the logical way to separate lines and \r\n
as more of a fileformat encoding of the logical \n
characters. That is, my opinion is that programs should use \n
throughout their processing and then encode the string to the target file format (e.g., do a .replace("\n", "\r\n")
call at the point where the data leaves the program).
Searching a bit, it seems that this way of thinking comes from how languages like C++ handle text mode files. Rust doesn't have a text mode, so I think my way of thinking is outdated :-D
Now, I see a few different approaches for handling this:
\n
delimited string with Textwrap. That matches my way of thinking above. Easiest solution to implement for Textwrap :smile: .lines()
internally and ask people to use textwrap::wrap
(which return a Vec<&str>
) instead of textwrap::fill
if they want to use custom line endings.Out of the options, I think the first two are the best. Since Textwrap tries to be useful for many use-cases, I will be happy to see a custom enum like you suggest. I agree that two enum variants ought to be enough for now. Since we split the input just once, I think the performance impact should be negligible, but please compare
% cargo criterion fill_optimal_fit_ascii/4800
before and after to make sure.
Another thing: I think Helix is the first place which uses textwrap::refill
for real. So if you run into problems with it, then please let me know. Most of the logic is in textwrap::unfill
and it could use some real-world feedback since it's quite fuzzy what it should do on different input.
@mgeisler , thanks for the response. I do not represent the helix dev team. I just had a couple of free evenings and wanted to find an interesting task to work on.
I considered transforming the input before feeding it to textwrap, but that seemed inefficient (and most of all cumbersome).
.lines()
+ ~textwrap::fill
~textwrap:wrap
seems like a nice and clean workaround in terms of code. ~That still leaves us with the default \n
in the output, however (so it is still cumbersome, even if reduced by half 😁)~
Perhaps I should summon @kirawi here, who's been handling the issue in the helix repo and, I assume, familiar with the helix codebase (unlike me). @kirawi, please read Martin's comment https://github.com/mgeisler/textwrap/issues/453#issuecomment-1146168714 and share your opinion about the proposed solutions. Thank you!
EDIT: I misread. The proposal was to use lines()
+ textwrap::wrap
. Works better for the client, but introduces a slight behavoir inconsistency for this lib: it handles any line ending in the input, but only outputs \n
. Not sure if it's a problem.
Works better for the client, but introduces a slight behavoir inconsistency for this lib: it handles any line ending in the input, but only outputs
\n
. Not sure if it's a problem.
My thinking was that the client (the caller of wrap
) would use \n
or \r\n
as desired to put the lines back together. Perhaps the caller would use neither: if the lines are meant to be displayed in an editor (:wink:), then perhaps the lines are just sent one by one to the display engine.
Thanks for the pull request, I'll go take a look now!
@mgeisler , I benchmarked wrap
with lines()
vs split('\n')
on 4800 input, and consistently across multiple runs the lines()
version is slower. lines()
additionally removes \r
in the end of the line after the split, so it being slower is expected.
My laptop isn't tuned for benchmarking (variable CPU frequency, etc.), but the best case of split('\n')
is 5% faster than the best case of lines()
.
IMO, if there's no explicit control over the line ending, lines()
should be the default behavior. I'm not sure 5% performance gain is worth the reduced portability (esp. considering Rust targets Windows too), and preprocessing of input will kill the 5% gain if input has CRLF.
Anyway switching to lines()
conflicts with the line_ending
option. So either we do lines()
and call it a day (this will solve the issue for helix), or we introduce the LineEnding
.
If we do lines()
now, we can still introduce the LineEnding
later (in this sense it is a safe option). We'll just want to have a special LineEnding::Auto
value. This would be the default, and a specific value like CRLF
would give a slight performance boost.
@mgeisler , I benchmarked
wrap
withlines()
vssplit('\n')
on 4800 input, and consistently across multiple runs thelines()
version is slower.lines()
additionally removes\r
in the end of the line after the split, so it being slower is expected. My laptop isn't tuned for benchmarking (variable CPU frequency, etc.), but the best case ofsplit('\n')
is 5% faster than the best case oflines()
.IMO, if there's no explicit control over the line ending,
lines()
should be the default behavior. I'm not sure 5% performance gain is worth the reduced portability (esp. considering Rust targets Windows too), and preprocessing of input will kill the 5% gain if input has CRLF.Anyway switching to
lines()
conflicts with theline_ending
option. So either we dolines()
and call it a day (this will solve the issue for helix), or we introduce theLineEnding
.If we do
lines()
now, we can still introduce theLineEnding
later (in this sense it is a safe option). We'll just want to have a specialLineEnding::Auto
value. This would be the default, and a specific value likeCRLF
would give a slight performance boost.
I like what you've done in https://github.com/mgeisler/textwrap/pull/454: let the caller of wrap
and friends specify what line ending to use both for splitting the lines and, in the case of fill
, to use for joining the generated lines.
Let us continue discussing there!
Fixed by #454, thanks!
This was a semver breaking change since textwrap::Options
is an exhaustive struct with pub
fields. Code like:
fn textwrap_options() -> textwrap::Options<'static> {
textwrap::Options {
width: textwrap::termwidth(),
initial_indent: " ",
subsequent_indent: " ",
break_words: true,
wrap_algorithm: textwrap::WrapAlgorithm::OptimalFit(Default::default()),
word_separator: textwrap::WordSeparator::UnicodeBreakProperties,
word_splitter: textwrap::WordSplitter::NoHyphenation,
}
}
breaks when updating from v0.15.0 to v0.15.1
Hi @Arnavion, oh no, I didn't think of that... Thanks for noticing! I'll yank the release and redo the release on the weekend.
I'm curious through, do you have code like that? Why would you write it like that when Options
come with a built-in builder?
I'm curious through, do you have code like that?
Yes. It failed to compile yesterday and that's why I started investigating and reached here.
Why would you write it like that when
Options
come with a built-in builder?
I wanted to initialize all the fields with values of my choice, so the builder would be more verbose than setting the fields directly. Also, having a struct with pub
fields is an invitation to let users set those fields manually, ie it's an explicit signal that it's okay to set those fields.
an invitation to let users set those fields manually, ie it's an explicit signal that it's okay to set those fields.
Yeah, I get that — I made them public to make it easy to reassign a few fields when needed. I just didn't expect anybody to actually use the struct literal syntax itself.
Thanks for letting me know, I'll fix it as soon as I can.
Yeah, in that case you can make it #[non_exhaustive]
to prevent this from happening in the future.
Thanks @Arnavion, I've yanked 0.15.1 and released 0.16.0 instead.
Please unyank 0.15.1, clap 3.2.22 explicitly depends on it.
0.15.0 should have been rereleased as 0.15.2 to make sure downstream users explicitly depending on 0.15.1 don't get stuck.
If any crate depends on ^0.15.1 explicitly, it would normally be doing so because it depends on API newly added in 0.15.1, so it wouldn't work with 0.15.0 released as 0.15.2. So doing what you suggest would be pointless in general.
That said, clap 3 seems to have updated to 0.15.1 without depending on the change in 0.15.1 ( https://github.com/clap-rs/clap/commit/2fd55076e65b14fa92b34161c44d5058a9966f69 ), so it would indeed work with 0.15.0 re-released as 0.15.2. Why it updated its dep for no reason is a mystery...
If any crate depends on ^0.15.1 explicitly, it would normally be doing so because it depends on API newly added in 0.15.1, so it wouldn't work with 0.15.0 released as 0.15.2. So doing what you suggest would be pointless in general.
This is wrong. If I add a dependency to my system I always take the biggest version number, because then that enforces I don't rely on a bugfix without doing so explicitly, and it also circumvents issues with crates that don't believe in minor version bumps (e.g. serde). Anyway it's good practice to always rerelease a bigger version number whenever a version is yanked to restore whatever was before.
That's fine, but I hope you defensively bump every dep you have in every commit you make, no matter how unrelated it is. After all you might have added foo dep at its latest version last week, but your clean build today might be relying on a new bugfix it released yesterday. Otherwise you're not achieving the safety you desire :)
(The only way this can be solved for sure is with -Z minimal-versions
.)
Anyway, this is not the place to argue about it, and I agree that there's no harm in releasing 0.15.2.
Hi @Arnavion and @nox Thanks for letting me know about this! I did not anticipate such a deadlock when I yanked the release... Let's discuss further fixes in #484.
Helix editor recently added a feature to reflow a piece of text; helix uses textwrap for this
There's an issue, where reflow does not work correctly if
CRLF
sequence (\r\n
) is used as a line terminator https://github.com/helix-editor/helix/issues/2645helix does not support
CR
ending (pre macosx), but I guess, the issue would manifest with it as well.As far as I can see, textwrap does not account for different line ending, there's no configuration option with which we could tune textwrap behavior.
Did I miss anything? Is there a way to support
CRLF
in textwrap?If not, I'd be willing to contribute a fix. My initial idea is to introduce another option into the
Options
struct, namedline_ending
(orline_terminator
as a fancier alternative). To keep things tight, I'd express it asenum
(we can always create something likeCustom
should we want to support an arbitrary byte sequence later, but I highly doubt that). How does that sound?