helix-editor / helix

A post-modern modal text editor.
https://helix-editor.com
Mozilla Public License 2.0

`:reflow` does not recognize `\n\n` as paragraph end/start #2419

Open getreu opened 2 years ago

getreu commented 2 years ago

In markup languages, \n\n (a blank line) marks the end of one paragraph and the start of the next. The current word-wrap implementation does not recognise paragraph endings. Instead, it treats them as ordinary whitespace and formats the text as one big block.

getreu commented 2 years ago

Workaround

(Maybe this is the intended way to format text in Helix?)

  1. Select the whole text to format.
  2. Split into paragraphs: in normal mode press S (capital S), type \n\n, then [Enter]
  3. Format text: :reflow, then [Enter]
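
For readers who want to see what the workaround amounts to, here is a minimal sketch in plain Rust (standard library only). It is not Helix's or textwrap's code: `fill_words` and `reflow_paragraphs` are hypothetical names, the wrapping is a simple greedy fill, and width is counted in `char`s rather than display columns.

```rust
/// Greedily fill `words` into lines of at most `width` characters.
/// A word longer than `width` is left on a line of its own.
fn fill_words(words: &[&str], width: usize) -> String {
    let mut lines: Vec<String> = Vec::new();
    let mut current = String::new();
    for word in words {
        let needed = current.chars().count() + 1 + word.chars().count();
        // Start a new line when the next word would overflow the current one.
        if !current.is_empty() && needed > width {
            lines.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines.join("\n")
}

/// Split on "\n\n", reflow each paragraph on its own, and rejoin with "\n\n".
fn reflow_paragraphs(text: &str, width: usize) -> String {
    text.split("\n\n")
        .map(|paragraph| {
            let words: Vec<&str> = paragraph.split_whitespace().collect();
            fill_words(&words, width)
        })
        .collect::<Vec<_>>()
        .join("\n\n")
}
```

Calling `reflow_paragraphs(text, 80)` keeps the blank lines between paragraphs while rewrapping the text inside each one. It assumes paragraphs are separated by exactly one blank line and have no line prefix.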

getreu commented 2 years ago

The above workaround does not work with, e.g., block quotes (lines prefixed with >).

A more general approach would be to use Tree-sitter to analyse the text's structure.

vlmutolo commented 2 years ago

This is a known shortcoming of the current approach. I'm not sure how to get textwrap, the underlying crate powering the feature, to recognize "blank" lines.

We could try to manually do it and call textwrap on the individual paragraphs only, but then we'd have to re-implement the prefix detection to handle things like the following scenario:

/// # title
///
/// some paragraph text

@mgeisler Any ideas? Is this something textwrap already handles and I just didn't find it?
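
To make the prefix problem concrete, here is a rough sketch of the manual approach in plain Rust. It assumes the prefix is the run of leading non-alphanumeric characters shared by every selected line, and that a line with nothing after the prefix ends a paragraph. `common_prefix` and `split_prefixed_paragraphs` are hypothetical helpers, not part of textwrap or Helix.

```rust
/// Leading non-alphanumeric characters shared by all lines, e.g. "///" or ">".
/// Trailing whitespace of the prefix is dropped so that "///" and "/// text" agree.
fn common_prefix<'a>(lines: &[&'a str]) -> &'a str {
    let first = match lines.first() {
        Some(line) => *line,
        None => return "",
    };
    // Candidate prefix: everything before the first alphanumeric character.
    let mut len = first
        .find(|c: char| c.is_alphanumeric())
        .unwrap_or(first.len());
    for line in &lines[1..] {
        let shared = first
            .bytes()
            .zip(line.bytes())
            .take_while(|(a, b)| a == b)
            .count();
        len = len.min(shared);
    }
    // Never cut a multi-byte character in half.
    while !first.is_char_boundary(len) {
        len -= 1;
    }
    first[..len].trim_end()
}

/// Strip the shared prefix and group the remaining lines into paragraphs.
/// Returns the prefix and one joined string per paragraph.
fn split_prefixed_paragraphs(text: &str) -> (String, Vec<String>) {
    let lines: Vec<&str> = text.lines().collect();
    let prefix = common_prefix(&lines);
    let mut paragraphs = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    for &line in &lines {
        let content = line[prefix.len()..].trim();
        if content.is_empty() {
            // A prefix-only line closes the current paragraph.
            if !current.is_empty() {
                paragraphs.push(current.join(" "));
                current.clear();
            }
        } else {
            current.push(content);
        }
    }
    if !current.is_empty() {
        paragraphs.push(current.join(" "));
    }
    (prefix.to_string(), paragraphs)
}
```

Each returned paragraph could then be wrapped on its own and the prefix re-attached to every output line. The sketch ignores nested prefixes and list markers, which a real implementation would have to handle.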

getreu commented 2 years ago

It seems to me that WrapAlgorithm in textwrap::wrap_algorithms has a notion of "paragraph":

> This wrapping algorithm considers the entire paragraph to find optimal line breaks. When wrapping text, “penalties” are assigned to line breaks based on the gaps left at the end of lines. See [Penalties](https://docs.rs/textwrap/latest/textwrap/wrap_algorithms/struct.Penalties.html) for details.

mgeisler commented 2 years ago

Hi @vlmutolo and @getreu, you're correct that refill doesn't recognize paragraphs currently. It will consider all lines as belonging to the same paragraph, regardless of blank lines and so on.

> It seems to me that WrapAlgorithm in textwrap::wrap_algorithms has a notion of "paragraph".

The notion of a paragraph is quite simple (primitive): functions like textwrap::wrap will split the input on \n and wrap each line as its own paragraph.

Basically, Textwrap would originally disregard all whitespace and put all words into a single wrapped paragraph. However, people expect newlines to be preserved, so Textwrap now splits on \n, wraps the lines, and then joins everything together with \n again.

That is true for textwrap::wrap and fill, but I neglected to implement this for refill. We should fix this, so I would appreciate it if one of you could open an issue for it in the Textwrap repository.

getreu commented 2 years ago

What about defining a custom WrapAlgorithm in textwrap::wrap_algorithms that honours paragraph boundaries?

getreu commented 2 years ago

We probably need a custom wrap algorithm (WrapAlgorithm) for other shortcomings anyway: e.g. we do not want to wrap in the middle of URLs, and possibly not even after a - in the middle of a word.

mgeisler commented 2 years ago

> What about defining a custom WrapAlgorithm in textwrap::wrap_algorithms that honours paragraph boundaries?

The job of the wrap algorithm is to turn a slice of words into wrapped lines. This is done via wrap.

Under the hood, wrap will call wrap_first_fit (the greedy algorithm) or wrap_optimal_fit. These functions operate on "fragments". A fragment is a block of something which has a width plus some whitespace.

> We probably need a custom wrap algorithm (WrapAlgorithm) for other shortcomings anyway: e.g. we do not want to wrap in the middle of URLs, and possibly not even after a - in the middle of a word.

It's the job of other functions to prepare these fragments — and these other functions decide if - should split words or not. Concretely, the WordSplitter enum implements a few different ways to split a word into, well, smaller words. The code there works on words represented by an actual &str.
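
A simplified illustration of the fragment idea, in plain Rust, is below. It is a sketch in the spirit of the greedy first-fit algorithm described above, not textwrap's actual implementation: the `Fragment` struct, `fragments_from`, and `first_fit` are hypothetical stand-ins, and features such as hyphenation penalties are left out.

```rust
/// A block of content with a width, followed by some whitespace.
#[derive(Debug, Clone, PartialEq)]
struct Fragment {
    width: usize,
    whitespace_width: usize,
}

/// Turn whitespace-separated words into fragments (one space after each word).
/// Width is counted in `char`s here, an assumption made for simplicity.
fn fragments_from(text: &str) -> Vec<Fragment> {
    text.split_whitespace()
        .map(|word| Fragment {
            width: word.chars().count(),
            whitespace_width: 1,
        })
        .collect()
}

/// Greedy first-fit: put each fragment on the current line if it fits,
/// otherwise start a new line. Returns the lines as slices of the input.
fn first_fit(fragments: &[Fragment], line_width: usize) -> Vec<&[Fragment]> {
    let mut lines = Vec::new();
    let mut start = 0;
    let mut width = 0;
    for (idx, fragment) in fragments.iter().enumerate() {
        // Trailing whitespace of the last fragment on a line is not counted,
        // and a line always gets at least one fragment.
        if width + fragment.width > line_width && idx > start {
            lines.push(&fragments[start..idx]);
            start = idx;
            width = 0;
        }
        width += fragment.width + fragment.whitespace_width;
    }
    lines.push(&fragments[start..]);
    lines
}
```

Because the algorithm only sees widths, deciding whether a URL or a hyphenated word may be broken has to happen earlier, when the fragments are built.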

getreu commented 2 years ago

Another workaround

  1. Select text
  2. Type | (pipe)
  3. fmt, [Enter]

getreu commented 2 years ago

Well, your explanation confirms that textwrap is highly customisable?

mgeisler commented 2 years ago

> Well, your explanation confirms that textwrap is highly customisable?

Yes, I would say so :-) The most elaborate example of this is the WebAssembly demo, where I could put in toggles for all the options: https://mgeisler.github.io/textwrap. It also shows how Textwrap can wrap non-console text (it uses f64 internally now for its width computations).

I wrote up a blurb of text about it in #136 in a reply to @cessen. I'm sorry for ending up with a discussion split across issues like this.