mgeisler / textwrap

An efficient and powerful Rust library for word wrapping text.
MIT License
446 stars 44 forks source link

`display_width` and hyperlinks #525

Closed tertsdiepraam closed 7 months ago

tertsdiepraam commented 10 months ago

Hi! I'm working on uutils and collaborating with the eza people. We have use case for the display_width function that requires hyperlinks to be ignored. This does not seem to be currently supported by textwrap. This is understandable since it's a fairly niche use case but we'd still very much like to see this supported.

As far as I can tell the issue boils down to the fact that textwrap assumes that the ANSI sequences start with ESC [, but hyperlinks are part of Operating System Command block that start with ESC ] (which is info I'm also just pulling from wikipedia, can't say I'm an expert at ANSI codes: https://en.wikipedia.org/wiki/ANSI_escape_code#OSC). After ESC ], it looks like there are more characters allowed before the sequence ends and it must therefore end with ESC \.

If this is something you're interested in to support, I'd be happy to try to make a PR.

cc @pthorpe92

mgeisler commented 10 months ago

Hi! Yes, this would be a great addition and I would happy to support it.

I'm also not an expert on this, but I think you read the standard correctly. I found https://github.com/Alhadis/OSC8-Adoption which has lots of good links to other projects supporting this.

Please put up a PR and I'll take a look!

tertsdiepraam commented 10 months ago

Great! I've also been putting an implementation together in a separate repo: https://github.com/uutils/ansi-width/, which we intend to publish as a crate, because I figured it's a bit strange that nobody has published a separate crate for this yet. Would you accept using that library in textwrap?[^1] We are going to use that in some of our other libraries, so it will be maintained. If you don't want that, I completely understand and I'd still be happy to just implement a PR for textwrap.

[^1]: I don't have the choice between simple width and unicode width in there yet, but that could be easily added.

mgeisler commented 10 months ago

I've gotten pushback in the past for any dependency on Textwrap — this is actually why all the dependencies are optional now since people seem to care more about compilation times than they do about features. Despite that, Clap decided to vendor Textwrap inside their source tree: clap_builder/src/output/textwrap. That crate was a big reverse dependency in the past.

So I'm in doubt if I should take on a dependency for this. People have suggested this in the past, but with larger crates which implements support for parsing more or all of the ANSI escape codes.

If this is a small change, then it would be easiest to just extend the existing function. However, I would personally like to remove the code and swap the unicode-width dependency with a new dependency. I trust compilers to remove unused code and to do it efficiently, so I love exporting complexity out of the code I maintain :slightly_smiling_face:

One thing which I've considered here is that the unconditional skipping of ANSI escape characters might be a mistake. Nobody have complained so far, but it seems a bit odd to hardcode this terminal logic into what ought to be a pure string-manipulating library. So maybe it should be moved to a different layer of the stack. I don't know if it would affect the tradeoffs here.

tertsdiepraam commented 10 months ago

Of course! I've also found that people tend to count dependencies and draw a lot of conclusions just from that number, without much consideration of the tradeoffs involved. 😄 I'll try to make a PR implementing it here directly sometime next week!

Re: not stripping ANSI. This crate feels (to me) very closely related to the terminal, specifically because it considers monospace fonts (with some characters spanning multiple columns). Outside the terminal there aren't that many places where that exact functionality is required. But making it optional is even better still of course.