nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
106.58k stars 29.06k forks source link

util.stripVTControlCharacters does not strip hyperlinks #53697

Open isaacs opened 2 months ago

isaacs commented 2 months ago

Version

22.4.0

Platform

Darwin moxy.lan 23.5.0 Darwin Kernel Version 23.5.0: Wed May  1 20:17:33 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6031 arm64

Subsystem

util

What steps will reproduce the bug?

console.log(util.stripVTControlCharacters('\x1b]8;;http://example.com\x1b\\This is a link\x1b]8;;\x1b\\ hello'))

How often does it reproduce? Is there a required condition?

Always

What is the expected behavior? Why is that the expected behavior?

Should output:

This is a link hello

(With "This is a link" not hyperlinked.)

What do you see instead?

ttp://example.comThis is a link;; hello

Additional information

OCS codes are a bit tricky to capture and strip, but this is what I'm using in ansi-to-pre:

/\u001B\]8;;(.*?)(?:\u001B\\|\u0007)(.*?)\u001B]8;;(?\u001B\\|\u0007)/

Adding str.replaceAll(/\u001b\]8;;(.*?)(?:\u001b\\|\u0007)(.*?)\u001b]8;;(?:\u001b\\|\u0007)/g, '$2') should do the right thing, but not sure if there's some better way to do it in context.

RedYetiDev commented 2 months ago

+ repro-exists:

└─$ node
Welcome to Node.js v22.3.0.
Type ".help" for more information.
> util.stripVTControlCharacters('\x1b]8;;http://example.com\x1b\\This is a link\x1b]8;;\x1b\\ hello')
ttp://example.comThis is a link;; hello
RedYetiDev commented 2 months ago

The current RegEx used is:

/[\u001B\u009B][[\]()#;?]*(?:(?:(?:(?:;[-a-zA-Z\d\/#&.:=?%@~_]+)*|[a-zA-Z\d]+(?:;[-a-zA-Z\d\/#&.:=?%@~_]*)*)?\u0007)|(?:(?:\d{1,4}(?:;\d{0,4})*)?[\dA-PR-TZcf-ntqry=><~]))/g
RedYetiDev commented 2 months ago

I'm currently building and testing @isaacs suggested regex.

isaacs commented 2 months ago

Oh, hey, looks like you can put other parameters between the ; in OCS 8 codes, so this regexp would probably be a bit more complete:

/\u001b\]8;[^;]*?;(.*?)(?:\u001b\\|\u0007)(.*?)\u001b]8;[^;]*?;(?:\u001b\\|\u0007)/g

Note the addition of [^;]*? between the ;; chars.

(I don't know of any terminal emulators or programs that do put parameters there, as it's just "reserved for future use", but that's what the spec says it can be.)

isaacs commented 2 months ago

I was wrong, the id parameter is used in some cases to control link hover behavior, so yes, would need to strip that.

RedYetiDev commented 2 months ago

I'm seeing a number of failures with your RegEx, but I do agree the current one may have a bug. (I would CC a util team but there isn't one :/)

sindresorhus commented 2 months ago

The regex at https://github.com/chalk/ansi-regex/blob/main/index.js supports links. You could maybe find some inspiration there.

RedYetiDev commented 2 months ago

Hi @sindresorhus,

IIRC, the regex used is from ansi-regex, but an older version.

Later today, I'll test if a newer version of the regex fixes the issue.

RedYetiDev commented 2 months ago

@sindresorhus

// From https://github.com/chalk/strip-ansi/blob/main/index.js
function ansiRegex({onlyFirst = false} = {}) {
    const pattern = [
        '[\\u001B\\u009B][[\\]()#;?]*(?:(?:(?:(?:;[-a-zA-Z\\d\\/#&.:=?%@~_]+)*|[a-zA-Z\\d]+(?:;[-a-zA-Z\\d\\/#&.:=?%@~_]*)*)?\\u0007)',
        '(?:(?:\\d{1,4}(?:;\\d{0,4})*)?[\\dA-PR-TZcf-nq-uy=><~]))'
    ].join('|');

    return new RegExp(pattern, onlyFirst ? undefined : 'g');
}

console.log('\x1b]8;;http://example.com\x1b\\This is a link\x1b]8;;\x1b\\ hello'.replace(ansiRegex(), ''));

Outputs:

"ttp://example.com\u001b\\This is a link;;\u001b\\ hello"

https://runkit.com/6689b91eebd0a700080cd8b3/668c3336e85de300088f3f39

RedYetiDev commented 2 months ago

I've opened an issue in https://github.com/chalk/ansi-regex/issues/56, which is the source for the RegEx used by Node.js. Once resolved, I'll update the RegEx.